Published by | technische_universitat_munchen |
Published on | 1 January 2010 |
File size | 5 MB |
Excerpt
Technische Universität München
Department of Informatics (Fakultät für Informatik)
Chair III: Database Systems

Community-Driven Data Grids

Tobias Scholl, Diplom-Informatiker Univ.

Complete reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Dr. Hans Michael Gerndt
Examiners of the dissertation:
1. Univ.-Prof. Alfons Kemper, Ph.D.
2. Univ.-Prof. Dr. Dieter Kranzlmüller, Ludwig-Maximilians-Universität München

The dissertation was submitted to the Technische Universität München on 17.09.2009 and accepted by the Fakultät für Informatik on 01.03.2010.
To my daughter Julia Sophie
Abstract

E-science communities, and especially the astronomy community, have put tremendous efforts into providing global access to their distributed scientific data sets to foster knowledge sharing within their scientific federations. Beyond already existing data volumes, collaborative researchers face major challenges in managing the anticipated data deluge of forthcoming projects with expected data rates of several terabytes a day, such as the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), the Low Frequency Array (LOFAR), or the Large Synoptic Survey Telescope (LSST).

In this thesis, we describe and investigate community-driven data grids as a data management solution. Community-driven data grids target domain-specific e-science federations and provide scalable, distributed, and collaborative data management. Our infrastructure optimizes the overall query throughput by exploiting dominant data characteristics (e.g., data skew) and query patterns. By combining well-established techniques for data partitioning and replication with Peer-to-Peer (P2P) technologies, we can address several challenging problems: data load balancing, efficient data dissemination and query processing, handling of query hot spots, and adaptation to short-term query bursts as well as long-term load redistributions.

We propose a framework for investigating application-specific index structures to create locality-aware partitioning schemes (so-called histograms) and to find appropriate data mapping strategies. In particular, we investigate how far data mapping strategies based on space filling curves preserve query locality and achieve data load balancing in comparison to a random mapping.
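The combination of a locality-aware partitioning scheme (histogram) with a curve-based data mapping can be made concrete with a small sketch. It assumes a two-dimensional data space normalised to the unit square, a quadtree that splits any cell holding more than `capacity` training points (the acknowledgements mention quadtree-based histograms, but this split rule is an assumption), the Z-order (Morton) curve as the space filling curve, and a placement that assigns contiguous runs of Z-ordered cells to nodes. All function names are hypothetical, and counting the nodes a range query must contact is only one simple proxy for query locality.

```python
import random
from dataclasses import dataclass


@dataclass
class Leaf:
    """A quadtree cell: the square [x0, x0 + size) x [y0, y0 + size)."""
    x0: float
    y0: float
    size: float
    count: int  # training points that fell into this cell


def build_quadtree(points, capacity, x0=0.0, y0=0.0, size=1.0, depth=0, max_depth=16):
    """Split a cell into four quadrants until it holds at most `capacity`
    training points (or max_depth is reached). Quadrants are visited in
    Z-order: lower-left, lower-right, upper-left, upper-right."""
    if len(points) <= capacity or depth == max_depth:
        return [Leaf(x0, y0, size, len(points))]
    half = size / 2
    leaves = []
    for qx, qy in ((x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)):
        inside = [(x, y) for x, y in points if qx <= x < qx + half and qy <= y < qy + half]
        leaves += build_quadtree(inside, capacity, qx, qy, half, depth + 1, max_depth)
    return leaves


def morton_key(leaf, bits=16):
    """Position of a cell on the Z-order curve: interleave the bits of its
    lower-left grid coordinates (x in even, y in odd bit positions)."""
    ix, iy = int(leaf.x0 * (1 << bits)), int(leaf.y0 * (1 << bits))
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)
        key |= ((iy >> i) & 1) << (2 * i + 1)
    return key


def map_curve(leaves, n_nodes):
    """Curve-based placement (hypothetical): cut the Z-ordered cells into
    n_nodes contiguous runs, so that neighbouring cells tend to share a node."""
    ordered = sorted(leaves, key=morton_key)
    return {(c.x0, c.y0): i * n_nodes // len(ordered) for i, c in enumerate(ordered)}


def map_random(leaves, n_nodes, seed=0):
    """Random placement baseline: every cell goes to a uniformly chosen node."""
    rng = random.Random(seed)
    return {(c.x0, c.y0): rng.randrange(n_nodes) for c in leaves}


def nodes_for_query(leaves, placement, qx0, qy0, qx1, qy1):
    """Nodes holding at least one cell that overlaps [qx0, qx1) x [qy0, qy1)."""
    return {placement[(c.x0, c.y0)] for c in leaves
            if c.x0 < qx1 and qx0 < c.x0 + c.size and c.y0 < qy1 and qy0 < c.y0 + c.size}


def clamp(v):
    """Keep synthetic coordinates inside [0, 1)."""
    return min(max(v, 0.0), 0.999)


# Hypothetical skewed data set: a dense cluster plus uniform background noise.
rng = random.Random(42)
points = [(clamp(rng.gauss(0.3, 0.05)), clamp(rng.gauss(0.6, 0.05))) for _ in range(2000)]
points += [(rng.random(), rng.random()) for _ in range(500)]
leaves = build_quadtree(points, capacity=100)
curve = map_curve(leaves, n_nodes=8)
rand = map_random(leaves, n_nodes=8)
query = (0.25, 0.55, 0.35, 0.65)  # small range query inside the dense cluster
print(len(leaves), "cells; nodes contacted:",
      "curve =", len(nodes_for_query(leaves, curve, *query)),
      "random =", len(nodes_for_query(leaves, rand, *query)))
```

Because the quadrants are visited in Z-order, the leaf list already follows the curve; `morton_key` only makes that ordering explicit. The printed node counts depend entirely on the synthetic data and are not results from the thesis.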
An efficient data dissemination technique for the anticipated large data volumes is important for several use cases within scientific federations, including initial data distribution and replication. A scalable solution should neither induce a high load on the transmitting servers nor create a high messaging overhead. Optimizing data distribution with regard to latency and bandwidth is infeasible in our scenario. Therefore, we propose several strategies that optimize network traffic, use chunk-based feeding, and improve data processing at receiving nodes in order to speed up data feeding.
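One reading of chunk-based feeding is that tuples are not shipped one at a time but buffered per destination node and sent in chunks, which reduces the number of messages. The sketch below assumes that reading; `feed_in_chunks` and `route` are hypothetical names, `route` stands in for the histogram lookup that determines the responsible node, and the chunk size is an arbitrary parameter, not a value from the thesis.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def feed_in_chunks(tuples: Iterable[Point], route: Callable[[Point], int],
                   chunk_size: int) -> Iterator[Tuple[int, List[Point]]]:
    """Buffer tuples per destination node and emit one message (node, chunk)
    whenever a buffer reaches chunk_size; leftover partial chunks are flushed
    at the end. Tuple-at-a-time feeding would emit one message per tuple."""
    buffers: Dict[int, List[Point]] = defaultdict(list)
    for t in tuples:
        node = route(t)
        buffers[node].append(t)
        if len(buffers[node]) == chunk_size:
            yield node, buffers.pop(node)
    for node, rest in buffers.items():
        if rest:
            yield node, rest


def route(p: Point) -> int:
    """Hypothetical stand-in for a histogram lookup: four vertical stripes."""
    return int(p[0] * 4)


rng = random.Random(7)
data = [(rng.random(), rng.random()) for _ in range(10_000)]
messages = list(feed_in_chunks(data, route, chunk_size=500))
print(len(data), "tuples shipped in", len(messages), "messages")
```

With 10,000 tuples, four destinations, and `chunk_size=500`, the loop sends roughly 20 messages instead of 10,000; larger chunks reduce messaging overhead further at the cost of more buffering at the sender.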
Given different typical submission scenarios, we show how community-driven data grids can adapt their query coordination strategies during query processing. We explore the impact of uniform or skewed submission patterns and compare multiple strategies with regard to their usability and scalability for data-intensive applications. Our techniques considerably improve throughput through increased parallelism and data load balancing in both local and wide area deployments, which directly meets the requirements of a data-intensive e-science environment.

Addressing skewed query workloads, so-called query hot spots, by query load balancing is another important and challenging task. We enhance our data-driven partitioning schemes to trade off data load balancing against workload-aware data partitioning. We use a cost-based approach for handling query hot spots via splitting and replication. Based on these workload-aware partitioning schemes, we use master-slave replication to compensate for short-term peaks in query load and address long-term shifts in data and query distributions by partitioning scheme evolution.
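The trade-off between data load balancing and workload-aware partitioning can be illustrated with a weighted cost per cell. The following sketch is an assumption, not the thesis's cost model: cells are taken in a fixed order (for example, Z-order), each cell's cost mixes its share of the data and its share of the queries via a hypothetical parameter `alpha`, and a greedy pass (the hypothetical `weighted_partition`) cuts the sequence into at most `k` contiguous partitions of roughly equal cost.

```python
from typing import List, Sequence


def weighted_partition(data_counts: Sequence[int], query_counts: Sequence[int],
                       k: int, alpha: float) -> List[int]:
    """Greedily cut an ordered sequence of cells into at most k contiguous
    partitions of roughly equal cost, where each cell costs
    alpha * (its share of the data) + (1 - alpha) * (its share of the queries).
    alpha = 1 balances data only; alpha = 0 balances query load only.
    Returns the partition index of every cell."""
    total_d = sum(data_counts) or 1
    total_q = sum(query_counts) or 1
    costs = [alpha * d / total_d + (1 - alpha) * q / total_q
             for d, q in zip(data_counts, query_counts)]
    target = sum(costs) / k
    assignment, part, acc = [], 0, 0.0
    for c in costs:
        # Open the next partition once the current one has reached its share.
        if acc >= target * (part + 1) and part < k - 1:
            part += 1
        assignment.append(part)
        acc += c
    return assignment


# Hypothetical workload: data is spread evenly, but queries hit the last two cells.
data = [100] * 8
queries = [1, 1, 1, 1, 1, 1, 50, 50]
print(weighted_partition(data, queries, k=4, alpha=1.0))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(weighted_partition(data, queries, k=4, alpha=0.5))  # [0, 0, 0, 0, 1, 1, 1, 2]
```

With `alpha=1.0`, every partition receives two cells of data. With `alpha=0.5`, the hot cells land in smaller partitions, but because a cell is indivisible in this sketch, the last hot cell fills a partition on its own and the fourth partition stays empty. Indivisible hot cells like this are where splitting and replicating hot spots become necessary.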
Our research prototype HiSbase realizes the concepts described within this thesis and provides a basis for further research shaping the data management of future scientific communities.
Acknowledgements

First of all, I am grateful to my advisor Prof. Alfons Kemper, Ph.D., for giving me the opportunity to pursue this thesis under his guidance. During many discussions, he provided invaluable advice, comments, and encouragement. I also thank Prof. Dr. Dieter Kranzlmüller from Ludwig-Maximilians-Universität München for serving as reviewer for my thesis.

During my time in the database group at TUM, I enjoyed working with my colleagues, especially Dr. Angelika Reiser, who coordinated our efforts in the AstroGrid-D project and was an inexhaustible supply of knowledge and experience. For their help, the pleasant atmosphere, and insightful discussions, I thank Martina-Cezara Albutiu, Stefan Aulbach, Veneta Dobreva, Dr. Daniel Gmach, Prof. Dr. Torsten Grust, Benjamin Gufler, Sebastian Hagen, Dean Jacobs, Ph.D., Stefan Krompaß, Dr. Richard Kuntschke, Manuel Mayr, Jessica Müller, Fabian Prasser, Jan Rittinger, Andreas Scholz, Michael Seibold, Dr. Bernhard Stegmaier, Dr. Jens Teubner, and Dr. Martin Wimmer. I particularly thank Evi Kollmann, our secretary.

Several students offered their support for the development of our research prototype HiSbase. I thank Daniel Weber for his support and devotion to implementing the first prototype. Bernhard Bauer helped implement and evaluate the quadtree-based histograms and the workload-aware partitioning schemes. Achim Landschoof implemented parts of our framework for comparing histograms, and Dong Li implemented a statistics component to measure network traffic. Ella Qiu implemented the query coordinator selection strategies during her RISE internship, which was sponsored by the DAAD and the TUM. Tobias Mühlbauer was a great support during the implementation and evaluation of the data feeding strategies. I also thank my colleagues Benjamin Gufler and Jessica Müller for their contributions to the HiSbase project.

The HiSbase project is part of the AstroGrid-D project and is funded by the German Federal Ministry of Education and Research (BMBF) within the D-Grid initiative under contract 01AK804F. I thank Dr. Thomas Fuhrmann for providing access to the PlanetLab testbed and the LRZ Grid team for their great support and resources.

Finally, I thank my wife Nina and my parents Elisabeth and Hartmut for their love, support, and endurance throughout the years.

Munich, September 2009
Tobias Scholl
Contents
1 Introduction
1.1 Application Setting
1.2 Problem Statement
1.3 Our Approach and Contributions
1.4 Outline
2 HiSbase
2.1 Locality Preservation
2.1.1 Data Skew
2.1.2 Histogram Data Structures
2.2 Architectural Design
2.2.1 Training Phase (Histogram Build-Up)
2.2.2 HiSbase Network
2.2.3 Data Distribution (Feeding)
2.2.4 Query Processing
2.2.5 Query Load Balancing
2.2.6 Evolving the Histogram
2.2.7 HiSbase Evaluation
2.3 Related Work
2.3.1 Distributed and Parallel Databases
2.3.2 P2P Architectures
2.3.3 Scientific and Grid-based Data Management
3 Community Training: Selecting Partitioning Schemes
3.1 Data Structures
3.2 Training Phase
3.3 Evaluation of Partitioning Scheme Properties
3.3.1 Duration
3.3.2 Average Data Population
3.3.3 Variation in Data Distribution
3.3.4 Empty Partitions
3.3.5 Size of the Training Set
3.3.6 Baseline Comparison
3.3.7 Discussion
3.4 Related Work
3.5 Summary
4 Community Placement: Better Serving Locality with Space Filling Curves
4.1 Random or Space Filling Curves
4.2 Placement Evaluation
4.2.1 Data Load Balancing
4.2.2 Query Locality
4.3 Summary and Future Work
5 Feeding Community-Driven Data Grids
5.1 Feeding Scenarios
5.1.1 Initial Load
5.1.2 New Node Arrival
5.1.3 Unplanned Node Departure
5.1.4 Planned Node Departure
5.1.5 Replicating Data to Other Nodes
5.2 Pull-based and Push-based Feeding Strategies
5.2.1 Pull-based Feeding
5.2.2 Push-based Feeding
5.3 An Optimization Model for Feeding
5.3.1 Network Snapshots
5.3.2 A Model for Minimum Latency Paths
5.3.3 A Model for Maximum Bandwidth Paths
5.3.4 Combining Latency and Bandwidth
5.3.5 Conclusions
5.4 Optimization by Bulk Feeding
5.4.1 Traffic Optimizations
5.4.2 Chunk-based Feeding Strategies
5.4.3 Optimizing Imports at Receiving Nodes
5.5 Feeding Throughput Evaluation
5.5.1 Initial Load Evaluation
5.5.2 Replication Evaluation
5.5.3 Discussion
5.6 Related Work