Community-driven data grids [electronic resource] / Tobias Scholl
146 pages


Description

Technische Universität München, Fakultät für Informatik, Lehrstuhl III – Datenbanksysteme. Community-Driven Data Grids. Diplom-Informatiker Univ. Tobias Scholl. Complete copy of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.). Chair: Univ.-Prof. Dr. Hans Michael Gerndt. Examiners of the dissertation: 1. Univ.-Prof. Alfons Kemper, Ph.D.; 2. Univ.-Prof. Dr. Dieter Kranzlmüller, Ludwig-Maximilians-Universität München. The dissertation was submitted to the Technische Universität München on 17.09.2009 and accepted by the Fakultät für Informatik on 01.03.2010. To my daughter Julia Sophie.

Abstract. E-science communities and especially the astronomy community have put tremendous efforts into providing global access to their distributed scientific data sets to foster vivid data and knowledge sharing within their scientific federations. Beyond already existing huge data volumes, the collaborative researchers face major challenges in managing the anticipated data deluge of forthcoming projects with expected data rates of several terabytes a day, such as the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), the Large Synoptic Survey Telescope (LSST), or the Low Frequency Array (LOFAR). In this thesis, we describe and investigate community-driven data grids as an e-science data management solution.

Information

Published on 1 January 2010
File size: 5 MB

Excerpt

Technische Universität München
Fakultät für Informatik
Lehrstuhl III – Datenbanksysteme

Community-Driven Data Grids

Diplom-Informatiker Univ. Tobias Scholl

Complete copy of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Dr. Hans Michael Gerndt
Examiners of the dissertation:
1. Univ.-Prof. Alfons Kemper, Ph.D.
2. Univ.-Prof. Dr. Dieter Kranzlmüller, Ludwig-Maximilians-Universität München

The dissertation was submitted to the Technische Universität München on 17.09.2009 and accepted by the Fakultät für Informatik on 01.03.2010.

To my daughter Julia Sophie

Abstract

E-science communities and especially the astronomy community have put tremendous efforts into providing global access to their distributed scientific data sets to foster vivid data and knowledge sharing within their scientific federations. Beyond already existing huge data volumes, the collaborative researchers face major challenges in managing the anticipated data deluge of forthcoming projects with expected data rates of several terabytes a day, such as the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), the Large Synoptic Survey Telescope (LSST), or the Low Frequency Array (LOFAR).

In this thesis, we describe and investigate community-driven data grids as an e-science data management solution. Community-driven data grids target domain-specific federations and provide scalable, distributed, and collaborative data management. Our infrastructure optimizes the overall query throughput by employing dominant data characteristics (e.g., data skew) and query patterns. By combining well-established techniques for data partitioning and replication with Peer-to-Peer (P2P) technologies, we can address several challenging problems: data load balancing, efficient data dissemination and query processing, handling of query hot spots, and the adaptation to short-term query bursts as well as long-term load redistributions.

We propose a framework for investigating application-specific index structures to create locality-aware partitioning schemes (so-called histograms) and to find appropriate data mapping strategies. We particularly investigate how far mapping strategies based on space filling curves preserve query locality and achieve data load balancing depending on query patterns in comparison to a random data mapping.

An efficient data dissemination technique for the anticipated large data volumes is important for several use cases within scientific federations, including initial data distribution and replication. A scalable solution should neither induce a high load on the transmitting servers nor create a high messaging overhead. Optimizing data distribution with regard to latency and bandwidth is infeasible in our scenario. Therefore, we propose several strategies that optimize network traffic, use chunk-based feeding, and improve data processing at receiving nodes in order to speed up data feeding.

In the face of different typical submission scenarios, we show how community-driven data grids can adapt their query coordination strategies during query processing. We explore the impact of uniform and skewed submission patterns and compare multiple strategies with regard to their usability and scalability for data-intensive applications. Our techniques improve query throughput considerably by increased parallelism and data load balancing in both local as well as wide area deployments.

Addressing skewed query workloads, so-called query hot spots, by query load balancing is another interesting and challenging task. We enhance our data-driven partitioning schemes to trade off data load balancing against handling query hot spots via splitting and replication. We use a cost-based approach for workload-aware data partitioning. Based on these workload-aware partitioning schemes, we use master-slave replication to compensate for short-term peaks in query load and address long-term shifts in data and query distributions by partitioning scheme evolution.

Our research prototype HiSbase realizes the concepts described within this thesis, directly meets the requirements of a data-intensive e-science environment, and offers a basis for further research shaping the data management of future scientific communities.
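The abstract contrasts placements based on space filling curves with a random data mapping. As a rough illustration only, the following Python sketch uses a Z-order (Morton) curve, which is one common space filling curve; the excerpt does not say which curve HiSbase uses. The names `z_order_key` and `assign_to_nodes`, the grid of integer cell coordinates, and the equal-sized contiguous runs per node are all assumptions made for this example.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd ones)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def assign_to_nodes(cells, num_nodes):
    """Hypothetical placement: sort cells along the curve and give each
    node one contiguous run of cells (not the thesis's actual scheme)."""
    ordered = sorted(cells, key=lambda c: z_order_key(*c))
    per_node = -(-len(ordered) // num_nodes)  # ceiling division
    return {cell: i // per_node for i, cell in enumerate(ordered)}


# A 4 x 4 grid of partitions spread over four nodes.
grid = [(x, y) for x in range(4) for y in range(4)]
placement = assign_to_nodes(grid, 4)
```

On this 4 x 4 grid each node ends up with one 2 x 2 quadrant, so a query that covers neighbouring cells tends to touch few nodes. A random assignment would spread data evenly as well, but gives no such locality.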

Acknowledgements

First of all, I am grateful to my advisor Prof. Alfons Kemper, Ph.D., for giving me the opportunity to pursue this thesis under his guidance. During many discussions, he provided invaluable advice, comments, and encouragement. I also thank Prof. Dr. Dieter Kranzlmüller from Ludwig-Maximilians-Universität München for serving as reviewer for my thesis.

During my time at the database group at TUM, I enjoyed working with my colleagues, especially Dr. Angelika Reiser, who coordinated our efforts in the AstroGrid-D project and had an inexhaustible supply of knowledge and experience. For their help, the pleasant working atmosphere, and insightful discussions, I thank Martina-Cezara Albutiu, Stefan Aulbach, Veneta Dobreva, Dr. Daniel Gmach, Prof. Dr. Torsten Grust, Benjamin Gufler, Sebastian Hagen, Dean Jacobs, Ph.D., Stefan Krompaß, Dr. Richard Kuntschke, Manuel Mayr, Jessica Müller, Fabian Prasser, Jan Rittinger, Andreas Scholz, Michael Seibold, Dr. Bernhard Stegmaier, Dr. Jens Teubner, and Dr. Martin Wimmer. I particularly thank Evi Kollmann, our secretary.

Several students offered their support and devotion to implement our research prototype HiSbase. I thank Daniel Weber for supporting the development of the first prototype. Bernhard Bauer helped implement and evaluate the quadtree-based histograms and the workload-aware partitioning schemes. Achim Landschoof implemented parts of our framework for comparing histograms, and Dong Li implemented a statistics component to measure network traffic. Ella Qiu implemented the query coordinator selection strategies during her RISE internship, which was sponsored by the DAAD and TUM. Tobias Mühlbauer was a great support during the implementation and evaluation of the data feeding strategies. I also thank my colleagues Benjamin Gufler and Jessica Müller for their contributions to the HiSbase project.

The HiSbase project is part of the AstroGrid-D project and is funded by the German Federal Ministry of Education and Research (BMBF) within the D-Grid initiative under contract 01AK804F. I thank Dr. Thomas Fuhrmann for providing access to the PlanetLab test bed and the LRZ Grid team for their great support and resources.

Finally, I thank my wife Nina and my parents Elisabeth and Hartmut for their love, support, and endurance throughout the years.

Munich, September 2009
Tobias Scholl

Contents

1 Introduction
  1.1 Application Setting
  1.2 Problem Statement
  1.3 Our Approach and Contributions
  1.4 Outline

2 HiSbase
  2.1 Locality Preservation
    2.1.1 Data Skew
    2.1.2 Histogram Data Structures
  2.2 Architectural Design
    2.2.1 Training Phase (Histogram Build-Up)
    2.2.2 HiSbase Network
    2.2.3 Data Distribution (Feeding)
    2.2.4 Query Processing
    2.2.5 Query Load Balancing
    2.2.6 Evolving the Histogram
    2.2.7 HiSbase Evaluation
  2.3 Related Work
    2.3.1 Distributed and Parallel Databases
    2.3.2 P2P architectures
    2.3.3 Scientific and Grid-based Data Management

3 Community Training: Selecting Partitioning Schemes
  3.1 Data Structures
  3.2 Training Phase
  3.3 Evaluation of Partitioning Scheme Properties
    3.3.1 Duration
    3.3.2 Average Data Population
    3.3.3 Variation in Data Distribution
    3.3.4 Empty Partitions
    3.3.5 Size of the Training Set
    3.3.6 Baseline Comparison
    3.3.7 Discussion
  3.4 Related Work
  3.5 Summary

4 Community Placement: Better Serving Locality with Space Filling Curves
  4.1 Random or Space Filling Curves
  4.2 Placement Evaluation
    4.2.1 Data Load Balancing
    4.2.2 Query Locality
  4.3 Summary and Future Work

5 Feeding Community-Driven Data Grids
  5.1 Feeding Scenarios
    5.1.1 Initial Load
    5.1.2 New Node Arrival
    5.1.3 Unplanned Node Departure
    5.1.4 Planned Node Departure
    5.1.5 Replicating Data to Other Nodes
  5.2 Pull-based and Push-based Feeding Strategies
    5.2.1 Pull-based Feeding
    5.2.2 Push-based Feeding
  5.3 An Optimization Model for Feeding
    5.3.1 Network Snapshots
    5.3.2 A Model for Minimum Latency Paths
    5.3.3 A Model for Maximum Bandwidth Paths
    5.3.4 Combining Latency and Bandwidth
    5.3.5 Conclusions
  5.4 Optimization by Bulk Feeding
    5.4.1 Traffic Optimizations
    5.4.2 Chunk-based Feeding Strategies
    5.4.3 Optimizing Imports at Receiving Nodes
  5.5 Feeding Throughput Evaluation
    5.5.1 Initial Load Evaluation
    5.5.2 Replication Evaluation
    5.5.3 Discussion
  5.6 Related Work
