//img.uscri.be/pth/77cf55c21ca1828a52062b634bc6071fc456c7dd
La lecture en ligne est gratuite
Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres
Télécharger Lire

Scalable Algorithms for Community Detection in Very Large Graphs [Elektronische Ressource] / Michael Ovelgönne. Betreuer: A. Geyer-Schulz

De
146 pages
Scalable Algorithms for CommunityDetection in Very Large GraphsZur Erlangung des akademischen Grades einesDoktors der Wirtschaftswissenschaften(Dr. rer. pol.)von der Fakult¨at fu¨rWirtschaftswissenschaftenam Karlsruher Institut fu¨r TechnologiegenehmigteDissertationvonDipl.-Inform.Wirt Michael Ovelg¨onneTag der mu¨ndlichen Pru¨fung: 14. Juli 2011Referent: Prof. Dr. Andreas Geyer-SchulzKorreferent: Prof. Dr. Karl-Heinz WaldmannKarlsruhe, 2011Contents1 Motivation 12 Introduction to Graph Clustering 52.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.1 Clustering with Distance Measures . . . . . . . . . 72.2 Graph Clustering . . . . . . . . . . . . . . . . . . . . . . . 92.2.1 Terminology Graphs . . . . . . . . . . . . . . . . . 112.2.2 Terminology Graph Clustering . . . . . . . . . . . . 132.2.3 Graph Properties . . . . . . . . . . . . . . . . . . . 142.2.4 Graph Clusters as Cohesive Subgroups . . . . . . . 172.2.5 Cluster Hierarchy . . . . . . . . . . . . . . . . . . . 182.2.6 Graph Clustering Algorithms . . . . . . . . . . . . 192.2.7 Clustering Quality . . . . . . . . . . . . . . . . . . 283 Modularity Clustering 313.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 313.1.1 Optimization Problem . . . . . . . . . . . . . . . . 333.1.2 Clustering Quality of Modularity . . . . . . . . . . 343.1.3 Problems of Modularity Clustering . . . . . . . . . 363.1.4 Generalizations of Modularity Clustering . . . .
Voir plus Voir moins

ScalableAlgorithmsforCommunity
DetectioninVeryLargeGraphs

ZurErlangungdesakademischenGradeseines
DoktorsderWirtschaftswissenschaften
(Dr.rer.pol.)
vonderFakulta¨tfu¨r
Wirtschaftswissenschaften
amKarlsruherInstitutfu¨rTechnologie

genehmigte
Dissertation

novDipl.-Inform.WirtMichaelOvelgo¨nne

Tagdermu¨ndlichenPru¨fung:14.Juli2011
Referent:Prof.Dr.AndreasGeyer-Schulz
Korreferent:Prof.Dr.Karl-HeinzWaldmann

Karlsruhe,2011

Contents

1Motivation

2IntroductiontoGraphClustering
2.1Clustering...........................
2.1.1ClusteringwithDistanceMeasures.........
2.2GraphClustering.......................
2.2.1TerminologyGraphs.................
2.2.2TerminologyGraphClustering............
2.2.3GraphProperties...................
2.2.4GraphClustersasCohesiveSubgroups.......
2.2.5ClusterHierarchy...................
2.2.6GraphClusteringAlgorithms............
2.2.7ClusteringQuality..................

3ModularityClustering
3.1Introduction..........................
3.1.1OptimizationProblem................
33.1.1.3.2PCrluosbtelerminsgofQuMaolitdyulaofritMyoCdululastreitrying...................
33.1.1.5.4VGaerniaertaioliznsatofionMsoodfuMlaroitdyulaCrluitsyteCrluinsgte.r.ing.............
3.23M.2o.1dulaEritxcyacMtaAxlgimoizritathiomns....................................
33.2.2.3.2IHmiepralermcheicntaaltAiogngsloofmeMreattaivheeuArlgistoricitshm.s...............
33.2.2.5.4OHtiehrearrcNhoicna-lHDierivaisrcivheicaAllgAorlgitohritmhsm.s...................
33.2.2.6.7RPree-nPermoecenstsTinegchn.iq.u.es.................................
3.2.8DynamicUpdateAlgorithms.............

1

567911311471819182

13133343639324848405159536568617

I

Contents

4RandomizedGreedyAlgorithms75
4.1RGAlgorithm.........................75
4.1.1Idea..........................76
4.1.2CoreAlgorithm....................78
4.1.3Tuningofk......................80
4.1.4Postprocessing:FastGreedyRenement......82
4.1.5AnalysisofEquivalenceClassSizes.........82
4.2Balancedness.........................87
4.3RG+Algorithm........................91
4.3.1Idea..........................92
4.3.2Implementation....................94
4.4CustomDatatypesofRGandRG+.............95
4.4.1SparseMatrix.....................97
4.4.2ActiveRowSet....................98
4.5Complexity..........................98
4.5.1TimeComplexity...................99
4.5.2MemoryComplexity.................101
4.6Evaluation...........................102
4.6.1Datasets........................102
4.6.2ParameterSettings..................105
4.6.3EvaluationResults..................109
4.6.4ClusterSizeDistributions..............113
4.6.5EvaluationSummary.................114
5ConclusionandOutlook119

II

ListofFigures

2.1Exampleofaclusteringofpointsinatwo-dimensionalspace8
2.2Exampleofagraphclustering................10
2.3Exampleofadendrogram..................14
2.4Transitivityexample.....................17
3.1Modularitycomputationexample..............32
3.2Problemofthedirectedmodularity.............42
3.3Dynamicupdateproblem...................71
4.1Analysisoftheeectofparameterr.............80
4.2Distributionofthesizesoftheequivalenceclasses.....84
4.3Equiv.classsizedistributionduringmergeprocess....86
4.4Avg.equiv.classsizeduringmergeprocess.........86
4.5Inuenceofpriormergersonmergedecision........88
4.6Mergeprocesscomparison1.................89
4.7Mergeprocesscomparison2.................90
4.8Coregroupextraction....................95
4.9ModularitybyRGparameterkincoregroupcreationphase
ofRG+............................106
4.10ModularitybyRGparameterkinnalclusteringphaseof
RG+..............................107
4.11ModularitybynumberofinitialpartitionsofRG+....108
4.12Clustersizedistribution1..................115
4.13Clustersizedistribution2..................116
4.14Clustersizedistribution3..................117

III

ListofTables

.13

.14.24.34.44.54.64.74.84

Modularitymaximizationalgorithms..........

Numbersandaveragesizesofequivalenceclasses....
Fractionsofonlyinternallyconnectedvertices.....
Timeandmodularityresultsbyalgorithmphase....
Real-worldtestdatasetsusedforalgorithmevaluation.
Evaluationresults:modularityandruntimeaverages.
Evaluationresults:bestidentiedclusterings......
Evaluationresults:MemoryUsage...........
Evaluationresults:Numberofidentiedcluster....

.

........

.

........

94

5819110013
111110
111123

V

ListofAlgorithms

123456789

01111231

Plaingreedyalgorithm....................
MSGalgorithm........................
MOMEalgorithm.......................
Unfoldingalgorithm(BGLLalgorithm)...........
Adaptivealgorithm......................
Completegreedyrenement.................
Fastgreedyrenement....................
AdaptedKernighan-Linrenement.............
Multi-Levelrenement[NR08]................

Genericrandomizedgreedyalgorithm............
RGalgorithmwithpre-andpostprocessing.........
Fastgreedyre