paper - Probabilistic algorithms for large scale systems

ejegatuz

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

151 pages

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

paper - Probabilistic algorithms for large scale systems

Sujets

´UNIVERSITE PARIS-SUD XI
`THESE
pourobtenirlegradede
´DOCTEUR DE L’UNIVERSITE PARIS-SUD XI
prepar´ ee´ au LABORATOIRE DE RECHERCHE EN INFORMATIQUE
danslecadrede l’Ecole Doctorale d’Informatique de Paris Sud”
present´ ee´ etsoutenuepubliquement
par
ThomasLARGILLIER
le29novembre2010
Probabilisticalgorithmsforlarge
scalesystems
Directeurs de these` :
M.JoffroyBeauquieretSylvainPeyronnet
JURY
Mr. Brian D. DAVISON Rapporteur
Mr. Aristides GIONIS
Mr. Serge ABITEBOUL Examinateur
Mr. Fabio CRESTANI
Mr. Joffroy BEAUQUIER Directeurdethese`
Mr. Sylvain PEYRONNETdethese`CONTENTS
Contents 2
1 FrenchSummary 7
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.1 Lestricheurssurlatoile . . . . . . . . . . . . . . . . . 8
1.1.2 Lessystemes` a` grandeechelle´ . . . . . . . . . . . . . . 9
1.2 Declassement´ duWebspam . . . . . . . . . . . . . . . . . . . . 10
1.2.1 LeWebspam . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 LuttercontreleWebspam . . . . . . . . . . . . . . . . . 11
1.2.3 Resultats´ . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Lesreseaux´ decapteursmobiles . . . . . . . . . . . . . . . . . 12
1.3.1 Deﬁnition´ etdeﬁs´ . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 L’algorithmeRaWMS . . . . . . . . . . . . . . . . . . 13
1.3.3 Notreproposition . . . . . . . . . . . . . . . . . . . . . 14
1.4 Testpourlesapplicationsa` grandeechelle´ . . . . . . . . . . . . 15
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Preamble 19
2.1 HandlingmalicioususersontheWeb . . . . . . . . . . . . . . . 20
2.1.1 Diminishing the inﬂuence of malicious users of social
networks . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.2 ReducingtheboostingeffectofWebspam . . . . . . . . 21
2.2 Largescalenetworks . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.1 Improvingdatadisseminationinsensornetworks . . . . 22
2.2.2 Realisticevaluationofparallelanddistributedapplications 22
23
I FightingmaliciousbehavioursontheWeb 25
3 Introduction 27
3.1 Collaborativenewswebsites . . . . . . . . . . . . . . . . . . . 29
3.2 Fightingsocialspam . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 WebSpampresentation . . . . . . . . . . . . . . . . . . . . . . 31
3.4 FightingWebspam . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5 Randomwalks . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6.1 Edgebetweennesscentrality . . . . . . . . . . . . . . . 39
3.6.2 Markovclusteringtechnique . . . . . . . . . . . . . . . 41
4 Maliciousbehavioursinsocialnetworks 45
4.1 SpotRankalgorithm . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.1 Frameworkandprinciple . . . . . . . . . . . . . . . . . 46
4.1.2 Proposingaspot . . . . . . . . . . . . . . . . . . . . . 48
4.1.3 Votingforaspot . . . . . . . . . . . . . . . . . . . . . 48
4.1.4 Detectingcabals . . . . . . . . . . . . . . . . . . . . . . 52
4.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.1 Loganalysisofspotrank.fr . . . . . . . . . . . . . . . . 54
4.2.2 HumanEvaluation . . . . . . . . . . . . . . . . . . . . 61
4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 DemotingWebSpam 67
5.1 ClusteringMethods . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 StatisticalTest . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6 DetectingWebSpam 79
6.1 RandomwalksagainstWebspam . . . . . . . . . . . . . . . . . 80
6.2 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884 CONTENTS
II Analysis & probabilistic solutions for large scale sys
tems 91
7 Introduction 93
7.1 Messagepropagationinsensornetworks . . . . . . . . . . . . . 94
7.1.1 SensorNetworks . . . . . . . . . . . . . . . . . . . . . 94
7.1.2 Howtoefﬁcientlypropagatemessagesinsensornetworks 96
7.1.3 Relatedwork: datadisseminationinWSNs . . . . . . . 98
7.2 Largescaleapplications . . . . . . . . . . . . . . . . . . . . . . 99
7.2.1 Cluster&Grids . . . . . . . . . . . . . . . . . . . . . . 99
7.2.2 Managingfailuresinlarge scalesystems . . . . . . . . . 100
7.2.3 Evaluationoflarge scaleapplications . . . . . . . . . . 101
7.2.4 Existingsolutionsforlarge scaleapplicationsdevelopment103
8 QoSinsensornetworks 105
8.1 Suppledescription . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.2 Rationaleandsystemmodel . . . . . . . . . . . . . . . . . . . 107
8.3 Supple: formalpresentation . . . . . . . . . . . . . . . . . . . . 108
8.3.1 Treeconstruction . . . . . . . . . . . . . . . . . . . . . 108
8.3.2 Weightdistribution . . . . . . . . . . . . . . . . . . . . 109
8.3.3 Datadissemination . . . . . . . . . . . . . . . . . . . . 110
8.3.3.1 Formalanalysis . . . . . . . . . . . . . . . . . 112
8.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.4 Performanceanalysis . . . . . . . . . . . . . . . . . . . . . . . 115
8.4.1 Evaluationmethodology . . . . . . . . . . . . . . . . . 115
8.4.1.1 Experimentalsetup . . . . . . . . . . . . . . . 115
8.4.1.2 Simulationparameters . . . . . . . . . . . . . 116
8.4.1.3 Evaluationmetrics . . . . . . . . . . . . . . . 116
8.4.2 Simulatedresults . . . . . . . . . . . . . . . . . . . . . 117
8.4.2.1 Communicationoverhead . . . . . . . . . . . 118
8.4.2.2 Efﬁciencyindatagathering . . . . . . . . . . 120
8.4.2.3 Lossresilience . . . . . . . . . . . . . . . . . 122
8.4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 123
9 Testforlargescaleapplications 125
9.1 V DSPlatformDescription . . . . . . . . . . . . . . . . . . . . 125
9.1.1 Virtualization Environment for Large scale Distributed
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 1265
9.1.2 Low levelNetworkVirtualization . . . . . . . . . . . . 127
9.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.2.1 ImpactoftheLow LevelNetworkEmulation . . . . . . 129
9.2.2 TCPBrokenConnectionDetectionMechanism . . . . . 130
9.2.3 StressofFault TolerantApplications . . . . . . . . . . . 132
9.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
IIIConclusion 137
10 139
10.1 Socialspam . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.2 Webspamdemotion . . . . . . . . . . . . . . . . . . . . . . . . 139
10.3 Wdetection . . . . . . . . . . . . . . . . . . . . . . . . 140
10.4 Supple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
10.5 V ds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Bibliography 1411
FRENCH SUMMARY
1.1 Introduction
De nos jours, les systemes` informatiques ont une taille sans cesse croissante
pour repondre´ aux besoins des utilisateurs. Que ce soit dans le domaine du
calcul scientiﬁque ou` de plus en plus d’ordinateurs sont relies´ pour repondre´ a`
des problemes` sans cesse plus complexes ou dans le domaine du loisir avec un
Internetgrandissantpoursatisfairetoujourspluslacuriosite´ desutilisateurs.
Les deﬁs´ qui concernent les reseaux´ a` grande echelle´ sont nombreux: pou
voirgarantirauxutilisateursd’unclusterqueleurcalcularriveraa` termeetsans
erreur dans un temps raisonnable, distribuer des donnees´ entre petites entites´
intelligentesefﬁcacementouencoreproteger´ leWebcontrelestricheurs.
Le but de ma these` est d’apporter des reponses´ pour les systemes` a` grande
´ ´ ´echelleaﬁndegarantirl’experienceutilisateurlaplusagreablepossible.
Lestravauxrealis´ es´ auseindecettethese` vontdelaconceptiond’algorithmes
a` la realisation´ de modules pour le noyau d’un systeme` (GNU/Linux). Les do
mainesetudi´ es´ vontdesspammeurssurleWebauxbancsd’essaipourlesappli
cationsa` tres` grandeechelle.´
7
C H A P T E R8 CHAPTER1. FRENCHSUMMARY
1.1.1 Lestricheurssurlatoile
De nos jours l’Internet est un endroit immense ou` chaque jour des millions
d’internautes font des milliards de requetes.ˆ Aﬁn de les aider, les moteurs de
recherches classent les pages sur des criteres` independants´ de la pertinence a` la
requeteˆ aﬁnderetournerlespageslespluspopulairesa` l’utilisateur.
LeWebspamestunenjeuimportanteconomiquement´ carilpermetauxspam
meursdesepositionneridealement´ surlesrequetesˆ a` fortbutlucratif. Des` lorsil
est primordial pour les moteurs de recherche de se debarrasser´ des tricheurs ou
a` toutlemoinsdelesreleguer´ auxdernieres` pagesderesultats.´
J’ai travaille´ sur la detection´ et le declassement´ du Webspam et dans cette
these` je presente´ deux methodes´ pour lutter contre le Webspam. Ces deux
methodes´ sont rapides et economes´ en memoire,´ ce qui est primordial lorsque
l’ontravaillesurungraphedelatailleduWeb. J’aidev´ eloppe´ cesmethodes´ avec
SylvainPeyronnet,celaadonne