PUB IRMA LILLE
28 pages

Description

Level: Higher education, Doctorate, Bac+8
PUB. IRMA, LILLE 2010, Vol. 70, No. VII

Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins

Alexandre Lourme (a, b), Christophe Biernacki (b)

Abstract

Mixture model-based clustering usually assumes that the data arise from a mixture population in order to estimate some hypothetical underlying partition of the dataset. In this work, we are interested in the case where several samples have to be clustered at the same time, that is, when the data arise not from a single mixture but possibly from several mixtures. In the multinormal context, we establish a linear stochastic link between the components of the mixtures, which enables the joint estimation of their parameters (performed here by maximum likelihood) and the simultaneous classification of the diverse samples. We propose several useful constrained models of this stochastic link and give their parameter estimators. The interest of these models is highlighted in a biological context where birds belonging to several species have to be classified according to their sex. We first show that our simultaneous clustering method improves the partition obtained by clustering each sample independently. We then show that the method is also efficient in assessing the number of clusters when it is assumed unknown. Finally, additional experiments show the robustness of our simultaneous clustering method when one of its main assumptions is relaxed.


Excerpt

Affiliations: (a) Département Génie Biologique, 371 rue du Ruisseau, 40000 Mont-de-Marsan, France; (b) Laboratoire Paul Painlevé, UMR 8524 CNRS, Université Lille 1, F-59655 Villeneuve d'Ascq Cedex, France.

[The remainder of the first page, which repeats the abstract, gives a French version of it (Résumé) and lists key words and MSC 2000 classification codes, is illegible in this extraction.]
1 Introduction

Clustering aims to separate a sample into classes in order to reveal some hidden but meaningful structure in data. In a probabilistic context, it is standard practice to suppose that the data arise from a mixture of parametric distributions and to draw a partition by assigning each data point to the prevailing component (see [13] for a review). In particular, in the multivariate continuous situation, Gaussian mixture model-based clustering has found successful applications in diverse fields: genetics [15], medicine [13], magnetic resonance imaging [1] and astronomy [4]. Consequently, clustering a given dataset with such models is nowadays familiar to statisticians as well as to more and more practitioners.

In many situations, however, one needs to cluster several datasets, possibly arising from different populations, instead of a single one, with partitions having the same number of clusters and the same meaning in every dataset. For instance, in biology, Thibault et al. [17] describe three samples of birds living in several geographic zones, whose morphological variables (tarsus, bill length, etc.) therefore differ greatly. The clustering purpose here could be to retrieve the sex of the birds from these features. In such a situation, a standard clustering process could be applied independently to each dataset.

In the Gaussian mixture model-based clustering context, we propose a probabilistic model that enables us to classify all individuals simultaneously instead of applying several independent Gaussian clustering methods. A linear stochastic link between the samples, which can be justified from some simple but realistic assumptions, will be established.

[The remainder of the introduction, which reviews related work and outlines the following sections, is illegible in this extraction.]
H
hx
dh h hh2f1;:::;Hg n x i = 1;:::;n Ri
hP
d
hxi
h hx X
h hK P Ck
k = 1;:::;K
KX
h h h h df(x; ) = (x; ; ); x2R :dk k k
k=1
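As a minimal illustration of the density above, the following Python sketch (assuming NumPy; the function names are hypothetical, not taken from the paper) evaluates $f(x;\psi^h)$ for one sample $h$ from its mixing proportions $\pi^h_k$, centers $\mu^h_k$ and covariance matrices $\Sigma^h_k$. The example parameters are made up.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """d-variate Gaussian density Phi_d(x; mean, cov) at a single point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance, solving cov @ y = diff rather than inverting cov.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)


def mixture_density(x, weights, means, covs):
    """f(x; psi^h) = sum_k pi^h_k Phi_d(x; mu^h_k, Sigma^h_k) for one sample h."""
    weights = np.asarray(weights, dtype=float)
    # Mixing proportions must be positive and sum to one.
    if np.any(weights <= 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing proportions must be > 0 and sum to 1")
    return sum(w * gaussian_density(x, m, c) for w, m, c in zip(weights, means, covs))


# Example: a bimodal mixture in R^2 (K = 2, d = 2) with made-up parameters.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(mixture_density([1.0, 1.0], weights, means, covs))
```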
where the mixing proportions satisfy $\pi^h_k > 0$ ($k = 1,\dots,K$) and $\sum_{k=1}^{K} \pi^h_k = 1$, and $\Phi_d(\,\cdot\,;\mu^h_k,\Sigma^h_k)$ denotes the density of component $C^h_k$, a $d$-variate Gaussian distribution with center $\mu^h_k$ and covariance matrix $\Sigma^h_k$. The whole parameter of the mixture $P^h$ is $\psi^h = (\theta^h_k)_{k=1,\dots,K}$, with $\theta^h_k = (\pi^h_k,\mu^h_k,\Sigma^h_k)$.

Each individual $x^h_i$ carries a hidden (missing) label $z^h_i \in \{0,1\}^K$, where $z^h_{i,k} = 1$ if $x^h_i$ arises from component $C^h_k$ and $z^h_{i,k} = 0$ otherwise. The label $z^h_i$ is a realization of a random vector $Z^h$ with the one-order multinomial distribution $\mathcal{M}_K(1;\pi^h_1,\dots,\pi^h_K)$, so that the complete data $(x^h_i, z^h_i)_{i=1,\dots,n_h}$ are independent realizations of the couple $(X^h, Z^h)$, valued in $\mathbb{R}^d \times \{0,1\}^K$. Denoting by $Z^h_k$ the $k$-th component of $Z^h$:

$$X^h \mid Z^h_k = 1 \;\sim\; \Phi_d(\,\cdot\,;\mu^h_k,\Sigma^h_k).$$
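The latent-label formulation also describes how complete data are generated: first a label $z^h_i$, then $x^h_i$ given that label. The sketch below (assuming NumPy's `Generator.multinomial` and `Generator.multivariate_normal`; the function name and parameter values are invented for illustration) simulates one sample this way.

```python
import numpy as np


def sample_complete_data(n, weights, means, covs, rng):
    """Draw n complete data (x_i, z_i) from one K-modal Gaussian mixture.

    z_i ~ M_K(1; pi_1, ..., pi_K) is a one-hot label vector, and
    x_i | z_{i,k} = 1 ~ N_d(mu_k, Sigma_k).
    """
    # One-order multinomial draws: each row contains a single 1.
    z = rng.multinomial(1, weights, size=n)
    labels = z.argmax(axis=1)
    # Given its label, each x_i is Gaussian with the parameters of its component.
    x = np.array([rng.multivariate_normal(means[k], covs[k]) for k in labels])
    return x, z


rng = np.random.default_rng(0)
x, z = sample_complete_data(
    500, [0.3, 0.7], [[0.0, 0.0], [5.0, 5.0]], [np.eye(2), np.eye(2)], rng
)
# In a clustering problem only x is observed; z plays the role of the hidden partition.
print(x.shape, z.shape, z.mean(axis=0))
```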
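The partition $z^h = \{z^h_1,\dots,z^h_{n_h}\}$ of each sample is unknown and has to be estimated. Since the samples are assumed independent, the log-likelihood of the whole parameter $\psi = (\psi^h)_{h=1,\dots,H}$ given the observed data $x = \bigcup_{h=1}^{H} x^h$ is

$$\ell(\psi;x) = \sum_{h=1}^{H}\sum_{i=1}^{n_h} \log f(x^h_i;\psi^h) = \sum_{h=1}^{H} \ell^h(\psi^h;x^h), \qquad (1)$$

where $\ell^h(\psi^h;x^h)$ is the log-likelihood of $\psi^h$ given $x^h$. Maximizing $\ell$ therefore amounts to maximizing each $\ell^h$ separately, which is usually done with the EM algorithm; this yields the ML estimates $\hat\psi^h$.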
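The log-likelihood decomposition above says that, with one parameter $\psi^h$ per sample, the total log-likelihood is a plain sum of per-sample terms. A minimal NumPy sketch, with hypothetical function names and a log-sum-exp step added for numerical stability, makes this explicit:

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Row-wise log Phi_d(x_i; mean, cov) for an (n, d) data array x."""
    x = np.asarray(x, dtype=float)
    diff = x - np.asarray(mean, dtype=float)
    d = x.shape[1]
    # Squared Mahalanobis distance of each row, via a linear solve.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def sample_loglik(x, weights, means, covs):
    """l^h(psi^h; x^h) = sum_i log sum_k pi_k Phi_d(x_i; mu_k, Sigma_k)."""
    # (n, K) matrix of log pi_k + log Phi_d(x_i; mu_k, Sigma_k).
    log_terms = np.column_stack(
        [np.log(w) + log_gauss(x, m, c) for w, m, c in zip(weights, means, covs)]
    )
    # Log-sum-exp over k, shifted by the row maximum for numerical stability.
    top = log_terms.max(axis=1)
    return float(np.sum(top + np.log(np.exp(log_terms - top[:, None]).sum(axis=1))))


def total_loglik(samples, params):
    """l(psi; x) = sum_h l^h(psi^h; x^h): independent samples, separate parameters."""
    return sum(sample_loglik(x, *p) for x, p in zip(samples, params))
```

Because the terms share no parameters, maximizing the total is the same as maximizing each $\ell^h$ on its own, which is why this standard solution clusters the samples independently.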
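Then each partition is estimated by the Maximum A Posteriori (MAP) principle: each individual $x^h_i$ is assigned to the group $C^h_k$ with the highest estimated conditional probability of membership

$$t^h_{i,k}(\hat\psi^h) = \mathbb{E}\big(Z^h_{i,k} \mid X^h_i = x^h_i;\hat\psi^h\big),$$

computed at the ML estimate $\hat\psi^h$. Because the samples are clustered independently, the clusters are arbitrarily numbered in each partition, and the practitioner then has to match, if necessary, the clusters that share the same meaning across samples.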
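Put together, the standard solution for one sample is: estimate $\psi^h$ by EM, then apply the MAP rule to the conditional membership probabilities $t^h_{i,k}$. The following is a plain EM sketch with free covariance matrices (assuming NumPy; the function names, the starting values and the small `ridge` safeguard are illustrative choices, not the authors' settings), followed by the MAP assignment.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Row-wise log Phi_d(x_i; mean, cov) for an (n, d) data array x."""
    diff = x - mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.shape[1] * np.log(2.0 * np.pi) + logdet + maha)


def posteriors(x, weights, means, covs):
    """t_{i,k} = E(Z_{i,k} | X_i = x_i; psi), the conditional membership probabilities."""
    log_t = np.column_stack(
        [np.log(w) + log_gauss(x, m, c) for w, m, c in zip(weights, means, covs)]
    )
    log_t -= log_t.max(axis=1, keepdims=True)  # stabilise before exponentiating
    t = np.exp(log_t)
    return t / t.sum(axis=1, keepdims=True)


def em(x, weights, means, covs, n_iter=100, ridge=1e-6):
    """Plain EM for one sample: K Gaussian components with free covariance matrices."""
    weights = np.array(weights, dtype=float)
    means = np.array(means, dtype=float)
    covs = np.array(covs, dtype=float)
    n, d = x.shape
    for _ in range(n_iter):
        t = posteriors(x, weights, means, covs)  # E-step
        nk = t.sum(axis=0)                       # M-step: weighted counts
        weights = nk / n
        means = (t.T @ x) / nk[:, None]
        for k in range(len(nk)):
            diff = x - means[k]
            # Small ridge added as a numerical safeguard (not part of the model).
            covs[k] = (t[:, k, None] * diff).T @ diff / nk[k] + ridge * np.eye(d)
    return weights, means, covs


def map_partition(x, weights, means, covs):
    """MAP rule: assign each x_i to the component with the highest t_{i,k}."""
    return posteriors(x, weights, means, covs).argmax(axis=1)


# Two well-separated simulated groups in R^2 (invented data).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(8.0, 1.0, (100, 2))])
w, m, c = em(x, [0.5, 0.5], [[0.5, 0.5], [7.0, 7.0]], [np.eye(2), np.eye(2)])
labels = map_partition(x, w, m, c)
```

EM is sensitive to its starting point; the sketch simply takes initial values from the caller rather than choosing them.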
2.2 Proposed solution: using a linear stochastic link between populations

Here the groups $C^h_k$ ($h = 1,\dots,H$) are defined with the same meaning in every sample from the beginning, and the individuals of all samples are described by the same features. Since the samples are related in this way, one can aim to improve both the parameter estimation and the partitions by linking the components that share the same meaning. The link is derived from three hypotheses $H_1$, $H_2$ and $H_3$. It starts from the requirement that, for any two populations, each component is the image of the component with the same meaning through some mapping:

$$\forall (h,h') \in \{1,\dots,H\}^2,\ \forall k \in \{1,\dots,K\},\ \exists\,\varphi^{h,h'}_k : \mathbb{R}^d \to \mathbb{R}^d :\quad \big(X^{h'} \mid Z^{h'}_k = 1\big) \sim \varphi^{h,h'}_k\big(X^h \mid Z^h_k = 1\big).$$

In other words, the individuals of $C^{h'}_k$ are distributed as the individuals of $C^h_k$ transformed by $\varphi^{h,h'}_k$.
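The subsection title announces that the mapping $\varphi^{h,h'}_k$ is linear. As a hedged illustration only, assume it is affine, $\varphi^{h,h'}_k(x) = Dx + b$, with $D$ a diagonal matrix and $b \in \mathbb{R}^d$ (this exact form is an assumption here; its derivation is not part of this excerpt). Since an affine image of a Gaussian vector is Gaussian, $C^{h'}_k$ is then Gaussian with center $D\mu^h_k + b$ and covariance matrix $D\Sigma^h_k D^\top$, so the parameters of one population can be written in terms of another's. The NumPy sketch below (hypothetical function name and parameter values) computes these parameters and checks them by simulation.

```python
import numpy as np


def linked_component(mean_h, cov_h, D, b):
    """Center and covariance of C^{h'}_k, ASSUMING phi(x) = D x + b (hypothetical affine link).

    If X^h | Z^h_k = 1 ~ N_d(mean_h, cov_h), then D X + b ~ N_d(D mean_h + b, D cov_h D^T).
    """
    D = np.asarray(D, dtype=float)
    b = np.asarray(b, dtype=float)
    return D @ np.asarray(mean_h, dtype=float) + b, D @ np.asarray(cov_h, dtype=float) @ D.T


# Hypothetical parameters for one component k in sample h, and a diagonal link.
mean_h = np.array([1.0, 2.0])
cov_h = np.array([[1.0, 0.3], [0.3, 0.5]])
D = np.diag([2.0, 3.0])
b = np.array([0.5, -1.0])
mean_h2, cov_h2 = linked_component(mean_h, cov_h, D, b)

# Monte Carlo check: transform draws of X^h | Z^h_k = 1 row by row and compare moments.
rng = np.random.default_rng(2)
draws = rng.multivariate_normal(mean_h, cov_h, size=200_000) @ D.T + b
print(mean_h2, draws.mean(axis=0))
print(cov_h2, np.cov(draws, rowvar=False))
```

Under this assumption, estimating the components of one sample constrains those of the others through $(D, b)$, which is the intuition behind estimating all samples jointly rather than separately.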
