Phylogenetic Hidden Markov Models
28 pages
English

Description

Level: Higher education, Doctorate (Bac+8)
12 Phylogenetic Hidden Markov Models

Adam Siepel¹ and David Haussler²

¹ Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
² Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA

Phylogenetic hidden Markov models, or phylo-HMMs, are probabilistic models that consider not only the way substitutions occur through evolutionary history at each site of a genome but also the way this process changes from one site to the next. By treating molecular evolution as a combination of two Markov processes, one that operates in the dimension of space (along a genome) and one that operates in the dimension of time (along the branches of a phylogenetic tree), these models allow aspects of both sequence structure and sequence evolution to be captured. Moreover, as we will discuss, they permit key computations to be performed exactly and efficiently. Phylo-HMMs allow evolutionary information to be brought to bear on a wide variety of problems of sequence “segmentation,” such as gene prediction and the identification of conserved elements.

Phylo-HMMs were first proposed as a way of improving phylogenetic models that allow for variation among sites in the rate of substitution [9, 52]. Soon afterward, they were adapted for the problem of secondary structure prediction [11, 47], and some time later for the detection of recombination events [20].


Extract

12 Phylogenetic Hidden Markov Models

Adam Siepel¹ and David Haussler²

¹ Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA, acs@soe.ucsc.edu
² Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA, haussler@soe.ucsc.edu

Phylogenetic hidden Markov models, or phylo-HMMs, are probabilistic models that consider not only the way substitutions occur through evolutionary history at each site of a genome but also the way this process changes from one site to the next. By treating molecular evolution as a combination of two Markov processes, one that operates in the dimension of space (along a genome) and one that operates in the dimension of time (along the branches of a phylogenetic tree), these models allow aspects of both sequence structure and sequence evolution to be captured. Moreover, as we will discuss, they permit key computations to be performed exactly and efficiently. Phylo-HMMs allow evolutionary information to be brought to bear on a wide variety of problems of sequence “segmentation,” such as gene prediction and the identification of conserved elements.

Phylo-HMMs were first proposed as a way of improving phylogenetic models that allow for variation among sites in the rate of substitution [9, 52]. Soon afterward, they were adapted for the problem of secondary structure prediction [11, 47], and some time later for the detection of recombination events [20]. Recently there has been a revival of interest in these models [41, 42, 43, 44, 33], in connection with an explosion in the availability of comparative sequence data, and an accompanying surge of interest in comparative methods for the detection of functional elements [5, 3, 24, 46, 6]. There has been particular interest in applying phylo-HMMs to a multispecies version of the ab initio gene prediction problem [41, 43, 33].

In this chapter, phylo-HMMs are introduced, and examples are presented illustrating how they can be used both to identify regions of interest in multiply aligned sequences and to improve the goodness of fit of ordinary phylogenetic models. In addition, we discuss how hidden Markov models (HMMs), phylogenetic models, and phylo-HMMs all can be considered special cases of general “graphical models” and how the algorithms that are used with these models can be considered special cases of more general algorithms. This chapter is written at a tutorial level, suitable for readers who are familiar with phylogenetic models but have had limited exposure to other kinds of graphical models.

12.1 Background

A phylo-HMM can be thought of as a machine that probabilistically generates a multiple alignment, column by column, such that each column is defined by a phylogenetic model. As with the single-sequence HMMs ordinarily used in biological sequence analysis [7], this machine probabilistically proceeds from one state to another¹, and at each time step it “emits” an observable object, which is drawn from the distribution associated with the current state (Figure 12.1). With phylo-HMMs, however, the distributions associated with states are no longer multinomial distributions over a set of characters (e.g., {A,C,G,T}) but are more complex distributions defined by phylogenetic models.

Phylogenetic models, as considered here, define a stochastic process of substitution that operates independently at each site in a genome. (The question of independence will be revisited below.) In the assumed process, a character is first drawn at random from the background distribution and assigned to the root of the tree; character substitutions then occur randomly along the tree’s branches from root to leaves. The characters that remain at the leaves when the process has been completed define an alignment column. Thus, a phylogenetic model induces a distribution over alignment columns having a correlation structure that reflects the phylogeny and substitution process (see [11]). The different phylogenetic models associated with the states of a phylo-HMM may reflect different overall rates of substitution (as in conserved and nonconserved regions), different patterns of substitution or background distributions (as in coding and noncoding regions), or even different tree topologies (as with recombination [20]).

Typically with HMMs, a sequence of observations (here denoted X) is available to be analyzed, but the sequence of states (called the “path”) by which the observations were generated is “hidden” (hence the name “hidden Markov model”). Efficient algorithms are available to compute the maximum-likelihood path, the posterior probability that any given state generated any given element of X, and the total probability of X considering all possible paths (the likelihood of the model). The usefulness of HMMs in general, and phylo-HMMs in particular, is in large part a consequence of the fact that these computations can be performed exactly and efficiently.

In this chapter, three examples of applications of phylo-HMMs will be presented that parallel these three types of computation: prediction based on the maximum-likelihood path (Example 12.1), prediction based on posterior probabilities (Example 12.2), and improved goodness of fit, as evidenced by model likelihood (Example 12.3). Finally, it will be shown how these algorithms may be considered special cases of more general algorithms by regarding phylo-HMMs as graphical models.

¹ Throughout this chapter, it is assumed that the Markov chain for state transitions is discrete, first-order, and homogeneous.
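To make the background description more concrete, the following is a minimal sketch of a two-state phylo-HMM, not the chapter's own implementation. It rests on several assumptions that do not come from the text: a Jukes-Cantor substitution model with a uniform background distribution, a made-up three-species tree, two states that differ only in overall substitution rate ("conserved" and "nonconserved"), and arbitrary transition probabilities. The probability of each alignment column under a state is computed with Felsenstein's pruning algorithm, and those column probabilities then serve as emission probabilities for the standard forward (likelihood) and Viterbi (maximum-likelihood path) recursions. Posterior state probabilities would need an additional backward pass, which is omitted here.

```python
import math

BASES = "ACGT"

# Hypothetical rooted tree: an internal node is a list of (child, branch_length)
# pairs and a leaf is a species name. Topology and lengths are made up.
TREE = [([("human", 0.1), ("mouse", 0.3)], 0.05), ("chicken", 0.6)]

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def column_likelihood(tree, column, rate):
    """P(alignment column | tree scaled by rate), via Felsenstein's pruning algorithm."""
    def partial(node):
        # Maps each base x to P(observed leaves below node | node carries x).
        if isinstance(node, str):
            return {x: 1.0 if column[node] == x else 0.0 for x in BASES}
        result = {x: 1.0 for x in BASES}
        for child, length in node:
            below = partial(child)
            for a in BASES:
                result[a] *= sum(jc_prob(a, b, rate * length) * below[b] for b in BASES)
        return result
    root = partial(tree)
    return sum(0.25 * root[x] for x in BASES)  # uniform background at the root

def forward_loglik(emis, start, trans):
    """log P(X) summed over all state paths; emis[i][k] = P(column i | state k)."""
    n = len(start)
    f, loglik = None, 0.0
    for e in emis:
        if f is None:
            f = [start[k] * e[k] for k in range(n)]
        else:
            f = [e[k] * sum(f[j] * trans[j][k] for j in range(n)) for k in range(n)]
        scale = sum(f)            # rescale at each column to avoid underflow
        loglik += math.log(scale)
        f = [x / scale for x in f]
    return loglik

def viterbi(emis, start, trans):
    """Most probable state path (one state index per column), computed in log space."""
    n = len(start)
    v = [math.log(start[k]) + math.log(emis[0][k]) for k in range(n)]
    back = []
    for e in emis[1:]:
        ptr = [max(range(n), key=lambda j: v[j] + math.log(trans[j][k])) for k in range(n)]
        v = [v[ptr[k]] + math.log(trans[ptr[k]][k]) + math.log(e[k]) for k in range(n)]
        back.append(ptr)
    path = [max(range(n), key=lambda k: v[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Illustrative parameters (assumptions): state 0 = conserved, state 1 = nonconserved.
RATES = [0.2, 1.0]
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]

# Toy alignment: six identical columns followed by four variable ones.
alignment = {"human": "AAAAAACGTA", "mouse": "AAAAAAGTCA", "chicken": "AAAAAATACG"}
columns = [{sp: seq[i] for sp, seq in alignment.items()} for i in range(10)]
emis = [[column_likelihood(TREE, col, r) for r in RATES] for col in columns]

path = viterbi(emis, START, TRANS)
loglik = forward_loglik(emis, START, TRANS)
print("Viterbi path:", path)
print("log-likelihood:", loglik)
```

In this toy setting the Viterbi path labels each column as conserved or nonconserved, which is the kind of segmentation the text describes, while the forward log-likelihood is the quantity one would compare across models to assess goodness of fit. Swapping in a different substitution model, background distribution, or tree per state only changes `column_likelihood`; the HMM recursions are unaffected.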