A Markovian Approach for the Analysis of the Gene Structure
14 pages
English


Description

Level: Higher education, Doctorate, Bac+8

A Markovian Approach for the Analysis of the Gene Structure

Christelle Melo de Lima (1), Laurent Guéguen (1), Christian Gautier (1) and Didier Piau (2)

(1) UMR 5558 CNRS Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 43 boulevard du 11-Novembre-1918, 69622 Villeurbanne Cedex, France.
(2) Institut Camille Jordan UMR 5208, Université Claude Bernard Lyon 1, Domaine de Gerland, 50 avenue Tony-Garnier, 69366 Lyon Cedex 07, France.

Abstract. Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analysing complex structures. Numerous methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions, by modelling them with macro-states. Our data processing method, based on discrimination between macro-states, reveals several specific characteristics of intronless genes, and a break in the homogeneity of the initial coding exons. This potential use of Markovian models to help in data exploration seems to have been underestimated until now, and one aim of our paper is to promote this use of Markov modelling.

  • macro-states

  • human genome

  • discrimination between

  • coding exons

  • hmms

  • structure can

  • markov models

  • exon hmm


Topics

Information

Published by
Number of reads: 11
Language: English

Excerpt

A Markovian Approach for the Analysis of the Gene Structure

Christelle Melo de Lima (1), Laurent Guéguen (1), Christian Gautier (1) and Didier Piau (2)

(1) UMR 5558 CNRS Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 43 boulevard du 11-Novembre-1918, 69622 Villeurbanne Cedex, France. melo@biomserv.univ-lyon1.fr
(2) Institut Camille Jordan UMR 5208, Université Claude Bernard Lyon 1, Domaine de Gerland, 50 avenue Tony-Garnier, 69366 Lyon Cedex 07, France.

Abstract. Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analysing complex structures. Numerous methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions, by modelling them with macro-states. Our data processing method, based on discrimination between macro-states, reveals several specific characteristics of intronless genes, and a break in the homogeneity of the initial coding exons. This potential use of Markovian models to help in data exploration seems to have been underestimated until now, and one aim of our paper is to promote this use of Markov modelling.

Keywords: HMM, macro-state, gene structure, G+C content

1 Introduction

The sequencing of the complete human genome led to the knowledge of a sequence of three billion pairs of nucleotides [19]. Such amounts of data make it impossible to analyse patterns or to provide a biological interpretation unless one relies on automatic data-processing methods. For twenty years, mathematical and computational models have been widely developed in this setting. Numerous methodological efforts have been devoted to multicellular eukaryotes, since a large proportion of their genome has no known function. For example, only 3% of the human genome is known to code for proteins. Another difficulty is that the statistical characteristics of the coding regions vary dramatically from one species to another, and even from one region of a given genome to another. For example, vertebrate isochores ([29], [3]) exhibit such variability in relation to their G+C frequencies. Thus it is necessary to use different models for different regions if one seeks to detect patterns in genomes.

A classical way of modelling genomes uses hidden Markov models (HMMs) ([22], [18], [23]). To each type of genomic region (exons, introns, etc.), one associates a state of the hidden process, and the distribution of the stay in a given state, that is, of the length of a region, is geometric. While this is indeed an acceptable constraint as far as intergenic regions and introns are concerned, the empirical distributions of the lengths of exons are clearly bell-shaped ([6], [2], [17]), hence they cannot be represented by geometric distributions. Semi-Markov models are one option to overcome this
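
The contrast drawn in the introduction between geometric and bell-shaped length distributions can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes that a macro-state is built from k identical states visited in series, each with the same self-transition probability, so that the total stay is a sum of k geometric variables (a shifted negative binomial). The function names and the values `stay = 0.85` and `k = 5` are hypothetical choices made for illustration only.

```python
from math import comb


def geometric_pmf(length, stay):
    """P(stay lasts exactly `length` positions) for a single HMM state
    whose self-transition probability is `stay`."""
    if length < 1:
        return 0.0
    return stay ** (length - 1) * (1.0 - stay)


def macro_state_pmf(length, stay, k):
    """P(total stay = `length`) for k identical states visited in series
    (assumed macro-state layout). This is the distribution of a sum of k
    independent geometric variables, i.e. a shifted negative binomial."""
    if length < k:
        return 0.0
    return comb(length - 1, k - 1) * (1.0 - stay) ** k * stay ** (length - k)


def mode(pmf, max_length):
    """Most probable length in 1..max_length for the given pmf."""
    return max(range(1, max_length + 1), key=pmf)


if __name__ == "__main__":
    stay, k = 0.85, 5  # illustrative values, not taken from the paper
    single = mode(lambda n: geometric_pmf(n, stay), 500)
    macro = mode(lambda n: macro_state_pmf(n, stay, k), 500)
    print("most probable length, single state:", single)
    print("most probable length, macro-state :", macro)
```

With a single state the most probable length is always 1 and the probabilities only decrease afterwards, whereas the series layout pushes the peak away from the minimum length, which is the qualitative shape reported for exon lengths.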