Simultaneous estimation of alignments and trees
Tandy Warnow
The University of Texas at Austin
(joint work with Randy Linder, Kevin Liu, Serita Nelesen, and Sindhu Raghavan)

DNA Sequence Evolution
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT TGGACTT
-1 mil yrs: AGGGGGCAT TAGCCCT AGCACTT
today:      AGGGCAT TAGCCCA TAGACTT AGCACAA AGCGCTT

FN: false negative (missing edge)
FP: false positive (incorrect edge)
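
The two error types can be made concrete by comparing the internal edges of a true tree and an estimated tree, where each internal edge is represented by the split it induces on the leaf set. The following is a minimal Python sketch under assumptions that are not taken from the slides: the five-leaf trees, the helper names `split` and `error_rates`, and the choice to divide each count by the number of internal edges in the corresponding tree are all illustrative.

```python
# Hypothetical example: leaves and trees are invented for illustration.
LEAVES = frozenset("ABCDE")
ANCHOR = "A"  # each split is stored as the side that does NOT contain this leaf


def split(side):
    """Return a canonical form of the bipartition with `side` on one side."""
    side = frozenset(side)
    return LEAVES - side if ANCHOR in side else side


def error_rates(true_splits, est_splits):
    """Return (FN rate, FP rate) for two sets of internal-edge splits.

    FN: edges of the true tree missing from the estimate.
    FP: edges of the estimate that are not in the true tree.
    """
    false_negatives = true_splits - est_splits
    false_positives = est_splits - true_splits
    return (len(false_negatives) / len(true_splits),
            len(false_positives) / len(est_splits))


# True tree ((A,B),C,(D,E)) has internal edges AB|CDE and DE|ABC.
true_tree = {split("AB"), split("DE")}
# Estimated tree ((A,B),D,(C,E)) has internal edges AB|CDE and CE|ABD.
estimated_tree = {split("AB"), split("CE")}

fn_rate, fp_rate = error_rates(true_tree, estimated_tree)
print(fn_rate, fp_rate)
```

In this invented example one of the two true edges is missing and one of the two estimated edges is incorrect, so both rates are one half.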
50% error rate

Deletion Mutation
…ACGGTGCAGTTACCA…
…ACCAGTCACCA…
Indels (insertions and deletions) also occur!

Input: unaligned sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Phase 1: Multiple Sequence Alignment
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA

Phase 2: Construct tree
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA
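
The two-phase pipeline can be illustrated on the four sequences above. The following Python sketch first checks that the alignment is well formed (all rows have the same length, and removing gaps recovers the input sequences), then computes pairwise p-distances, which are one common input to distance-based tree methods. The slides do not say which tree construction method is used in Phase 2, so the use of p-distances, the decision to skip columns where either sequence has a gap, and the function names `degap` and `p_distance` are assumptions made for illustration.

```python
from itertools import combinations

unaligned = {
    "S1": "AGGCTATCACCTGACCTCCA",
    "S2": "TAGCTATCACGACCGC",
    "S3": "TAGCTGACCGC",
    "S4": "TCACGACCGACA",
}
aligned = {
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}


def degap(row):
    """Remove gap characters from one aligned row."""
    return row.replace("-", "")


def p_distance(x, y):
    """Fraction of mismatches over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("rows share no ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


# Phase 1 sanity checks: equal row lengths, and gaps are the only change.
assert len({len(row) for row in aligned.values()}) == 1
assert all(degap(aligned[name]) == seq for name, seq in unaligned.items())

# Phase 2 input (assumed): a pairwise distance for every pair of taxa.
distances = {
    (a, b): p_distance(aligned[a], aligned[b])
    for a, b in combinations(sorted(aligned), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the distances are read off the alignment columns, any error made in Phase 1 carries directly into the tree built in Phase 2.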
[Figure: tree with leaves S1, S2, S3, S4]

DNA sequence evolution
Simulation using ROSE: 100-taxon model trees; models 1-4 have “long gaps” and models 5-8 have “short gaps”; site substitution is HKY+Gamma.

Simultaneous estimation?
• Statistical methods (e.g., ALIFRITZ and
BAli-Phy) cannot be applied to datasets
above ~20 sequences.
• POY attempts to solve the NP-hard
“minimum treelength” problem, and can
be applied to larger datasets.

POY vs. Clustal
• Ogden and Rosenberg did a simulation study
showing POY 3.0 alignments (using simple
gap penalties) were less accurate than
Clustal alignments on over 99% of the
datasets they generated.
• Simple gap penalties are of the form
gapcost(L) = cL for some constant c, where L is the gap length.
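
To show what a simple gap penalty means in practice, the following Python sketch scores the gaps in one aligned row and computes a pairwise edit cost by standard dynamic programming with a linear gap cost. This is an illustration only, not POY's implementation: the function names, the unit mismatch cost, and the value c = 2 are assumptions chosen for the example.

```python
import re


def simple_gap_cost(length, c):
    """gapcost(L) = c * L for a gap of length L."""
    return c * length


def gap_runs(row):
    """Lengths of the maximal runs of '-' in an aligned row."""
    return [len(run) for run in re.findall(r"-+", row)]


def row_gap_cost(row, c):
    """Total simple gap cost of all gaps in one aligned row."""
    return sum(simple_gap_cost(length, c) for length in gap_runs(row))


def edit_cost(a, b, c, mismatch=1):
    """Minimum cost to turn a into b with linear gap cost c per gapped site.

    Assumed scoring: matches cost 0, substitutions cost `mismatch`.
    """
    prev = [c * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [c * i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + (0 if x == y else mismatch),  # match or mismatch
                prev[j] + c,                                # gap in b
                cur[j - 1] + c,                             # gap in a
            ))
        prev = cur
    return prev[-1]


row = "TAG-CT-------GACCGC--"  # aligned S3 from the example alignment
print(gap_runs(row), row_gap_cost(row, c=2))
print(edit_cost("TAGCTATCACGACCGC", "TAGCTGACCGC", c=2))
```

Under a penalty of this form, one gap of length L costs exactly as much as L separate gaps of length 1, so the total depends only on the number of gap characters and not on how they are grouped.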