Phylogenetic trees from large datasets [Elektronische Ressource] / vorgelegt von Heiko A. Schmidt
123 pages
English

Phylogenetic trees from large datasets [Elektronische Ressource] / vorgelegt von Heiko A. Schmidt

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres
123 pages
English
Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

Description

Phylogenetic Trees from Large DatasetsInaugural – DissertationzurErlangung des Doktorgrades derMathematisch–Naturwissenschaftlichen Fakult atder Heinrich–Heine–Universitat Dusse ldorfvorgelegt vonHeiko A. Schmidtaus BonnDusseldorf2003Gedruckt mit der Genehmigung der Mathematisch-NaturwissenschaftlichenFakult at der Heinrich-Heine-Universitat DusseldorfReferent: Prof. Dr. Arndt von HaeselerKorreferent: Prof. Dr. William MartinTag der mundlic hen Prufung: 28.07.2003The DachshundIndeed, it was made on the plan of a bench for length and lowness. It seemedto be satis ed, but I thought the plan poor, and structurally weak, on accountof the distance between the forward supports and those abaft. With age thedog’s back was likely to sag; and it seemed to me that it would have been astronger and more practicable dog if it had had some more legs.Mark Twain (Following the Equator, 1895-1896)Biologists must constantly keep in mind that what they see was not designed,but rather evolved.Francis Crick (What Mad Pursuit, 1988)ivPrefaceThisthesisdealswithtopicsfromBioinformatics/Phyloinformatics. Ittouchesthe eldsalgorithmic complexity, molecular phylogenetics, sequence evolution, as well as parallelcomputing and, thus, is addressed to an interdisciplinary community.

Sujets

Informations

Publié par
Publié le 01 janvier 2003
Nombre de lectures 7
Langue English

Extrait

Phylogenetic Trees from Large Datasets
Inaugural – Dissertation
zur
Erlangung des Doktorgrades der
Mathematisch–Naturwissenschaftlichen Fakult at
der Heinrich–Heine–Universitat Dusse ldorf
vorgelegt von
Heiko A. Schmidt
aus Bonn
Dusseldorf
2003Gedruckt mit der Genehmigung der Mathematisch-Naturwissenschaftlichen
Fakult at der Heinrich-Heine-Universitat Dusseldorf
Referent: Prof. Dr. Arndt von Haeseler
Korreferent: Prof. Dr. William Martin
Tag der mundlic hen Prufung: 28.07.2003The Dachshund
Indeed, it was made on the plan of a bench for length and lowness. It seemed
to be satis ed, but I thought the plan poor, and structurally weak, on account
of the distance between the forward supports and those abaft. With age the
dog’s back was likely to sag; and it seemed to me that it would have been a
stronger and more practicable dog if it had had some more legs.
Mark Twain (Following the Equator, 1895-1896)
Biologists must constantly keep in mind that what they see was not designed,
but rather evolved.
Francis Crick (What Mad Pursuit, 1988)iv
Preface
ThisthesisdealswithtopicsfromBioinformatics/Phyloinformatics. Ittouchesthe elds
algorithmic complexity, molecular phylogenetics, sequence evolution, as well as parallel
computing and, thus, is addressed to an interdisciplinary community. Although it is
hardly possible to write a completely self-contained thesis, I tried to provide some basic
information needed for members of either community to be able to more easily under-
standthetopicsdiscussed. Consequently,someinformationmightbeenclosedthatseem
trivial to, e.g., computer scientists but which biologists are possibly not familiar with
and vice versa.
When writing up my thesis I decided, as it is custom in scienti c presentations, to use
the scienti c ’we’, which I prefer over the personal pronoun ’I’.
Parts of this thesis have been published in the following articles:
1. H. A. Schmidt, K. Strimmer, M. Vingron, and A. von Haeseler (2002) TREE-
PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel
computing. Bioinformatics, 18, 502-504.
2. H. A. Schmidt and A. von Haeseler (2002) Quartet Trees as a Tool to Reconstruct
Large Trees from Sequences. In K. Jajuga, A. Sokolowski, and H.-H. Bock (eds.),
Data Analysis, Classi cation, and Related Methods, Studies in Classi cation, Data
Analysis, and Knowledge Organization, pages 379-388, Springer, Heidelberg, New
York.
3. H. A. Schmidt and A. von Haeseler (2003) Maximum Likelihood Analysis Using
TREE-PUZZLE. In A. D. Baxevanis, D. B. Davison, R. D. M. Page, G. Stormo,
and L. Stein (eds.), Current Protocols in Bioinformatics, Unit 6.6, pages 6.6.1-
6.6.25, Wiley and Sons, New York.
4. H. A. Schmidt, E. Petzold, M. Vingron, and A. von Haeseler (2003) Molecular
Phylogenetics: Parallelized Parameter Estimation and Quartet Puzzling. J. Par-
allel Distrib. Comput., 63, in press.
The TREE-PUZZLE package including developments presented in this thesis is freely
available from http://www.tree-puzzle.de.Contents
1. Overview 1
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. Organization of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Molecular Phylogenetics: A General Introduction 5
2.1. Biological Sequences and Molecular Evolution . . . . . . . . . . . . . . . 5
2.1.1. Types of Biological Data . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2. Sequence Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3. Multiple Sequence Alignment . . . . . . . . . . . . . . . . . . . . 7
2.1.4. Modeling Molecular Evolution . . . . . . . . . . . . . . . . . . . . 7
2.2. Phylogenetic Tree Reconstruction . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1. Notation on Phylogenetic Trees . . . . . . . . . . . . . . . . . . . 10
2.2.2. Types of Phylogenetic Methods . . . . . . . . . . . . . . . . . . . 14
2.2.3. Maximum Likelihood on Phylogenetic Trees . . . . . . . . . . . . 14
2.2.4. Complexity of Phylogenetic Analysis . . . . . . . . . . . . . . . . 17
3. The Quartet Puzzling Algorithm 19
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2. Methods from the TREE-PUZZLE Package . . . . . . . . . . . . . . . . 20
3.2.1. Likelihood Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2. Quartet Puzzling . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2.1. ML Step . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2.2. Puzzling Step . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2.3. Consensus Step . . . . . . . . . . . . . . . . . . . . . . . 24
4. Improvement in Complexity of the Puzzling Step of QP 25
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2. Complexity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.y of the Original Puzzling Step . . . . . . . . . . . . . . . . . . 27
4.4. More E cient Puzzling Step Algorithms . . . . . . . . . . . . . . . . . . 29
4.4.1. Split-Based Puzzling Step Algorithm . . . . . . . . . . . . . . . . 29
4.4.2. Recursive Puzzling Step Algorithm . . . . . . . . . . . . . . . . . 31
4.4.2.1. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.2.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 34
vvi Contents
4.4.3. Berry’s MRCA-Based Puzzling Step Algorithm . . . . . . . . . . . 36
4.5. Runtime Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.1. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.2. Benchmark Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5. Parallelized Quartet Puzzling 43
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2. Parameter Estimation for Evolutionary Models. . . . . . . . . . . . . . . 44
5.2.1. Algorithm: Estimating Model Parameters . . . . . . . . . . . . . 44
5.3. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4. Runtime Analysis of the Sequential TREE-PUZZLE . . . . . . . . . . . . 46
5.4.1. Runtime of the Parameter Estimation. . . . . . . . . . . . . . . . 46
5.4.2. Runtime of the Quartet Puzzling Algorithm . . . . . . . . . . . . 46
5.5. Parallelizing TREE-PUZZLE . . . . . . . . . . . . . . . . . . . . . . . . 47
5.5.1. Parallelizing the Parameter Estimation . . . . . . . . . . . . . . . 47
5.5.2. Parallelizing Quartet Puzzling . . . . . . . . . . . . . . . . . . . . 47
5.6. Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.7. E ciency of Parallel TREE-PUZZLE . . . . . . . . . . . . . . . . . . . . 50
5.7.1. Benchmark Datasets and Setup . . . . . . . . . . . . . . . . . . . 50
5.7.2. Results for Parameter Estimation . . . . . . . . . . . . . . . . . . 51
5.7.3. ML Step and Puzzling Step . . . . . . . . . . . . . . . . . . . . . 51
5.7.4. Overall Scaling of Parallel TREE-PUZZLE . . . . . . . . . . . . . 51
5.8. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6. Large ML Trees from Sequences Using Quartets 57
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2. The Modi ed Quartet Puzzling Algorithm . . . . . . . . . . . . . . . . . 57
6.2.1. The Algorithm ModPUZZLE . . . . . . . . . . . . . . . . . . . 58
6.3. Computational Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.4. Some Practical Measures on the Method . . . . . . . . . . . . . . . . . . 60
6.5. Discussion and Possible Extensions . . . . . . . . . . . . . . . . . . . . . 61
7. Phylogenetic Trees from Multiple Genesets with Missing Data 63
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.2. Methods to Combine Datasets . . . . . . . . . . . . . . . . . . . . . . . . 64
7.2.1. Low Level Combination: Total Evidence . . . . . . . . . . . . . . 64
7.2.2. High Level Methods . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.2.2.1. Combining Equal-Sized Datasets: Consensus Methods . 64
7.2.2.2. Com Overlapping Datasets: Supertreeds . . 66
7.3. Medium Level Combined Phylogenetic Analysis . . . . . . . . . . . . . . 68
7.3.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.3.2. The Combined Quartet Method to Combine Genesets . . . . . . . 70Contents vii
7.3.2.1. Combining the ML Quartets . . . . . . . . . . . . . . . . 70
7.3.2.2. Computing the Overall Tree . . . . . . . . . . . . . . . . 71
7.3.2.3. Assessing Whether Genesets Can Be Combined . . . . . 71
7.3.2.4. Overlap-Guided Puzzling Step . . . . . . . . . . . . . . . 72
7.3.2.5. Relative Majority Consensus . . . . . . . . . . . . . . . 73
7.4. The Phylogeny of the Grasses . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.1. The Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.2. Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.4.2.1. Low Level Combination . . . . . . . . . . . . . . . . . . 74
7.4.2.2. High Level Combination . . . . . . . . . . . . . . . . . . 74
7.4.2.3. Medium Level Combina

  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents