Inference of Large Phylogenetic
Trees on Parallel Architectures

Michael Ott

TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Rechnertechnik und Rechnerorganisation /
Parallelrechnerarchitektur

Inference of Large Phylogenetic Trees on Parallel Architectures

Michael Ott
Complete reprint of the dissertation approved by the Fakultät für Informatik of the
Technische Universität München for the award of the academic degree of
Doktor der Naturwissenschaften (Dr. rer. nat.)
Chair: Univ.-Prof. Dr. H. M. Gerndt
Examiners of the dissertation: 1. Univ.-Prof. Dr. A. Bode
2. TUM Junior Fellow Dr. A. Stamatakis
The dissertation was submitted to the Technische Universität München on 15.07.2010 and
accepted by the Fakultät für Informatik on 08.10.2010.

Abstract
Due to high computational demands, the inference of large phylogenetic trees from molecular
sequence data requires the use of HPC systems in order to obtain the necessary computational
power and memory. The continuous, explosive accumulation of molecular data, driven by the
development of cost-effective sequencing techniques, further amplifies this requirement.
Moreover, a continuously increasing degree of parallelism is necessary to exploit the
performance of emerging multi-core processors efficiently.
This dissertation describes scalable parallelization schemes for the inference of large phylo-
genetic trees, as well as tangible implementations thereof, which also eliminate memory require-
ments as a limiting factor for phylogenetic analyses. Additionally, it pinpoints the properties of
current multi-core shared and distributed memory architectures and describes novel approaches
for their efficient exploitation.

Acknowledgments
Many people have contributed to the success of this work. First of all, I would like to thank
Prof. Dr. Arndt Bode for providing such a supportive working environment at the Lehrstuhl
für Rechnertechnik und Rechnerorganisation (LRR) and for the freedom he granted me for my
research over the last years. I am particularly grateful to Dr. Alexandros Stamatakis, who has
been accompanying and supporting me since my undergraduate studies and who devoted great
effort to supervising my work. Furthermore, I am very thankful to the numerous helpful
colleagues at the LRR, too many to be named individually, who have offered their advice when
it was needed and supported me in many different ways.
My position at the LRR was funded by KONWIHR, the Bavarian Competence Network
for Technical and Scientific High Performance Computing. KONWIHR is an initiative of the
Bavarian State Ministry of Sciences, Research and the Arts that supports research in compu-
tationally challenging scientific domains which require the use of High Performance Computing
in order to obtain new knowledge and insights.

Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Scientific Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Phylogenetic Tree Inference 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 The Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Models of Sequence Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 The Substitution Rate Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 The Probability Matrix . . . . . . . . . . . . . . . . . . . . . 11
2.3.3 Rate Heterogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Distance Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1 Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.2 UPGMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.3 Neighbor-Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.4 Objections Against Distance Methods . . . . . . . . . . . . . . . . . . . . 14
2.5 Maximum Parsimony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.1 The Phylogenetic Likelihood Function . . . . . . . . . . . . . . . . . . . . 16
2.6.2 The Pulley Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6.3 Optimization of the Likelihood Score . . . . . . . . . . . . . . . . . . . . . 21
2.7 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.8 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 Programs for Phylogenetic Tree Inference . . . . . . . . . . . . . . . . . . . . . . 27
2.9.1 RAxML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.9.2 PHYLIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9.3 GARLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.9.4 MrBayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.9.5 PAUP* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Shared Memory Multiprocessors 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Multi- and Many-Cores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 The Power Wall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2 From Single- to Multi-Core . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 The Memory Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.1 The Memory Wall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.2 Memory Hierarchies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 UMA and NUMA Architectures . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Shared Memory Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Distributed Memory Programming on Shared Memory Architectures . . . . . . . 40
3.6 Efficient Exploitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6.1 Shared Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6.2 Dealing with the Memory Wall . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6.3 Thread Pinning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.7 An Outlook on Automated Thread Pinning: autopin . . . . . . . . . . . . . . . 49
4 Distributed Memory Systems 55
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 The Message-Passing Interface MPI . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 MPI-1 Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.2 The MPI Profiling Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 An Outlook on Automated Performance Analysis . . . . . . . . . . . . . . . . . . 63
4.4.1 Automated Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . 63
4.4.2 The Architecture of Periscope . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4.3 Identifying MPI Performance Properties . . . . . . . . . . . . . . . . . . . 67
5 Parallel Phylogenetic Tree Inference 71
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Sources of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.1 Embarrassing Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.2 Inference Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.3 Loop-level Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3 Parallel Implementations of RAxML . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3.1 Exploitation of Loop-level Parallelism . . . . . . . . . . . . . . . . . . . . 75
5.3.2 PThreads Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3.3 Simultaneous Exploitation of Loop-level and Embarrassing Parallelism
with MPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84