Computational Grand Challenges in Assembling the Tree of Life: Problems & Solutions∗

David A. Bader, University of New Mexico, Electrical & Computer Engineering Department, Albuquerque, NM 87131 USA, dbader@ece.unm.edu

Usman Roshan, New Jersey Institute of Technology, Computer Science Department, Newark, NJ 07102 USA, usman@cs.njit.edu

Alexandros Stamatakis†, Foundation for Research and Technology-Hellas, Institute of Computer Science, Heraklion, GR 71110 Greece, stamatak@ics.forth.gr

April 21, 2005

Audience: 40% Introductory, 40% Intermediate, 20% Advanced
Abstract

The computation of ever larger and more accurate phylogenetic (evolutionary) trees, with the ultimate goal of computing the tree of life, represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the size of trees which can be computed in reasonable time based on elaborate evolutionary models is limited by the severe computational cost inherent to these methods. There exist two orthogonal research directions to overcome this challenging computational burden: first, the development of novel, faster, and more accurate heuristic algorithms, and second, the application of high performance computing techniques.

The goal of this tutorial is to provide a comprehensive introduction to the field of computational evolutionary biology to an audience with a computing background interested in participating in research and/or commercial applications of this field. Moreover, we will cover leading-edge technical and algorithmic developments in the field and discuss open problems and potential solutions. No background in biology is assumed.
1 Audience

This tutorial is intended for computer professionals, software developers, researchers, educators, and graduate students interested in conducting research or developing software systems in high-performance computing evolutionary biology.

∗This work was supported in part by NSF Grants CAREER ACI-0093039, ITR ACI-0081404, ITR EIA-0121377, Biocomplexity DEB-0120709, and ITR EF/BIO 0331654; and DARPA contract NBCH30390004.
†This work was supported in part by a postdoctoral fellowship granted by the German Academic Exchange Service (DAAD).
2 Course Description
We describe the exciting algorithmic and technical challenges which need to be overcome in order to supercompute the tree of life. We particularly emphasize that phylogenetics is both a technical and an algorithmic discipline. D. Bader, an expert in the area of designing and implementing high-performance and parallel algorithms for problems arising in computational biology, is joined by two young researchers, U. Roshan and A. Stamatakis, who are experts in the field of algorithmic and high-performance computing phylogenetics. Bader and Roshan are core members of the Cyberinfrastructure for Phylogenetic Research Project (CIPRES, www.phylo.org), an open collaboration and national resource for phyloinformatics and computational phylogenetics funded by the National Science Foundation, with the project goal of enabling large-scale phylogenetic reconstructions on a scale that will enable analyses of huge datasets containing hundreds of thousands of biomolecular sequences.
3 Introduction to Phylogenetics
A phylogeny is a representation of the evolutionary history of a collection of organisms or genes (known as taxa). The basic assumption of process necessary for phylogenetic reconstruction is repeated divergence within species or genes. A phylogenetic reconstruction is usually depicted as a tree, in which modern taxa are depicted at the leaves and ancestral taxa occupy internal nodes, with the edges of the tree denoting evolutionary relationships among the taxa. Reconstructing phylogenies is a major component of modern research programs in biology and medicine (as well as linguistics). Naturally, scientists are interested in phylogenies for the sake of knowledge, but such analyses also have many uses in applied research and in the commercial arena [12].

Existing phylogenetic reconstruction techniques suffer from serious problems of running time (or, when fast, of accuracy). The problem is particularly serious for large data sets: even though data sets composed of sequences from a single gene continue to pose challenges (e.g., some analyses are still running after two years of computation on medium-sized clusters), using whole-genome data (such as gene content and gene order) gives rise to even more formidable computational problems, particularly in data sets with large numbers of genes and highly-rearranged genomes.

To date, almost every model of speciation and genomic evolution used in phylogenetic reconstruction has given rise to NP-hard optimization problems. Three major classes of methods are in common use. Heuristics (a natural consequence of the NP-hardness of the problems) run quickly, but may offer no quality guarantees and may not even have a well-defined optimization criterion; the popular neighbor-joining heuristic [38] is one example. Optimization based on the criterion of maximum parsimony (MP) [21] seeks the phylogeny with the least total amount of change needed to explain modern data. Finally, optimization based on the criterion of maximum likelihood (ML) [22] seeks the phylogeny that is the most likely to have given rise to the modern data.

Heuristics are fast and often rival the optimization methods in terms of accuracy, at least on datasets of moderate size, e.g. on hundreds of taxa. Parsimony-based methods may take exponential time, but, at least for DNA and amino acid data, can often be run to completion on datasets of moderate size. Methods based on maximum likelihood are very slow (the point estimation problem alone appears intractable) and thus restricted to very small instances; they also require many more assumptions than parsimony-based methods, but appear capable of outperforming the others in terms of the quality of solutions when these assumptions are met. Both MP- and ML-based analyses are often run with various heuristics to ensure timely termination of the computation, with mostly unquantified effects on the quality of the answers returned.

Thus there is ample scope for the application of high-performance computing in the area. As in all scientific computing areas, biologists want to study a particular dataset and are willing to spend months and even years in the process: accurate branch prediction is the main goal. However, since all exact algorithms scale exponentially (or worse, in the case of ML approaches) with the number of taxa, speed remains a crucial parameter; otherwise few datasets of more than a few dozen taxa could ever be analyzed.
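To make the parsimony criterion concrete, here is a minimal sketch (in Python, not part of the original tutorial materials) of Fitch's algorithm for scoring a single aligned site on a fixed tree; the tree shape, taxon names, and character states are invented for illustration.

```python
def fitch_score(tree, states):
    """Parsimony score (minimum mutation count) of one character on a tree.

    tree   -- nested 2-tuples for internal nodes, strings for leaves,
              e.g. (("Human", "Chimp"), ("Mouse", "Rat"))
    states -- observed state per leaf, e.g. {"Human": "A", ...}
    """
    score = 0

    def post_order(node):
        nonlocal score
        if isinstance(node, str):              # leaf: singleton state set
            return {states[node]}
        left, right = map(post_order, node)    # visit both children
        if left & right:                       # intersection: no mutation needed
            return left & right
        score += 1                             # disjoint sets: one mutation
        return left | right

    post_order(tree)
    return score

# One DNA site for four taxa on a fixed tree: the score is 2 mutations.
tree = (("Human", "Chimp"), ("Mouse", "Rat"))
site = {"Human": "A", "Chimp": "A", "Mouse": "G", "Rat": "T"}
print(fitch_score(tree, site))  # -> 2
```

Summing this score over all sites of an alignment gives the parsimony length of one tree; MP asks for the tree minimizing that length over all topologies, which is what makes the problem NP-hard.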
3.1 GeneOrder Phylogeny As an illustration, we briefly discuss our experience with a high-performance software suite, GRAPPA (Genome Rearrangement Analysis through Parsimony and other Phylogenetic Algorithms) that we de-veloped,GRAPPAextends Sankoff and Blanchette’s breakpoint phylogeny algorithm [39] into the more biologically-meaningful inversion phylogeny and provides a highly-optimized code that can make use of distributed- and shared-memory parallel systems (see [12, 9, 27, 29, 25, 51] for details). In [13] we give the first linear-time algorithm and fast implementation for computing inversion distance between two signed permutations. We ranGRAPPAon a 512-processor IBM Linux cluster with Myrinet and obtained a 512-fold speed-up (linear speedup with respect to the number of processors): a complete breakpoint analysis (with the more demanding inversion distance used in lieu of breakpoint distance) for the 13 genomes in the Campanulaceae data set ran in less than 1.5 hours in an October 2000 run, for amillion-foldspeedup over the original implementation. Our latest version features significantly improved bounds and new distance correction methods and, on the same dataset, exhibits a speedup factor ofover one billionachieved. We this speedup through a combination of parallelism and high-performance algorithm engineering. Although such spectacular speedups will not always be realized, we suggest that many algorithmic approaches now in use in the biological, pharmaceutical, and medical communities can benefit tremendously from such an application of high-performance techniques and platforms. This example indicates the potential of applying supercomputing techniques to applications in computa-tional biology, especially in areas that involve complex optimizations: our reimplementation did not require new algorithms or entirely new techniques, yet achieved gains that turned an impractical approach into a usable one.
3.2 Problems & Solutions for Maximum Likelihood

This part will focus on algorithmic and technical solutions and open problems concerning the broadly-accepted maximum likelihood model of evolution in particular. Within this context, some of the fastest and most accurate current heuristic search algorithms and techniques will be presented. Furthermore, the most recent technical solutions, such as non-deterministic message passing implementations, fine-grained OpenMP parallelizations with superlinear speedups (due to increased memory efficiency), and Java-based distributed systems, will be discussed. In addition, the interdependence and benefits of simultaneous algorithmic and technical development will be outlined by the example of RAxML-V [41, 44]. Finally, an overview of possible future HPC implementations of the novel algorithms is provided, including Grid-based solutions, mixed MPI/OpenMP implementations for hybrid supercomputer architectures, and the exploitation of vector-like peripheral processors, e.g. Graphics Processing Units (GPUs). Our presentation will roughly follow the outline below; a toy likelihood computation is sketched after it.
• Introduction to Maximum Likelihood
• Associated computational problems
• Optimizing the likelihood function
• Hill climbing search algorithms
• Simulated Annealing
• Genetic Algorithms
• Problems with memory consumption
• Message-passing implementations
• Shared-memory implementations
• Grid and distributed implementations
• Future parallel & distributed systems and algorithms
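As a taste of why ML is so expensive, the sketch below evaluates the per-site likelihood of a small fixed tree under the Jukes-Cantor model via Felsenstein's pruning algorithm [22]. The tree, branch lengths, and site pattern are invented; a real program repeats this inner loop for every alignment site, every candidate topology, and every branch-length proposal.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor transition probability P(i -> j) on a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node, site):
    """Post-order pass: L[x] = P(observed leaves below node | node state x)."""
    if isinstance(node, str):                        # leaf
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    like = [1.0, 1.0, 1.0, 1.0]
    for child, t in node:                            # (subtree, branch length)
        cl = conditional(child, site)
        for x in range(4):
            like[x] *= sum(jc_prob(BASES[x], BASES[y], t) * cl[y]
                           for y in range(4))
    return like

def site_likelihood(tree, site):
    """Sum over root states with uniform stationary frequencies (1/4 each)."""
    return sum(0.25 * lx for lx in conditional(tree, site))

# Three taxa; the root joins an inner node (Human, Chimp) and Mouse.
tree = [([("Human", 0.1), ("Chimp", 0.1)], 0.2), ("Mouse", 0.3)]
site = {"Human": "A", "Chimp": "A", "Mouse": "G"}
print(site_likelihood(tree, site))
```

The hill-climbing, simulated-annealing, and genetic-algorithm searches in the outline above all use exactly this kind of likelihood evaluation as their (expensive) scoring function, which is why memory layout and parallelization of the inner loops matter so much.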
3.3 Divide-and-Conquer Techniques

The Disk-Covering Method (DCM) is a divide-and-conquer booster technique for phylogenetic reconstruction. It divides the input set of sequences into smaller subsets, applies an existing base method to infer subtrees on the subsets (this is the method which is being boosted, i.e., improved upon), and then merges the subtrees to get a tree on the full dataset. We will present Recursive-Iterative-DCM3 (Rec-I-DCM3), the latest method in the family of DCMs. Rec-I-DCM3 is designed to improve upon heuristics for maximum parsimony and maximum likelihood, both of which are widely used NP-hard optimization criteria for phylogeny reconstruction. We will also discuss recent and ongoing efforts to parallelize Rec-I-DCM3. Our presentations will be accompanied by experimental results of recent and ongoing work on Rec-I-DCM3 on real biological data. We will show real-time improvements of existing heuristics for solving maximum parsimony and maximum likelihood. Our presentation will roughly follow the outline below; a skeletal sketch of the booster pattern follows it.
• Introduction to maximum parsimony and maximum likelihood
• Current heuristics for MP and ML
• Disk-Covering Methods
• Recursive-Iterative-DCM3
• Parallel Recursive-Iterative-DCM3
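To fix ideas, here is a deliberately simplified skeleton of the booster pattern, not the published Rec-I-DCM3 decomposition: `base_method`, `merge_subtrees`, and the naive `decompose` below are placeholders standing in for a real MP/ML heuristic, a supertree merger, and the actual DCM3 subset selection.

```python
def dcm_boost(taxa, alignment, base_method, merge_subtrees, max_size=50):
    """Divide-and-conquer booster: recurse on overlapping subsets of taxa,
    solve small subproblems with the base method, and merge the subtrees."""
    if len(taxa) <= max_size:
        return base_method(taxa, alignment)     # small enough: solve directly
    subtrees = [dcm_boost(subset, alignment, base_method, merge_subtrees,
                          max_size)
                for subset in decompose(taxa)]
    return merge_subtrees(subtrees)             # shared taxa glue the pieces

def decompose(taxa, overlap=5):
    """Placeholder decomposition: two halves sharing `overlap` taxa.
    A real DCM derives the overlapping subsets from a threshold graph on
    the distance matrix (and, in Rec-I-DCM3, from a guide tree)."""
    half = len(taxa) // 2
    return [taxa[: half + overlap], taxa[half - overlap:]]

# Toy run with stand-in callables, just to exercise the control flow.
tree = dcm_boost(list(range(120)), alignment=None,
                 base_method=lambda t, a: ("base-tree", tuple(t)),
                 merge_subtrees=lambda ts: ("merged", ts))
```

The iterative part of Rec-I-DCM3 wraps such a recursive pass in a loop, re-decomposing the dataset around the best tree found so far.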
4 Brief CVs
David A. Bader is an Associate Professor and Regents' Lecturer in the Departments of Electrical and Computer Engineering and Computer Science at The University of New Mexico (UNM). He received his Ph.D. in Electrical Engineering in 1996 from The University of Maryland, and was awarded a National Science Foundation (NSF) Postdoctoral Research Associateship in Experimental Computer Science before joining UNM in 1998. He is an NSF CAREER Award recipient, an investigator on several NSF awards, a distinguished speaker in the IEEE Computer Society Distinguished Visitors Program, and is a member of the IBM PERCS team for the DARPA High Productivity Computing Systems program. Dr. Bader serves on the Steering Committees of the IPDPS and HiPC conferences, and is the General co-Chair for IPDPS (2004–2005), and Vice General Chair for HiPC (2002–2004). David has previously chaired several conference program committees, is the Program Chair for HiPC 2005, and Program Vice-Chair for IPDPS 2006 and for ICPP 2006. He has served on numerous conference program committees related to parallel processing, is an associate editor for the IEEE Transactions on Parallel and Distributed Systems and the ACM Journal of Experimental Algorithmics, a Senior Member of the IEEE Computer Society, and a Member of the ACM. Dr. Bader has been a pioneer in the field of high-performance computing for problems in bioinformatics and computational genomics. He has co-chaired a series of meetings, the IEEE International Workshop on High-Performance Computational Biology (HiCOMB), written several book chapters, and co-edited a special issue of the Journal of Parallel and Distributed Computing on High-Performance Computational Biology. He has co-authored over 65 articles in peer-reviewed journals and conferences, and his main areas of research are in parallel algorithms, combinatorial optimization, and computational biology and genomics. His web page, with full CV, publications, and software, is available at http://www.ece.unm.edu/~dbader.

In [13] we give the first linear-time algorithm and fast implementation for computing inversion distance between two signed permutations. We use this calculation in our confirmation of the conjecture of Robert Jansen, University of Texas-Austin, that inversion is the principal process of genome evolution in chloroplast DNA, using a dataset from Campanulaceae (bluebell flower) for analysis (see [10, 12, 9, 27, 29, 25, 51]). Not only did our research and software implementation improve the running time from centuries to minutes for computing this problem, but the results are more biologically meaningful than prior methods [23, 33]. The analysis used our high-performance software suite, called GRAPPA, that reconstructs evolutionary histories using gene-order data. In the analysis of the chloroplast genomes for the 12 species of bluebell flowers, we ran GRAPPA on the University of New Mexico's 512-processor NSF Linux SMP supercluster, and achieved a one-million-fold speedup over prior methods, a speedup we have since increased to over one billion.

Representative Publications: [1, 7, 8, 11, 12, 14, 13, 9, 15, 16, 6, 2, 3, 4, 5, 18, 20, 19, 23, 24, 26, 27, 29, 25, 33, 34, 40, 48, 49, 50].

Bader has co-presented several tutorials at prior SC meetings: at SC1999 (high-performance computing in the Alliance), at SC2000 (an updated tutorial with the same focus and a second tutorial on parallel programming for cluster computers), and at SC2002 (opportunities and challenges in computational biology).
Usman Roshan is an Assistant Professor in the Computer Science Department at the New Jersey Institute of Technology. He received his Ph.D. in Computer Science from The University of Texas at Austin in May 2004. He is a member of the CIPRES project (http://www.phylo.org), a multi-million-dollar, multi-institutional project funded by the National Science Foundation of the United States to develop algorithms and software for reconstructing the evolutionary Tree of Life. His research work is mainly focused on computational approaches for accurate and fast reconstruction of large phylogenetic trees. On this topic he has one peer-reviewed book chapter, seven peer-reviewed conference publications, and several papers in preparation. He is also one of the main authors and the maintainer of the Recursive-Iterative-DCM3 open-source software distribution (http://www.cs.njit.edu/usman/RecIDCM3.html), which implements the Rec-I-DCM3 divide-and-conquer technique for reconstructing very large phylogenetic trees. This technique will also be available under the CIPRES project software distribution. Usman's publications can be found at his web page at http://www.cs.njit.edu/usman.

Representative Publications: [37, 35, 36, 30, 31, 28, 32, 17].
Alexandros Stamatakis received his Ph.D. in Computer Science from the Technical University of Munich (Germany) in October 2004. He currently works as a postdoctoral researcher at the Institute of Computer Science of the Foundation for Research and Technology-Hellas in Heraklion (Greece). His main research focus is on parallel and distributed systems and algorithms for the computation of large phylogenetic trees based on statistical models of evolution. He is the author of 1 book chapter, 4 journal papers, and 9 peer-reviewed conference papers covering various technical as well as algorithmic aspects of phylogenetic inference. Moreover, he has given 21 talks and guest lectures at various international conferences, research institutes, and universities. Finally, jointly with his colleague Thomas Ludwig, he has presented two half-day tutorials on High Performance Computing Bioinformatics: at the 4th International Conference on Bioinformatics and Genome Regulation and Structure (BGRS2004) in Novosibirsk, Russia (July 2004), and at the German Conference on Bioinformatics (GCB 2004) in Bielefeld, Germany (October 2004). The participants' evaluations from these tutorials were very positive. The slides and evaluations by participants of the aforementioned tutorials are available on-line, along with the complete list of publications, at http://www.ics.forth.gr/~stamatak.

Representative Publications: [44, 41, 42, 43, 47, 46, 45].
Bader, Roshan, and Stamatakis are close collaborators. As previously mentioned, they are close professional colleagues with a track record of collaborative work and ongoing collaborative projects. Despite the distances between their respective institutions, these three tutorial presenters see each other face-to-face several times a year. Roshan and Stamatakis are closely cooperating on the integration of Rec-I-DCM3 [37] with RAxML-V [41, 44, 46] and on the development of novel divide-and-conquer and parallel algorithms for phylogeny reconstruction. Stamatakis and Bader are collaborating on analysis techniques for the accuracy of large-scale phylogenetic reconstructions, and Stamatakis has presented several papers at the Workshop on High Performance Computational Biology (HiCOMB), chaired by Bader. Bader and Roshan are core members of the NSF CIPRES Project and are presently working together on parallel optimization techniques for improving large-scale phylogenetic reconstructions.
References
[1] S. Aluru and D.A. Bader. Tutorial: Opportunities and challenges in computational biology. In Proc. Supercomputing (SC 2002), Baltimore, MD, November 2002.
[2] D. A. Bader. An improved, randomized algorithm for parallel selection with an experimental study. Journal of Parallel and Distributed Computing, 64(9):1051–1059, 2004.
[3] D. A. Bader and G. Cong. A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs). In Proc. Int'l Parallel and Distributed Processing Symp. (IPDPS 2004), Santa Fe, NM, April 2004.
[4] D. A. Bader and G. Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. In Proc. Int'l Parallel and Distributed Processing Symp. (IPDPS 2004), Santa Fe, NM, April 2004.
[5] D. A. Bader and S. Sreshta. A new parallel algorithm for planarity testing. Journal Submission, October 2003.
[6] D.A. Bader. High-performance computing for computational biology. Communications of the ACM, 2004.
[7] D.A. Bader, G. Cong, and J. Feo. A comparison of the performance of list ranking and connected components algorithms on SMP and MTA shared-memory systems. Technical report, Electrical and Computer Engineering Department, The University of New Mexico, Albuquerque, NM, October 2004. Submitted for publication.
[8] D.A. Bader, A.K. Illendula, B.M.E. Moret, and N. Weisse-Bernstein. Using PRAM algorithms on a uniform-memory-access shared-memory architecture. In G.S. Brodal, D. Frigioni, and A. Marchetti-Spaccamela, editors, Proc. 5th Int'l Workshop on Algorithm Engineering (WAE 2001), volume 2141 of Lecture Notes in Computer Science, pages 129–144, Århus, Denmark, 2001. Springer-Verlag.
[9] D.A. Bader, B.M.E. Moret, T. Warnow, S.K. Wyman, and M. Yan. High-performance algorithm engineering for gene-order phylogenies. In DIMACS Workshop on Whole Genome Comparison, Piscataway, NJ, March 2001. Rutgers University.
[10] D.A. Bader and B.M.E. Moret. GRAPPA runs in record time. HPCwire, 9(47), November 23, 2000.
[11] D.A. Bader, B.M.E. Moret, and P. Sanders. Algorithm engineering for parallel computation. In R. Fleischer, E. Meineche-Schmidt, and B.M.E. Moret, editors, Experimental Algorithmics, volume 2547 of Lecture Notes in Computer Science, pages 1–23. Springer-Verlag, 2002.
[12] D.A. Bader, B.M.E. Moret, and L. Vawter. Industrial applications of high-performance computing for phylogeny reconstruction. In H.J. Siegel, editor, Proc. SPIE Commercial Applications for High-Performance Computing, volume 4528, pages 159–168, Denver, CO, 2001. SPIE.
[13] D.A. Bader, B.M.E. Moret, and M. Yan. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. Journal of Computational Biology, 8(5):483–491, 2001.
[14] D.A. Bader, B.M.E. Moret, and M. Yan. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. In Proc. 7th Int'l Workshop on Algorithms and Data Structures (WADS 2001), volume 2125 of Lecture Notes in Computer Science, pages 365–376, Providence, RI, 2001. Springer-Verlag.
[15] D.A. Bader and R. Pennington. Cluster computing: Applications. Int'l J. High Performance Computing Applications, 15(2):181–185, 2001.
[16] D.A. Bader, S. Sreshta, and N. Weisse-Bernstein. Evaluating arithmetic expressions using tree contraction: A fast and scalable parallel implementation for symmetric multiprocessors (SMPs). In S. Sahni, V.K. Prasanna, and U. Shukla, editors, Proc. 9th Int'l Conf. on High Performance Computing (HiPC 2002), volume 2552 of Lecture Notes in Computer Science, pages 63–75, Bangalore, India, December 2002. Springer-Verlag.
[17] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, L. Nakhleh, and U. Roshan. PRec-I-DCM3: A parallel framework for fast and accurate large-scale phylogeny reconstruction. In The 1st IEEE Workshop on High Performance Computing in Medicine and Biology (HiPCoMP 2005), 2005. To appear.
[18] G. Cong and D. A. Bader. The Euler tour technique and parallel rooted spanning tree. In Proc. Int'l Conf. on Parallel Processing (ICPP), pages 448–457, Montreal, Canada, August 2004.
[19] G. Cong and D.A. Bader. An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs). Technical report, Electrical and Computer Engineering Department, The University of New Mexico, Albuquerque, NM, October 2004. Submitted for publication.
[20] G. Cong and D.A. Bader. Lock-free parallel algorithms: An experimental study. In Proc. 11th Int'l Conf. on High Performance Computing (HiPC 2004), Bangalore, India, December 2004. Springer-Verlag.
[21] J.S. Farris. The logical basis of phylogenetic analysis. In N.I. Platnick and V.A. Funk, editors, Advances in Cladistics, pages 1–36. Columbia Univ. Press, New York, 1983.
[22] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17:368–376, 1981.
[23] R.K. Jansen, D.A. Bader, B.M.E. Moret, L.A. Raubeson, L.-S. Wang, T. Warnow, and S. Wyman. New approaches for using gene order data in phylogeny reconstruction. In Proc. Botany, Albuquerque, NM, August 2001.
[24] T. Liu, B.M.E. Moret, and D. A. Bader. An exact, linear-time algorithm for computing genomic distances under inversions and deletions. Conference Submission, November 2003.
[25] B.M.E. Moret, D.A. Bader, T. Warnow, S.K. Wyman, and M. Yan. GRAPPA: a high-performance computational tool for phylogeny reconstruction from gene-order data. In Proc. Botany, Albuquerque, NM, August 2001.
[26] B.M.E. Moret, D.A. Bader, and T. Warnow. High-performance algorithm engineering for computational phylogenetics. In Proc. Int'l Conf. on Computational Science, volume 2073–2074 of Lecture Notes in Computer Science, San Francisco, CA, 2001. Springer-Verlag.
[27] B.M.E. Moret, D.A. Bader, and T. Warnow. High-performance algorithm engineering for computational phylogenetics. J. Supercomputing, 22:99–111, 2002. Special issue on the best papers from ICCS'01.
[28] B.M.E. Moret, U. Roshan, and T. Warnow. Sequence length requirements for phylogenetic methods. In Proc. 2nd Int'l Workshop on Algorithms in Bioinformatics (WABI'02), volume 2452 of Lecture Notes in Computer Science, pages 343–356. Springer-Verlag, 2002.
[29] B.M.E. Moret, S. Wyman, D.A. Bader, T. Warnow, and M. Yan. A new implementation and detailed study of breakpoint analysis. In Proc. 6th Pacific Symp. Biocomputing (PSB 2001), pages 583–594, Hawaii, 2001.
[30] L. Nakhleh, B.M.E. Moret, U. Roshan, K. St. John, and T. Warnow. The accuracy of fast phylogenetic methods for large datasets. In Proc. 7th Pacific Symp. Biocomputing (PSB 2002), pages 211–222. World Scientific Pub., 2002.
[31] L. Nakhleh, U. Roshan, K. St. John, J. Sun, and T. Warnow. Designing fast converging phylogenetic methods. In Proc. 9th Int'l Conf. on Intelligent Systems for Molecular Biology (ISMB'01), volume 17 of Bioinformatics, pages S190–S198. Oxford U. Press, 2001.
[32] L. Nakhleh, U. Roshan, K. St. John, J. Sun, and T. Warnow. The performance of phylogenetic methods on trees of bounded diameter. In Proc. 1st Int'l Workshop on Algorithms in Bioinformatics (WABI'01), volume 2149 of Lecture Notes in Computer Science, pages 214–226. Springer-Verlag, 2001.
[33] L.A. Raubeson, D.A. Bader, B.M.E. Moret, L.-S. Wang, T. Warnow, and S.K. Wyman. Inferring phylogenies of photosynthetic organisms from chloroplast gene orders. In Proc. Botany, Albuquerque, NM, August 2001.
[34] C. Restrepo, B.T. Milne, D.A. Bader, W. Pockman, and A. Kerkhoff. Variation in vegetation growth rates: Implications for the evolution of semi-arid landscapes. In Proc. 16th Ann. Symp. of the US-Int'l Assoc. of Landscape Ecology, Tempe, AZ, April 2001.
[35] U. Roshan. Algorithmic techniques for improving the speed and accuracy of phylogenetic methods. PhD thesis, The University of Texas at Austin, May 2004.
[36] U. Roshan, B.M.E. Moret, T.L. Williams, and T. Warnow. Performance of supertree methods on various dataset decompositions. In O.R.P. Bininda-Emonds, editor, Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, volume 3 of Computational Biology, pages 301–328. Kluwer Academics, 2004. (Dress, A., series ed.).
[37] U. Roshan, B.M.E. Moret, T.L. Williams, and T. Warnow. Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees. In Proc. of the 3rd IEEE Bioinformatics Conference (CSB 2004), pages 98–109, Stanford University, Palo Alto, CA, August 2004.
[38] N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstruction of phylogenetic trees. Molecular Biology and Evolution, 4:406–425, 1987.
[39] D. Sankoff and M. Blanchette. Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology, 5:555–570, 1998.
[40] M. Snir and D. A. Bader. A framework for measuring supercomputer productivity. Int'l J. High Performance Computing Applications, 18(4):417–432, 2004.
[41] A. Stamatakis. An efficient program for phylogenetic inference using simulated annealing. In Proc. Int'l Parallel and Distributed Processing Symp. (IPDPS 2005), Denver, CO, April 2005.
[42] A. Stamatakis, T. Ludwig, and H. Meier. Computing large phylogenies with statistical methods: Problems and solutions. In Proc. of 4th Int'l Conf. on Bioinformatics and Genome Regulation and Structure (BGRS2004), pages 229–233, Novosibirsk, Russia, July 2004.
[43] A. Stamatakis, T. Ludwig, and H. Meier. Parallel inference of a 10.000-taxon phylogeny with maximum likelihood. In Proc. of 10th Int'l Euro-Par Conf. (Euro-Par 2004), pages 997–1004, Pisa, Italy, September 2004.
[44] A. Stamatakis, T. Ludwig, and H. Meier. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics, 21(4):456–463, 2005.
[45] A. Stamatakis, T. Ludwig, H. Meier, and M.J. Wolf. Accelerating parallel maximum likelihood-based phylogenetic tree calculations using subtree equality vectors. In Proc. of 15th IEEE/ACM Supercomputing Conf. (SC2002), Baltimore, MD, November 2002.
[46] A. Stamatakis, T. Ludwig, H. Meier, and M.J. Wolf. AxML: A fast program for sequential and parallel phylogenetic tree computations based on the maximum likelihood method. In Proc. of 1st IEEE Bioinformatics Conference (CSB 2002), pages 21–28, Stanford University, Palo Alto, CA, August 2002.
[47] A. Stamatakis, M. Ott, and T. Ludwig. RAxML-OMP: An efficient program for phylogenetic inference on SMPs. In Proc. of 8th Int'l Conf. on Parallel Computing Technologies (PaCT 2005), Krasnojarsk, Russia, September 2005.
[48] M. F. Su, I. El-Kady, D. A. Bader, and S.-Y. Lin. A novel FDTD application featuring OpenMP-MPI hybrid parallelization. In Proc. Int'l Conf. on Parallel Processing (ICPP), pages 373–379, Montreal, Canada, August 2004.
[49] Y. Sun, D.A. Bader, X. Lin, and Y. Ling. Broadcast on clusters of SMPs with optimal concurrency. In Proc. Int’l Conf. Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, NV, June 2002.
[50] Y. Sun, X. Lin, Y. Pan, R.W.H. Lau, D.A. Bader, and P.Y.S. Cheung. Generalized block shift network for clusters. IEEE Trans. Circuits and Systems I, 49(4):543–546, 2002.
[51] M. Yan. High Performance Algorithms for Phylogeny Reconstruction with Maximum Parsimony. PhD thesis, Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM, January 2004.