Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models
BMC Bioinformatics
BioMed Central
Open Access
Research

Mukul S Bansal [1,3], J Gordon Burleigh [2] and Oliver Eulenstein* [3]

Addresses:
[1] School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
[2] Department of Biology, University of Florida, Gainesville, FL 32611, USA
[3] Department of Computer Science, Iowa State University, Ames, IA 50011, USA

Email: Mukul S Bansal - bansal@tau.ac.il; J Gordon Burleigh - gburleigh@ufl.edu; Oliver Eulenstein* - oeulenst@cs.iastate.edu
*Corresponding author
From: The Eighth Asia Pacific Bioinformatics Conference (APBC 2010), Bangalore, India, 18-21 January 2010
Published: 18 January 2010
BMC Bioinformatics 2010, 11(Suppl 1):S42
doi: 10.1186/1471-2105-11-S1-S42
This article is available from: http://www.biomedcentral.com/1471-2105/11/S1/S42

© 2010 Bansal et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Background: Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of this data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large datasets.

Results: We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets.

Conclusion: Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
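To make the gene tree parsimony objective concrete, the following is a minimal sketch of how a candidate species tree can be scored against a set of gene trees by counting gene duplications. It uses the standard least-common-ancestor (LCA) mapping from gene tree nodes to species tree nodes and is not the fast SPR/TBR local search described in the abstract. The nested-tuple tree encoding and all function names (`index_species_tree`, `lca`, `count_duplications`, `total_duplications`) are assumptions made for illustration; losses and deep coalescence are not counted here.

```python
# Illustrative sketch only: rooted binary trees are nested 2-tuples,
# and leaves are species names (strings). This encoding is an assumption.

def index_species_tree(species_tree):
    """Record the parent and depth of every node in the species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species tree nodes (naive walk up)."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes whose LCA mapping equals that of a child."""
    parent, depth = index_species_tree(species_tree)
    dups = 0

    def mapping(g):
        nonlocal dups
        if not isinstance(g, tuple):
            return g  # a gene leaf maps to the species it was sampled from
        left, right = (mapping(child) for child in g)
        m = lca(left, right, parent, depth)
        if m == left or m == right:
            dups += 1  # node and one child map to the same species node
        return m

    mapping(gene_tree)
    return dups


def total_duplications(gene_trees, species_tree):
    """Duplication-only parsimony score of one candidate species tree."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


species = (("A", "B"), "C")
genes = [(("A", "B"), "C"), (("A", "C"), "B"), (("A", "A"), "B")]
score = total_duplications(genes, species)
```

A local search heuristic would evaluate a score like this for many neighbouring species trees (for example, all SPR rearrangements of the current tree) and move to the best one. This naive version recomputes every mapping from scratch for each candidate, which is the kind of repeated work that faster algorithms aim to avoid.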