Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios
20 pages
English

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios

-

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus
20 pages
English
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

Description

A long recognized problem is the inference of the supertree S that amalgamates a given set { G j } of trees G j , with leaves in each G j being assigned homologous elements. We ground on an approach to find the tree S by minimizing the total cost of mappings α j of individual gene trees G j into S . Traditionally, this cost is defined basically as a sum of duplications and gaps in each α j . The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive non-redundant set of species from all input G j . Results We suggest a reformulation of the classical NP -hard problem of building a supertree in terms of the global minimization of the same cost functional but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all G j ). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data. We define an extensive set of elementary evolutionary events and suggest an original definition of mapping β of tree G into tree S . We introduce the cost functional c ( G , S , f ) and define the mapping β as the global minimum of this functional with respect to the variable f , in which sense it is a generalization of classical mapping α . We suggest a reformulation of the classical NP -hard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic time solving algorithm to compute the mapping β . We introduce two novel definitions of the evolutionary scenario based on mapping β or a random process of gene evolution along a species tree. Conclusions Developed algorithms are mathematically proved, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given sets of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β , the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S . The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time still largely avoiding heuristics. Performance of the .

Sujets

Informations

Publié par
Publié le 01 janvier 2012
Nombre de lectures 14
Langue English
Poids de l'ouvrage 1 Mo

Extrait

Lyubetskyet al. Biology Direct2012,7:48 http://www.biologydirect.com/content/7/1/48
R E S E A R C H
Open Access
Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios 1* 1 1,2 1 Vassily A Lyubetsky , Lev I Rubanov , Leonid Y Rusin and Konstantin Yu Gorbunov
Abstract Background:A long recognized problem is the inference of the supertreeSthat amalgamates a given set {Gj} of treesGj, with leaves in eachGjbeing assigned homologous elements. We ground on an approach to find the treeSby minimizing the total cost of mappingsαjof individual gene trees GjintoS. Traditionally, this cost is defined basically as a sum of duplications and gaps in eachαj. The classical problem is to minimize the total cost, whereSruns over the set of all trees that contain an exhaustive nonredundant set of species from all inputGj. Results:We suggest a reformulation of the classicalNPhard problem of building a supertree in terms of the global minimization of the same cost functional but only over species treesSthat consist of clades belonging to a fixed setP(e.g., an exhaustive set of clades in allGj). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data. We define an extensive set of elementary evolutionary events and suggest an original definition of mappingβof treeGinto treeS. We introduce the cost functionalc(G,S,f) and define the mappingβas the global minimum of this functional with respect to the variablef, in which sense it is a generalization of classical mappingα. We suggest a reformulation of the classicalNPhard mapping (reconciliation) problem by introducing time slices into the species treeSand present a cubic time solving algorithm to compute the mappingβ. We introduce two novel definitions of the evolutionary scenario based on mappingβor a random process of gene evolution along a species tree. Conclusions:Developed algorithms are mathematically proved, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given sets of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mappingβ, the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of treeS. The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time still largely avoiding heuristics. Performance of the tools is tested with artificial and prokaryotic tree data. Reviewers:This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by Prof. Peter Olofsson), and Prof. Marek Kimmel. Keywords:Phylogenetics, Fast algorithms, Tree inference, Species tree, Tree amalgamation, Tree reconciliation, Supertree, Evolutionary events, Gene duplication, Gene loss, Horizontal gene transfer, Gene gain, Time slices
* Correspondence: lyubetsk@iitp.ru 1 Institute for Information Transmission Problems, The Russian Academy of Sciences (Kharkevich Institute), Bolshoy Karetny per. 19, Moscow 127994, Russia Full list of author information is available at the end of the article
© 2012 Lyubetsky et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Lyubetskyet al. Biology Direct2012,7:48 http://www.biologydirect.com/content/7/1/48
Background Problems in supertree inference DenoteSa tree of species or other taxonomic units, pro teins, etc. The long recognized problem is to infer a treeS that amalgamates a given set {Gj} of treesGj, with leaves in eachGjbeing assigned homologous sequences from an jth family of homologous elements. Only leaf names, not sequences themselves, are considered. Henceforth, assume that leaves inSare labeled with species namesx, leaves in eachGjwith speciesgene namesxy(geneyexists in speciesx); paralogs are allowed. Refer toSas aspecies tree, and to eachGjas agene tree. We elaborate a traditional approach from [1,2] to find the treeSsuch that it minimizes the total cost of map pings of individual gene treesGjintoS. Traditionally, somecost c(G,S) of mapping of a gene treeGinto a species treeSis defined. Choosing a par ticular definition ofc(G,S) (ref. e.g. to [2,3] and see Results) is not essential in terms of solving the classical problem below. For agivenset {Gj} of gene trees the total costis defined as X      c Gj;S¼c Gj;Sor j X      j;S¼:c Gj;Sð1Þ c G kj j
wherekjare certain weights. Theclassical problemis to find suchSthat globally minimizes the functionalc({Gj}, S), whereSruns over the set of all species trees that contain an exhaustive nonredundant set of species from all inputGj. SuchSis called asupertreefor the given set {Gj}. DenoteV0a set of all species names occurring in leaves of the input treesGj. Thus, the classical problem is to find the global minimum of cost functional (1) over all species treesSthat possess the setV0of leaf names. The supertree building problem isNPhard, i.e., any algorithm to solve it must possess an exponential complex ity (ifNPP). Numerous heuristics exist (e.g. in [46]), which however do not provide evaluations of the runtime of corresponding algorithms. UnlessNP=P, none of them can possess a polynomial complexity and be proved to find the true global minimum. We propose a reformulation of the classical problem and develop an effective deterministic algorithm that meets many biological prerequisites (Description of the first algorithm and Results). Namely, the supertreeSis sought for as a global minimum of (1) butSruns over a set of such species trees that mostly contain clades present in input treesGj, [3,7,8]. A set of species names assigned to leaves of a subtree inGjwith the rootvis called aclade(of vertexvinGj) and denoted bycl(v). The setPincludes all clades from treesGjand
Page 2 of 20
additionally the set of species namesV0. SuchPis re ferred to as astandardset. Its cardinality has the order ofnm, wherenis the number of gene trees, andmis the total number of species. For the standard setP, the algo rithms running time is cubic and determined by formula (2) below. Further, suppose thatcl(v1),cl(v2) are the clades of two noncomparable verticesv1,v2in a treeGj, and the inter sectionI(v1,v2) of these clades is not empty. Optionally, the setscl(vi)I(v1,v2), (i=1, 2) are also included inP; and for each vertexvthat is ancestral tovi(i=1 ori=2) but not to another vertex from the pairv1,v2, the setcl (v)cl(vi) is included inP. In so doing,horizontal gene transfers are accounted forin a species tree, ref. to [9]. IfPincludes any other nonempty subsets ofV0and its cardinality is arbitrary, the algorithm remains cubic in time but with respect to cardinality |P| of setP, ref. to formula (3) below. Therefore, thenonstandard problemformulated in this study consists of finding the global minimum of functional (1) among species treesSthat have the set of leavesV0and a set of clades belonging to a fixed setP. Thus,Pis a parameter of the problem and of the solving algorithm. The algorithm operates equally with anyP. The solution is also referred to as asupertreeor alim ited supertreewith respect toP. A mapping ofGjintoS, as well as defyning any sce nario, requires a predefined fixed set of elementaryevo lutionary events. Thestandardset (of events) contains only gene duplications and losses. Theextendedset (of events) additionally contains horizontal gene transfers, gains, etc. The list of elementary evolutionary events and their definitions are given in Description of the first al gorithm. Henceforth, edges in a species tree are referred to astubesto contrast the difference with the edges in gene trees. With the standard event set, the algorithm possesses the running time of   3 O n mðnþmÞ ð2Þ
For simplicity, here we assume that the average num ber of leaves in given treesGjis multiple ofm. If setPhas an arbitrary cardinality, the mentioned time has the order of   3 2 3 OjPjj þ Pjnmþ jPjmð3Þ
Let a set {Gj} of gene trees and its associatedP be fixed. A setVPis defined asbasicif it is either a single ton set or can be split into two basic sets. Let us intro duce the condition
V0is a basic set
(*)
  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents