La lecture en ligne est gratuite
Lire Télécharger

FromSubtreestoSupertrees

SebastianBo¨cker

Universität Bielefeld
SGtrrauktduuierrbildutenkngollegs-
prozesse

Dissertation
zurErlangungdesDoktorgrades
derFakulta¨tfu¨rMathematik
Universita¨tBielefeld

Bielefeld,imAugust1999

ii

.1

.2

.3

Gutachter:

Gutachter:

Gutachter:

.forP

.forP

forP.

Andreas

.M.W

sserD

MichaelA.Steel(Christchurch,

ThomasZink

Datumdermu¨ndlichenPru¨fung:28.Januar2000

Neuseeland)

Gedrucktaufalterungsbesta¨ndigemPapier

1ISO9706.

Acknowledgments

ThisthesiswaswrittenunderthesupervisionofProf.AndreasW.M.Dress.
Partsofthisthesishavebeenpublishedinadvance[BD00b,BDS99],andfurther
papersareforthcoming[BBDS,BD,Bo¨ca,Bo¨cb].
Theworkwascarriedoutwhilebeingsupportedbythe“GraduiertenkollegStruk-
turbildungsprozesse”thatispartoftheCenterforInterdisciplinaryResearchonStruc-
tureFormation(FSPM)attheUniversityofBielefeld,Germany.
Firstandforemost,IwishtothankmysupervisorProfessorAndreasDress.
Withouthisideas,comments,andcontributionsthisthesiswouldnotexist.Ialso
thankhimforbringingmeintocontactwiththetopic,forfruitfuldiscussions,and
formaintainingmyenthusiasm.Hispromotionmadeitpossibleformetogoto
Christchurch,NewZealand,forthreemonths.
SpecialthanksgotoDr.MikeSteelwhoaskedtherightquestionattheright
timeandtherebygavemethechancetowritethisthesis.Heisnotonlymy“unocial
supervisor”butalsomyhikingtrainer,andhecoulddoubtlesslymakeathirdcareer
asacartographer.ThankyouverymuchforthehospitalityinChristchurch,andfor
themaps!
Thanks(withoutanyspecialorder)goto:MygirlfriendClaudiaHu¨ttnerfor
listeningandnotcomplaining;AnkeSchnitkerforthechats;myroommatesat
university,ChristianAlscherandSwenOsterkamp,forkeepingmesanemost
ofthetime;everybodyelseintheGraduiertenkolleg;myprimaryschoolteacher
FriedelKaraschewskifortellingmetostudymathematics;KordulaDo¨ringfor
theneedles;ThomasMu¨ller,NikolausWiesner,andthecityofHamburgfor
brainrelaxation;CakeforthreegreatCDs;ProfessorPeterSlodowyforshowing
methefunandbeautyofmathematics;theChristchurchiansforawonderful
time;BernhardBalkenhol,HolgerPaschke,andKarl-HermannWieners
forcomputersupport;MarcKupietzforcigarettes,movies,andmusic;myParents
foreverything.
Inaccordancewithmathematicsprotocol,Iwillendeavortousethepersonal
pronoun“we”toindicatethereaderandthewriter,ratherthanthe“royalwe.”

iii

i

v

ACKNOWLEDGM

STNE

Contents

Chapter1.Introduction
1.1.Theproblemoftreeamalgamation
1.2.Motivationviaapplications
1.3.Themainresult
1.4.Structureofthisthesis
1.5.Furthersteps
1.6.Basicnotations
Chapter2.Graphs,Trees,andPatches
2.1.Graphsandpaths
2.2.Trees
2.3.Patches
2.4.Twoequivalencerelationsonthesetofvertices
Chapter3.X-TreesandSplitSystems
3.1.Denitions
3.2.Basicobservations
3.3.InducedY-treesandsplitsystems
3.4.Inducingsubtreesviapatchesandsubsetsofedges
3.5.ConnectingpartialsplitsystemsandX-trees
Chapter4.TheTreeJoiningTheorem
4.1.Proofoftheexistence
4.2.Proofoftheuniqueness
Chapter5.ClusterSystemsandPatchworks
5.1.Denitions,notations,andbasicobservations
5.2.Mainresults
5.3.Examples
5.4.ProofofTheorem4
5.5.ProofofTheorems5and2incase#C<1
5.6.Applicationsofpatchworks
5.7.Droppingthenitenessrestriction
Chapter6.QuartetEncodingsandExcess
v

11223457781141717181023282131323536304144454640575

vi

viCONTENTS
6.1.Denitionsandbasicobservations
6.2.Examplesofquartetencodings
6.3.Tightquartetencodings
6.4.Inducedencodingsandq-patches
Chapter7.TheMainTheorem
7.1.Condition(ii)impliescondition(i)
7.2.Condition(i)impliescondition(ii)
7.3.Firstclaim
7.4.Secondclaim
7.5.Thirdclaim
Chapter8.ImplicationsandApplications
8.1.Reconstructingtreesfromquartetsplits
8.2.Reconstructingtreesfromsubtrees
8.3.Fastamalgamationofanexcess-freecollectionoftrees
8.4.Thedyadicclosure
Bibliography
ListofSymbols
xednI

758506265656669617277777385888195999

21.INTRODUCTION
Infact,eventheproblemofdeterminingwhetherthereexistsatleastonesupertreeof
TwasproventobeNP-completebySteel[Ste92];anditremainsNP-completeifwe
assume—withoutlossofgenerality,aswewillseeinSection3.5—thatalltreesinT
havepreciselyfourleaves.Thecomplexityofdeterminingwhethersome(known!)tree
istheuniquesupertreeofagivencollectionisstillunknown,butstronglyconjectured
tobeNP-completeaswell.AndifthetreesinTcanbeamalgamated,theremaybe
(exponentially)manysupertrees;seeExample19onpage81fora“non-trivial”case
whereallofthese(exponentiallymany)supertreesarebinary.

1.2.Motivationviaapplications
Invariouseldsofclassicationsuchasbiology[CSW87,SPH98,SOWH96],lin-
guistics[War97],orstemmatology[BBHR98,Rob96],treesareusedtorepresentevolu-
tionary,historical,orhierarchicalrelationships.Inbiology,suchtrees(“phylogenies”)
typicallyrepresenttheevolutionaryhistoryofacollectionofextantspecies,orthe
lineofdescentofsomegene.Inmostapplications,theobjectsofinterestarefound
atthetips(leaves)ofthetree,whileallinteriorverticescorrespondtoabranching
(orspeciation)event.
Thereareseveralreasonswhyamalgamationproblemsasdescribedabovearise
naturallyintheseelds.Firstly,wemaywishtocombinetreesthathavebeen
reconstructedusingdistinct,thoughoverlappingcollectionsofobjects(usuallyby
dierentresearchersusingdierentdataand,asoftenasnot,dierentreconstruc-
tionmethods).Asecondreasonisthat,ingeneral,itisdiculttoaccuratelyre-
constructlargetreesdirectly,andwemaychooseinsteadtoreconstruct“trustwor-
thy”treesforsmallsubsets,andthencombinetheseinasupertree(orsupertrees),
see[BD86,HNP+98,HVW99,HW99,SvH96,Wil98].Anadditionalreasonforthisap-
proachisthat,forgeneticdata,thenumberofsitesthatcanbeaccuratelyaligned
acrossasmallnumberofcloselyrelatedsequencesisoftenmuchlargerthanthecorre-
spondingnumberofsitesforasetthatislargeandincludesratherdiversesequences.
Also,theaccuracyofpolynomialtimetreereconstructionmethods(likeneighborjoin-
ingorheuristicsformaximumparsimony)appearstodeterioratewiththemaximum
evolutionarydistanceinaphylogeny[RW97,HNR+98].

1.3.Themainresult
Althoughtheproblemsof(a)ndingasupertreeand(b)testingwhetheritis
uniquelydeterminedareknowntobehard,afullanswercanbegiventothesec-
ondquestion(implyingaswellanecientwaytoeventuallyconstructthisunique
supertree)inthefollowing,particularlyinterestingcase:Givenacollectionofbinary
(ornon-degenerated)treesT={T1,T2,...,Tk},letXjdenotethesetofleaves(or
pendingvertices)ofthetreeTjforj=1,...,k,letXdenotethesetofleavesofa
binarytreeT,andassumethatXcoincideswiththeunionofthe(notnecessarily
disjoint)setsX1,...,Xk.IfTistheuniquesupertreeofthecollectionT,then
k#X3(#Xj3)
X1=j

1.4.STRUCTUREOFTHISTHESIS3
mustalwayshold,suggestingthatparticularattentionshouldbedevotedtothemini-
mum1case—thatis,thecasewhereequalityholds.Tothisend,wedenetheexcess
ofthecollectionTby
exc(T):=#Xj(#Xj3)3,
[kXk
j=1j=1
andwede