Dynamics of genealogical trees for type- and state-dependent resampling models [Electronic resource] / submitted by Sven Piotrowiak

Published: Saturday, 1 January 2011
Source: D-NB.INFO/1009840509/34
Number of pages: 195

Dynamics of Genealogical Trees for Type- and
State-dependent Resampling Models
Submitted to the Faculty of Natural Sciences
of the Friedrich-Alexander-Universität
Erlangen-Nürnberg
for
the degree of Dr. rer. nat.
by
Sven Piotrowiak
from Neustadt an der Aisch

Approved as a dissertation
by the Faculty of Natural Sciences
of the Friedrich-Alexander-Universität Erlangen-Nürnberg
Date of the oral examination: 10 January 2011
Chair of the doctoral committee: Prof. Dr. Rainer Fink
First reviewer: Prof. Dr. Andreas Greven
Second reviewer: Prof. Dr. Anita Winter

Dynamics of genealogical trees for type- and
state-dependent resampling models
Summary
We study the evolution of the genealogy of an infinitely large population with different types in a neutral model, where the dynamics are influenced by the types of the individuals or even by the current composition of the population. To this end, we consider type- and state-dependent resampling models in the diffusion limit, focusing on the following two scenarios. The Weighted Sampling model allows infinitely many different types, and the resampling rates depend only on the types of the individuals. In the Ohta-Kimura model there are only two different types, but the resampling rates additionally depend on the relative frequencies of the types.
We encode the genealogy of a population by means of ultrametric spaces, which exhibit a tree-like structure. More precisely, we introduce the concept of marked metric measure spaces in order to obtain a suitable state space for type-dependent dynamics. The desired processes, the tree-valued Weighted Sampling process and the tree-valued Ohta-Kimura process, are then characterized as unique solutions of martingale problems. Moreover, we show that these processes can indeed be obtained as diffusion limits of the corresponding tree-valued resampling models for finite populations.
Furthermore, we study the long-term behavior of the introduced processes, which essentially corresponds to the long-term behavior of a tree-valued Fleming-Viot process, which in turn is described by a Kingman coalescent. We also characterize the set of invariant measures, which allows us to construct the genealogy of an infinitely old population.
I would like to thank my doctoral advisor Prof. Andreas Greven for giving me the opportunity to work on this interesting topic and for always supporting me in the best possible way, but also for always taking care of the necessary funding. Furthermore, my special thanks go to my wife Yvonne, without whom I would not have managed all this.
Sven Piotrowiak

Abstract
We study the evolution of the genealogy of an infinitely large population with different types in a neutral model, where the dynamics are affected by the types of the individuals or even additionally by the current composition of the population. More precisely, we study type- or even state-dependent resampling models in a large population limit, where we focus on the following two scenarios: the Weighted Sampling model, a model with infinitely many possible types, where the resampling rates depend only on the types of the individuals, and the Ohta-Kimura model, a model with only two types, where the resampling rates also depend on the type frequencies.
We code the genealogy by means of ultrametric spaces, which exhibit tree-like structures. More precisely, we introduce the concept of marked metric measure spaces in order to obtain a suitable state space that also allows type-dependent dynamics. The desired processes, the tree-valued Weighted Sampling dynamics and the tree-valued Ohta-Kimura dynamics, are then given as unique solutions of martingale problems. Moreover, we show that these processes indeed arise as diffusion limits of the corresponding tree-valued resampling models for finite populations.
As an application, we study the long-term behavior of the processes we introduce, which turns out to be essentially the same as for the tree-valued Fleming-Viot dynamics, given by a Kingman coalescent tree. In addition, we characterize the set of invariant measures, which also allows us to describe the genealogy of an infinitely old population.

Contents
1. Introduction
1.1. Background
1.2. The models
1.2.1. Classical resampling models
1.2.2. Tree-valued resampling models
1.3. Outline
2. Main results
2.1. State space
2.1.1. Preparations
2.1.2. Results and properties
2.2. Martingale problems and properties
2.2.1. Preparations
2.2.2. Results
2.3. Particle approximation
2.4. Long-term behavior
2.5. Outline of the proof section
3. State space
3.1. K-marked metric spaces
3.1.1. Definition and Gromov-Hausdorff distance
3.1.2. Gromov-Hausdorff convergence and compact sets
3.1.3. Completeness and separability
3.2. K-marked metric measure spaces
3.2.1. Definition and characterization
3.2.2. Gromov-Prohorov metric (Proof of Theorem 1)
3.2.3. Completeness and separability (Proof of Theorem 2)
3.2.4. Characterization of compact sets
3.2.5. Tightness of probability measures
3.2.6. Alternative characterizations of topologies (Proof of Theorem 3)
4. Finite populations
4.1. Preliminaries
4.2. Properties
4.3. Martingale problems
4.4. Convergence of generators
4.5. Compact containment condition
5. Infinite populations
5.1. Duality for the tree-valued Weighted Sampling dynamics
5.1.1. State space of the dual process
5.1.2. Characterization of the dual process
5.1.3. Duality relation
5.2. Martingale problems (Proofs of Theorems 4 and 5)
5.3. Properties
6. Long-term behavior
6.1. Measure-valued Weighted Sampling diffusion
6.2. Tree-valued dynamics (Proof of Theorem 6)
A. Appendix
A.1. Prohorov metric and product spaces
A.2. Generators and martingale problems
A.3. Kingman measure tree as invariant distribution
Bibliography

1. Introduction
1.1. Background
In probability theory, random trees are a frequently studied object with a wide range
of applications in a number of quite different fields. For instance, in computer science
the random binary search tree is a central object of study (see, e.g., [Dev87, BDL08]),
and in statistical mechanics ultrametric structures (which are closely related to trees) arise in the study of spin glasses (see, e.g., [MPS+84, BK06]). Moreover, and closer to the present work, trees appear quite naturally in population genetics, where they describe the genealogy of a population.
In this work, we study the evolution of the genealogy of infinite populations in neutral type- and state-dependent resampling models. The mechanism of resampling can be briefly described as follows: in a finite population, each pair of individuals is picked at a specific rate (here depending on the types of the individuals or even on the whole configuration of the population), and both individuals die, while one of them leaves two newborn individuals. This goes back to Moran ([Mor58]), but similar models (with non-overlapping generations) had been considered earlier by Fisher ([Fis22]) and Wright ([Wri31]). Under some conditions on the dependence of the resampling rates, the behavior of these (and further exchangeable) models for large populations can be described by a universal diffusion limit, namely a measure-valued diffusion (cf. [Daw93, EK93, DM95, Eth01]). For constant resampling rates one obtains the well-known Fleming-Viot process (cf. [FV78, FV79]). We code the genealogy of these infinite population models by trees and introduce tree-valued resampling dynamics describing the genealogy evolving forward in time.
In population genetics, trees have frequently arisen in both resampling and branching models. Classical objects in branching models are Galton-Watson trees, which were already studied, e.g., in [Ott49] and [Har63]; here a “classical” tree can typically be thought of as a finite set of vertices and edges. In a series of papers ([Ald91a, Ald91b, Ald93]), Aldous went a step further by also considering “tree-like” limit objects: the continuum random tree can be obtained as a scaling limit ($N\to\infty$) of critical Galton-Watson trees whose total population size is conditioned to be $N$. It can be characterized as the tree associated with a standard Brownian excursion, and moreover, in close connection to Donsker’s invariance principle, Aldous showed that it is universal in the sense that it is the limit of various sequences of random trees. His main idea was to view trees (via isometric embeddings) as subsets of $\ell^1_+$ and to study weak convergence of random variables that take values in the set of compact subsets of $\ell^1_+$, equipped with the topology of the Hausdorff metric. The concept of Lévy trees (cf. [DLG02, DLG05]), generalizing
the continuum random tree, can be seen as the continuous analog of Galton-Watson trees and also allows more general branching mechanisms, in particular offspring distributions with infinite variance. Here, a random tree is typically a random variable with values in the set of compact rooted real trees, endowed with the topology induced by the Gromov-Hausdorff distance. Real trees or $\mathbb{R}$-trees are a class of tree-like metric spaces which have been studied in the context of the so-called T-Theory (see [Dre84, DMT96, Ter97]), and their use in probability theory was developed in [EPW06]. In contrast to Aldous’s specific embedding into $\ell^1_+$, the Gromov-Hausdorff distance looks for the “best” embedding into a common metric space, where the distance can then be measured as the distance of subsets via the Hausdorff metric (cf. [Gro99, BBI01]). Further recent publications using this concept are, for example, [HMPW08, GPW09b]. A tree-valued state-dependent branching model dealing with trees in the same sense as in this work can be found in [Glö11].
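To make the last step concrete: once two compact spaces sit inside a common metric space, their distance is the Hausdorff distance of the two subsets. The Python sketch below computes it for two non-empty finite subsets of a given metric space. The function name `hausdorff_distance` is hypothetical, and the sketch deliberately leaves out the infimum over all embeddings that turns this into the Gromov-Hausdorff distance.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def hausdorff_distance(
    a: Sequence[P], b: Sequence[P], d: Callable[[P, P], float]
) -> float:
    """Hausdorff distance of two non-empty finite subsets A, B of a metric space (d).

    d_H(A, B) = max( max_{x in A} min_{y in B} d(x, y),
                     max_{y in B} min_{x in A} d(x, y) )
    """
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    # Distance from the worst-placed point of A to the set B ...
    a_to_b = max(min(d(x, y) for y in b) for x in a)
    # ... and from the worst-placed point of B to the set A.
    b_to_a = max(min(d(x, y) for x in a) for y in b)
    return max(a_to_b, b_to_a)
```

For instance, on the real line with `d = lambda x, y: abs(x - y)`, the sets `[0, 1]` and `[0, 1, 3]` have Hausdorff distance 2, because the point 3 is at distance 2 from the first set.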
In contrast to branching models, in resampling models the population size is always constant, as the mechanism of resampling shows. In the study of such models, trees appear in the form of coalescent processes describing the genealogy of the population
backwards in time. Kingman introduced in [Kin82a, Kin82b] his famous coalescent
which turns out to be a model for the collision of ancestral lineages in the Fleming-
Viot model. More precisely, Kingman’s coalescent (backward dynamics) is dual to the
Fleming-Viot process (forward dynamics). More general coalescent dynamics are given
by the class of so-called Λ-coalescents introduced in [Pit99, Sag99] which are again dual
to generalized Fleming-Viot processes ([BG03, BG05, BG06]). Here, more than two ancestral lineages are also allowed to coalesce; such events are called multiple collisions. Even more generally, the class of Ξ-coalescents also admits simultaneous multiple collisions, describing genealogies of populations with possibly large family sizes ([MS01, Sch00, BBM+09]). An overview of coalescents can be found in [Ber09].
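To illustrate how a coalescent produces a tree, the following Python sketch simulates Kingman’s coalescent backwards in time and returns the matrix of genealogical distances between the sampled leaves. It is a minimal sketch that assumes every pair of lineages merges at rate 1 (a resampling rate other than 1 would only rescale time). It also assumes the convention that the distance of two leaves is twice the time back to their most recent common ancestor. The function name `kingman_distances` is hypothetical. The resulting matrix is ultrametric, which is one reason genealogies are coded by ultrametric spaces.

```python
import random


def kingman_distances(n, rng):
    """Simulate Kingman's coalescent on n leaves; return the genealogical distance matrix.

    Assumptions of this sketch: each pair of lineages merges at rate 1, and the
    distance of two leaves is twice the time back to their most recent common ancestor.
    """
    if n < 1:
        raise ValueError("n must be positive")
    dist = [[0.0] * n for _ in range(n)]
    blocks = [{i} for i in range(n)]  # each lineage carries the set of its leaves
    t = 0.0
    while len(blocks) > 1:
        k = len(blocks)
        t += rng.expovariate(k * (k - 1) / 2)  # total merger rate with k lineages
        i, j = rng.sample(range(k), 2)  # a uniformly chosen pair of lineages merges
        for x in blocks[i]:
            for y in blocks[j]:
                dist[x][y] = dist[y][x] = 2 * t
        merged = blocks[i] | blocks[j]
        blocks = [b for m, b in enumerate(blocks) if m not in (i, j)] + [merged]
    return dist
```

Calling `kingman_distances(6, random.Random(1))` yields a symmetric 6×6 matrix with zero diagonal that satisfies the strong triangle inequality $r(x,z)\le\max\{r(x,y),r(y,z)\}$.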
Whereas coalescents give a static picture of the genealogy for a fixed point in time,
we want to study the evolution of the genealogy forward in time by constructing it in
an explicit way. Donnelly and Kurtz ([DK96, DK98, DK99]) introduced the well-known
look-down process providing a countable construction of the Fleming-Viot process. It
contains all the information available in the model, and thus implicitly also the genealogy.
However, processes taking “infinite” trees as values have been studied only recently, see
[Zam01, Zam02, Zam03, EPW06, EW06, EL07].
We will follow here the approach of Greven, Pfaffelhuber and Winter [GPW09a,
GPW11]. In [GPW09a], they introduced the concept of metric measure spaces for probabilistic needs. (With a focus on more geometric aspects, this topic had been studied earlier in [Gro99], Chapter 3½.) The general idea of this paper, going back to Aldous, is to say that a sequence of trees converges to a limit tree if all randomly sampled finite subtrees converge towards the corresponding finite subtrees of the limit tree. For this
purpose, trees are coded as (tree-like) metric spaces which are equipped with a proba-
bility measure, and the topology corresponding to the presented type of convergence is
called Gromov-weak topology. So, the main object of study in [GPW09a] is the space
of metric measure spaces which turns out to be a nice space to do probability theory
on it. Namely, it is shown that it is a Polish space since there exists a suitable metric,
the Gromov-Prohorov metric, inducing the Gromov-weak topology. Moreover, a char-
acterization of compact sets in this topology is provided allowing also statements about
tightness of probability measures on this space. In [GPW11], this concept is applied
to the study of evolving genealogies in infinitely large populations, more precisely the
genealogy of the Fleming-Viot process. Here, the genealogy is coded by trees in the
sense of ultrametric measure spaces. The tree-valued Fleming-Viot dynamics is defined
as the solution of a well-posed martingale problem, and furthermore it can be obtained
as diffusion limit of the tree-valued Moran dynamics which describe the genealogy of the
Moran model.
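The sampling idea behind the Gromov-weak topology can be illustrated on a finite space. Roughly, (marked) metric measure spaces are compared through the laws of the distance matrices, and, in the marked case, the marks, of $n$ points drawn independently from the measure, for every $n$. The Python sketch below draws such samples from a finite stand-in for a marked metric measure space and estimates an expectation $E[\varphi(r,u)]$. The class `FiniteMarkedMMSpace` and the functions `sample_distance_matrix` and `estimate` are hypothetical names. The simplification that each point carries exactly one mark is an assumption of the sketch. It only shows which objects are compared and is not a test of convergence.

```python
import random
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass
class FiniteMarkedMMSpace:
    """Hypothetical finite stand-in for a marked metric measure space.

    Point i has mass weights[i] and mark (type) marks[i]; dist is the metric.
    Simplification: every point carries exactly one mark.
    """
    dist: Sequence[Sequence[float]]
    weights: Sequence[float]
    marks: Sequence[Hashable]


def sample_distance_matrix(space, n, rng):
    """Draw n points i.i.d. from the measure; return their distance matrix and marks."""
    idx = rng.choices(range(len(space.weights)), weights=space.weights, k=n)
    r = [[space.dist[i][j] for j in idx] for i in idx]
    return r, [space.marks[i] for i in idx]


def estimate(space, n, phi, samples, rng):
    """Monte Carlo estimate of E[phi(r, u)] for n sampled points (r: distances, u: marks)."""
    total = 0.0
    for _ in range(samples):
        r, u = sample_distance_matrix(space, n, rng)
        total += phi(r, u)
    return total / samples
```

For two points at distance 1, each with mass 1/2 and different marks, both the mean sampled distance $E[r_{12}]$ and the probability that two sampled marks differ equal 1/2, and the estimates should come out close to that.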
1.2. The models
The aim of this work is to extend the procedure of [GPW09a, GPW11]: We want to
describe the genealogy of a population with different types in a neutral model where the
dynamics are affected by the types of the individuals or even additionally by the current
composition of the population. More precisely, we study type- or even state-dependent
resampling models (without selection, mutation or recombination) in a large population
limit. In this section, we present these models more explicitly.
1.2.1. Classical resampling models
We want to model the evolution of a population with a fixed number of individuals or organisms that are divided into different types. Let $\{1,\dots,N\}$ be the set of “locations” for a population with $N\in\mathbb{N}$ individuals. This means that at every time $t\ge 0$ each location $\alpha\in\{1,\dots,N\}$ is occupied by exactly one individual, and the individual in $\alpha$ alive at time $t$ is denoted by $(\alpha,t)$. Often, we will loosely speak of the individual $\alpha$. Furthermore, each individual $(\alpha,t)$ exhibits a type $y\in K$, where $K$ is an arbitrary set for now. Note that we do not consider spatial models involving mechanisms such as migration here; the term “location” serves only as an illustration of the resampling mechanism.
Classical resampling models describe the evolution of the type configuration of the $N$ individuals alive, and so their state space is $K^N$. The dynamics of such processes are specified as follows: each (unordered) pair $\{\alpha,\alpha'\}\subseteq\{1,\dots,N\}$ of locations is selected at a certain rate, and then both individuals living in $\alpha$ and $\alpha'$ die and are replaced by new ones, again living in $\alpha$ and $\alpha'$. The newborn individuals inherit their type (the same for both) from one of the dead individuals, each chosen with probability $\frac12$. This procedure is called resampling. One says that the chosen dead individual is an ancestor of the two newborn individuals or, vice versa, that the newborn individuals are descendants of the dead one that has been chosen. The individual that has died and has not reproduced we call a fossil.
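The mechanism just described can be simulated as a continuous-time Markov chain with a Gillespie-type scheme: draw an exponential waiting time with the total rate, pick a pair proportionally to its rate, choose the parent with probability 1/2, and copy its type to both locations. In the Python sketch below, the pair rate is a hypothetical callable `rate(freqs, y, y2)` that may depend on the two types and on the empirical type frequencies. Locations are indexed from 0, and the names `simulate_resampling`, `parent` and `fossil` are illustrative only. The recorded events are the arrows of the graphical representation.

```python
import itertools
import random
from collections import Counter


def simulate_resampling(types, rate, t_max, rng):
    """Simulate a type-/state-dependent resampling model on {0, ..., N-1} up to t_max.

    types: initial types, indexed by location.
    rate(freqs, y, y2): hypothetical pair rate for types y, y2, given the
        empirical type frequencies freqs (a dict mapping type -> fraction).
    Returns the final types and the resampling events (time, parent, fossil).
    """
    types = list(types)
    n = len(types)
    pairs = list(itertools.combinations(range(n), 2))  # unordered pairs of locations
    t, events = 0.0, []
    while True:
        freqs = {y: c / n for y, c in Counter(types).items()}
        rates = [rate(freqs, types[a], types[b]) for a, b in pairs]
        total = sum(rates)
        if total <= 0:  # no pair can resample any more
            break
        t += rng.expovariate(total)  # waiting time until the next resampling event
        if t > t_max:
            break
        a, b = rng.choices(pairs, weights=rates, k=1)[0]  # pair chosen proportionally to its rate
        # The parent is one of the two dead individuals, each with probability 1/2;
        # the other one becomes the fossil.
        parent, fossil = (a, b) if rng.random() < 0.5 else (b, a)
        # Both newborns, again living in a and b, carry the parent's type.
        types[a] = types[b] = types[parent]
        events.append((t, parent, fossil))
    return types, events
```

With `rate = lambda freqs, y, y2: 1.0` every pair resamples at the same rate, as in the Moran model. A hypothetical frequency-dependent choice such as `lambda freqs, y, y2: 1.0 if y == y2 else freqs[y] * freqs[y2]` shows how the current type distribution can enter the rates.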
Of course, there are several conceivable modifications or generalizations of this mechanism that we do not consider in this work. For example, the two newborn individuals could choose their types independently from the two dead individuals, each with probability $\frac12$.
But this would only mean that the rates get multiplied by a factor of $\frac12$, and in fact the model is the same as ours. Moreover, one could think about models where individuals can have more than two descendants. For instance, the Cannings model ([Can74, Can75]) is a discrete-time population model with non-overlapping generations, where in each step the numbers of descendants of the $N$ individuals are given by an exchangeable array of random variables. Note that the well-known Wright-Fisher model ([Fis22, Wri31]) falls into the class of Cannings models.
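As a small illustration of the last remark, one Wright-Fisher generation can be written as a Cannings step. Each of the $N$ children picks its parent uniformly at random, so the offspring vector $(\nu_1,\dots,\nu_N)$ is multinomial with parameters $N$ and $(1/N,\dots,1/N)$, hence exchangeable and summing to $N$. The Python sketch below uses the hypothetical names `wright_fisher_offspring` and `next_generation`. The second function accepts any Cannings offspring vector and lists the children grouped by parent.

```python
import random
from collections import Counter


def wright_fisher_offspring(n, rng):
    """Offspring numbers (nu_1, ..., nu_n) of one Wright-Fisher generation.

    Each of the n children picks its parent uniformly at random, so the vector is
    multinomial(n; 1/n, ..., 1/n): exchangeable and summing to n.
    """
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts[i] for i in range(n)]


def next_generation(types, offspring):
    """Types of the next generation for any Cannings-type offspring vector.

    Parent i leaves offspring[i] children, all of which inherit types[i].
    """
    if sum(offspring) != len(types):
        raise ValueError("offspring numbers must sum to the population size")
    return [y for y, k in zip(types, offspring) for _ in range(k)]
```

For example, `next_generation(["a", "b", "c"], [2, 0, 1])` returns `["a", "a", "c"]`: the second individual leaves no descendants.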
It is well-known that resampling models can be constructed via graphical representa-
tions as in Figure 1.1: Whenever a resampling event occurs, an arrow is drawn between
the two involved locations where the arrow points towards the individual that does not
reproduce. Obviously, this picture determines the genealogy of the whole population. In
particular, the types of the individuals at any time $t\ge 0$ are determined by the types at time $t=0$.
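The claim that the graphical representation determines the genealogy can be made concrete by following two ancestral lineages backwards through the arrows. Whenever a lineage sits at the location an arrow points to (the fossil), it jumps to the reproducing location. The two lineages meet at an event whose arrow connects exactly their two current locations. The Python sketch below assumes the convention that the genealogical distance of two individuals alive at time $t$ is twice the time back to their most recent common ancestor. It returns `float("inf")` as a placeholder when the lineages do not meet after time 0, since the genealogy at time 0 is not modelled here. The function name and the event format `(time, parent_loc, fossil_loc)` are hypothetical.

```python
def genealogical_distance(events, i, j, t):
    """Genealogical distance of the individuals at locations i and j at time t.

    events: resampling arrows (time, parent_loc, fossil_loc), each pointing from the
    reproducing location to the location of the individual that did not reproduce.
    Assumed convention: distance = 2 * (t - time of the most recent common ancestor);
    float("inf") if the two lineages do not meet after time 0.
    """
    if i == j:
        return 0.0
    a, b = i, j
    # Follow both ancestral lineages backwards in time through the arrows.
    for s, parent, fossil in sorted(events, reverse=True):
        if s > t:
            continue
        # Both newborns at parent and fossil descend from the old individual at parent.
        if {a, b} == {parent, fossil}:
            return 2 * (t - s)
        # A lineage at the fossil's location jumps to the reproducing location.
        if a == fossil:
            a = parent
        if b == fossil:
            b = parent
    return float("inf")
```

With the arrows `[(1.0, 0, 1), (2.0, 1, 2)]` and $t=3$, the individuals at locations 1 and 2 have distance 2.0 (common ancestor at time 2), while the individuals at 0 and 2 have distance 4.0 (common ancestor at time 1).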
Note that we study only neutral models, where the probability to reproduce is always $\frac12$ for each type, i.e. there is no selection since every type has the same fitness. Likewise, we do not consider effects of mutation or recombination here.
In order to complete the characterization of these type configuration-valued processes, it remains to specify the rates for the resampling events described above, called resampling rates for short. If we consider a constant resampling rate for all pairs $\{\alpha,\alpha'\}$ of locations, we get the well-known Moran model (cf. [Mor58]). It may be natural to consider a more general setting where the willingness to pair differs between pairs of individuals according to their types. Therefore, in this work we study the case where the resampling rates depend on the current types of the individuals or even additionally on the current type distribution, a form of state-dependence.
For this purpose, we assume that there is a function
$$\gamma:\mathcal{M}_1(K)\times K^2\to[0,\infty) \tag{1.2.1}$$
which is symmetric in $(y,y')\in K^2$ such that the resampling rate for a pair of locations $\{\alpha,\alpha'\}$ at time $t$ is given by $\mathrm{const}\cdot\gamma\big((\kappa^N_t)_\ast P,\kappa^N_t(\alpha),\kappa^N_t(\alpha')\big)$, where $\kappa^N_t(\alpha)$ is the type of the individual $(\alpha,t)$ and $(\kappa^N_t)_\ast P$ is the empirical type distribution at time $t$. We are mainly interested in the following two scenarios:
Weighted Sampling model: The resampling rates depend only on the types of the individuals. That means that
$$\gamma(\mu,y,y')=\gamma(y,y') \tag{1.2.2}$$
for some symmetric function $\gamma:K^2\to[0,\infty)$.
Ohta-Kimura model: The resampling rates depend on the types of the individ-
uals and the current type distribution. Here, K consists of only two types and
the resampling rates are equal to some constant (as in the Moran model) if the
individuals have the same type; but when their types differ, the rate is given by
$$\mathrm{const}\cdot(\kappa^N_t)_\ast P\big(\{\kappa^N_t(\alpha)\}\big)\,(\kappa^N_t)_\ast P\big(\{\kappa^N_t(\alpha')\}\big); \tag{1.2.3}$$