A Bayesian approach for constructing genetic maps when markers are miscoded
17 pages
English



Published on 1 January 2002

Genet. Sel. Evol. 34 (2002) 353-369
© INRA, EDP Sciences, 2002
DOI: 10.1051/gse:2002012
Original article
A Bayesian approach for constructing
genetic maps when markers are miscoded
Guilherme J.M. ROSA (a), Brian S. YANDELL (b),
Daniel GIANOLA (c)
a Department of Biostatistics, UNESP, Botucatu, SP, Brazil
b Departments of Statistics and of Horticulture,
University of Wisconsin, Madison, WI, USA
c Departments of Animal Science and of Biostatistics & Medical Informatics,
University of Wisconsin, Madison, WI, USA
(Received 10 September 2001; accepted 8 February 2002)
Abstract - The advent of molecular markers has created opportunities for a better understanding
of quantitative inheritance and for developing novel strategies for genetic improvement of
agricultural species, using information on quantitative trait loci (QTL). A QTL analysis relies
on accurate genetic marker maps. At present, most statistical methods used for map construction
ignore the fact that molecular data may be read with error. Often, however, there is ambiguity
about some marker genotypes. A Bayesian MCMC approach for inferences about a genetic
marker map when random miscoding of genotypes occurs is presented, and simulated and
real data sets are analyzed. The results suggest that unless there is strong reason to believe that
genotypes are ascertained without error, the proposed approach provides more reliable inference
on the genetic map.
genetic map construction / miscoded genotypes / Bayesian inference
Correspondence and reprints. E-mail: rosag@msu.edu
Current address: Departments of Animal Science and of Fisheries & Wildlife,
Michigan State University, East Lansing, MI 48824, USA
1. INTRODUCTION
The advent of molecular markers has created opportunities for a better
understanding of quantitative inheritance and for developing novel strategies
for genetic improvement in agriculture. For example, the location and the
effects of quantitative trait loci (QTL) can be inferred by combining information
from marker genotypes and phenotypic scores of individuals in a population in
linkage disequilibrium, such as in experiments with line crosses, e.g., using
backcross or F2 progenies. A QTL analysis relies on the availability of
accurate estimates of the genetic marker map, which includes information
on the order of the marker loci and on the genetic distances between them. Genetic
maps are inferred from recombination events between markers, which are
genotyped for each individual. Several statistical methods have been suggested
for map construction. Lathrop et al. [14], Ott [17] and Smith and
Stephens [21] discussed maximum likelihood procedures for marker map
inferences, and George et al. [9] presented a Bayesian approach for ordering
gene markers. Jones [10] reviewed a variety of statistical methods for gene
mapping. At present, most statistical methods used for map construction
ignore the possibility that molecular (marker) data may be read with error.
Often, however, there is ambiguity about genotypes and, if ignored, this can
adversely affect inferences [3,15]. The problem of miscoded genotypes has
received the attention of some investigators. Most of their research, however,
has focused on error detection and data cleaning [4,11,15]. The objective of our
work is to discuss possible biases in marker map estimates when miscoding
of genotypes is ignored and to suggest a robust approach for more realistic
inferences about marker positions and their distances. The approach simultaneously
estimates the genotyping error rate and corrects for possible miscoded
genotypes, while making inferences on the order and distances between genetic
markers.
The plan of the paper is as follows. In Section 2, the problem of miscoding
genotypes is discussed, as well as the systematic bias that this imposes on
genetic map estimation. In Section 3, a Bayesian approach for inferences
about a genetic map, when miscoding is ignored, is reviewed. In Section 4, the
methodology is extended to handle situations with miscoded genotypes, when
these occur at random. Simulated and real data are analyzed in Sections 5
and 6, respectively, and the results are discussed. Concluding remarks are
presented in Section 7.
2. THE PROBLEM CAUSED BY MISCODED GENOTYPES
First consider the estimation of the genetic distance between two marker loci
having a recombination rate h. In simple situations, e.g., with double haploid
or backcross designs, each individual has one of two possible genotypes (say 0
or 1) at each marker locus. Inferences about genetic distance between loci are
based on recombination events, which are observed by genotyping individuals.
If marker genotypes could be read without error, the probability of observing
a recombination event in a randomly drawn individual would be h. However,
it will be supposed that there is ambiguity in the assignment of genotypes to
individuals. For example, a genotype 0 may be coded as 1 (or vice-versa),
with probability p. Here, given the genotype for a specific marker and the
probability of miscoding (p), the distribution of the observed genotypes can be
written as:
$$p[m_{ij} \mid g_{ij}, p] = p^{|m_{ij} - g_{ij}|} (1 - p)^{1 - |m_{ij} - g_{ij}|},$$
where $m_{ij}$ and $g_{ij}$ are the observed and true genotypes ($m_{ij}, g_{ij} = 0, 1$), respectively,
for locus j (j = 1, 2) of individual i (i = 1, 2, ..., n).
Figure 1. Expected recombination events observed on different values of miscoding
probabilities (p), for some selected values of recombination rates (h).
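As an illustration only, the miscoding model above can be sketched in a few lines of Python. The function names `obs_prob` and `miscode` are hypothetical (they do not come from the paper), and the sketch assumes that each genotype is flipped independently with the same probability p.

```python
import random

def obs_prob(m, g, p):
    """P(observed genotype m | true genotype g, miscoding probability p), for 0/1 codes."""
    d = abs(m - g)                      # 1 if the genotype was miscoded, 0 otherwise
    return p ** d * (1 - p) ** (1 - d)

def miscode(genotypes, p, rng):
    """Flip each 0/1 genotype independently with probability p (assumed error model)."""
    return [1 - g if rng.random() < p else g for g in genotypes]

rng = random.Random(1)                  # fixed seed so the sketch is reproducible
true_g = [0, 1, 1, 0, 1]
print(miscode(true_g, 0.1, rng))
print(obs_prob(1, 0, 0.1), obs_prob(0, 0, 0.1))
```

For any true genotype, the two values of `obs_prob` sum to one, as a probability mass function over the observed codes must.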
If a recombination event between the loci is observed, this may be due to
either a true genetic recombination between them, or to an artifact caused by
miscoding. Hereinafter, a recombination observed by genotyping the markers
will be denoted as the apparent recombination, to distinguish between
observed and true recombination events.
The probability of observing an apparent recombination between markers 1
and 2 for individual i can be written as:
$$\begin{aligned}
\Pr(s_i = 1) &= \Pr[r_i = 1]\,(\Pr[\text{no miscod.}] + \Pr[\text{double miscod.}]) + \Pr[r_i = 0]\,\Pr[\text{one miscod.}] \\
&= h\,[p^2 + (1 - p)^2] + 2(1 - h)\,p(1 - p) \\
&= h + 2p(1 - p)(1 - 2h), \qquad (1)
\end{aligned}$$
where $s_i = |m_{i1} - m_{i2}|$ and $r_i = |g_{i1} - g_{i2}|$ stand for apparent and real recombination
events, respectively; and $\Pr[r_i = k] = h^k (1 - h)^{1 - k}$, with k = 0, 1.
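Expression (1) can be checked numerically by summing over all combinations of true recombination status and miscoding at each locus. The sketch below is a verification aid, not part of the original method. It assumes that miscoding occurs independently at the two loci and independently of the true genotypes, and the function names are hypothetical.

```python
from itertools import product

def apparent_recomb_prob(h, p):
    """Closed form (1): Pr(s_i = 1) = h + 2p(1 - p)(1 - 2h)."""
    return h + 2 * p * (1 - p) * (1 - 2 * h)

def apparent_recomb_prob_enum(h, p):
    """Same probability, obtained by brute-force enumeration of the eight cases."""
    total = 0.0
    for r, e1, e2 in product((0, 1), repeat=3):
        # r: true recombination; e1, e2: miscoding indicators for loci 1 and 2
        prob = (h if r else 1 - h) * (p if e1 else 1 - p) * (p if e2 else 1 - p)
        s = r ^ e1 ^ e2                 # each single miscoding toggles the apparent status
        total += prob * s
    return total

for h, p in [(0.05, 0.01), (0.2, 0.05), (0.5, 0.1)]:
    print(h, p, apparent_recomb_prob(h, p), apparent_recomb_prob_enum(h, p))
```

The two functions agree up to floating-point rounding. For unlinked markers (h = 0.5) the apparent rate stays at 0.5 whatever the value of p.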
It follows that recombination rates estimated from
recombinations observed by genotyping the marker loci, ignoring the possibility
of miscoding, would be biased upwards whenever the markers are linked
(h < 0.5) and p > 0. Figure 1 shows the expected apparent recombination
rates as a function of p, for some selected recombination rate values. The
relative bias, 2p(1 - p)(1 - 2h)/h, increases as h decreases, so the smaller the
genetic recombination rate, the worse the relative bias produced by miscoded genotypes.
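To make the size of the effect concrete, the relative bias implied by (1), that is, the excess 2p(1 - p)(1 - 2h) divided by h, can be tabulated for a few values. This is a small illustrative sketch: the function name `relative_bias` and the grid of values are arbitrary choices, not taken from the paper.

```python
def relative_bias(h, p):
    """Relative upward bias of the apparent recombination rate, from expression (1)."""
    return 2 * p * (1 - p) * (1 - 2 * h) / h

# Rows: true recombination rate h; columns: miscoding probability p.
for h in (0.01, 0.05, 0.10, 0.30):
    print(h, [round(relative_bias(h, p), 3) for p in (0.01, 0.05)])
```

For a fixed p, the printed values decrease as h increases, so closely linked markers are affected most in relative terms.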
Figure 2. Variance of recombination events observed on different values of miscoding
probabilities (p), for some selected values of recombination rates (h).
The variance of the apparent recombination event is equal to:
$$\begin{aligned}
\mathrm{Var}[s_i] &= \Pr[s_i = 1]\,(1 - \Pr[s_i = 1]) \\
&= [h + 2p(1 - p)(1 - 2h)]\,[1 - h - 2p(1 - p)(1 - 2h)] \\
&= h(1 - h) + 2p(1 - 3p + 4p^2 - 2p^3)(1 - 2h)^2. \qquad (2)
\end{aligned}$$
Thus, the variance of apparent recombination events is larger than the variance
of the real recombination events whenever the markers are linked (h < 0.5)
and p > 0. Figure 2 shows the variance of the apparent recombination events
as a function of p, for some different values of recombination rates.
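The algebra leading to (2) is easy to get wrong, so a quick numerical check may help. The sketch below compares the Bernoulli form q(1 - q), with q taken from (1), against the expanded closed form. The function names are hypothetical.

```python
def var_apparent(h, p):
    """Bernoulli variance q(1 - q), with q = Pr(s_i = 1) from expression (1)."""
    q = h + 2 * p * (1 - p) * (1 - 2 * h)
    return q * (1 - q)

def var_apparent_closed(h, p):
    """Expanded form (2): h(1 - h) + 2p(1 - 3p + 4p^2 - 2p^3)(1 - 2h)^2."""
    return h * (1 - h) + 2 * p * (1 - 3 * p + 4 * p ** 2 - 2 * p ** 3) * (1 - 2 * h) ** 2

for h, p in [(0.05, 0.01), (0.2, 0.05), (0.4, 0.1)]:
    excess = var_apparent_closed(h, p) - h * (1 - h)   # inflation over the error-free variance
    print(h, p, var_apparent(h, p), var_apparent_closed(h, p), excess)
```

The excess over h(1 - h) is positive for these linked-marker settings, in line with the statement above.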
In view of the possibility of miscoding for each marker genotype (i.e., ambiguity
about their genotypes), standard methods commonly used for genetic
map inferences overestimate the recombination rate between loci (or, in other
words, underestimate genetic linkage), and underestimate its precision [15].
For example, the maximum likelihood estimator of the recombination rate
between the loci (if the possibility of miscoding is ignored) is:
$$\hat{h} = \frac{1}{n} \sum_{i=1}^{n} |m_{i1} - m_{i2}|,$$
with expectation given by (1) and variance given by (2) divided by n.
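A small Monte Carlo sketch can illustrate how this naive estimator behaves when miscoding is present. It is not the authors' simulation study. It assumes a backcross-like setting in which the first-locus genotype is 0 or 1 with equal probability, with independent miscoding at each locus, and the function name, seed and parameter values are arbitrary.

```python
import random

def simulate_naive_estimate(n, h, p, rng):
    """Proportion of apparent recombinants among n simulated individuals."""
    s_total = 0
    for _ in range(n):
        g1 = rng.randint(0, 1)                        # true genotype at locus 1 (assumed 50:50)
        g2 = 1 - g1 if rng.random() < h else g1       # true recombination with probability h
        m1 = 1 - g1 if rng.random() < p else g1       # observed genotypes, each miscoded
        m2 = 1 - g2 if rng.random() < p else g2       # independently with probability p
        s_total += abs(m1 - m2)
    return s_total / n

h, p = 0.10, 0.05
estimate = simulate_naive_estimate(20000, h, p, random.Random(0))
expected = h + 2 * p * (1 - p) * (1 - 2 * h)          # expression (1)
print(estimate, expected)
```

With these settings, expression (1) gives an expected value of 0.176 against a true rate of 0.10, and the simulated estimate should fall close to the former.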
In more general situations, we have more than just two marker loci, and
the goal is to construct the genetic map, i.e., to order these marker loci and to
estimate the genetic distances between them. Again, all inferences are based on
recombination events observed (apparent recombinations) between the marker
loci. Ignoring miscoding may lead to even worse difficulties,
e.g., to the mistaken ordering of the loci, especially with dense maps.
3. BAYESIAN APPROACH FOR GENETIC MAP CONSTRUCTION
First, we will review a Bayesian approach for map construction when miscoding
is not taken into account [9]. Consider the genotype of m markers for
the individual i as $g_i = (g_{i1}, g_{i2}, \ldots, g_{im})$. In a backcross design, for example,
$g_{ij} = 0$ if the individual i is homozygous for the locus j, and 1 otherwise. The
sampling model of $g_i$, assuming the Haldane map function, is given by:
$$p(g_i \mid l, h) \propto \prod_{j=1}^{m-1} h_j^{|g_{ij} - g_{i,j+1}|} (1 - h_j)^{1 - |g_{ij} - g_{i,j+1}|}, \qquad (3)$$
where l is the order of the genetic marker loci and $h_j$ is the recombination rate
between the loci j and j + 1. Considering a sample of n independent individuals,
the likelihood of l and h is given by:
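$$\begin{aligned}
L(l, h \mid G) = p(G \mid l, h) &= \prod_{i=1}^{n} p(g_i \mid l, h) \\
&\propto \prod_{i=1}^{n} \prod_{j=1}^{m-1} h_j^{|g_{ij} - g_{i,j+1}|} (1 - h_j)^{1 - |g_{ij} - g_{i,j+1}|}, \qquad (4)
\end{aligned}$$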
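As a rough sketch of how the likelihood (4) might be evaluated for a candidate order, the Python function below computes its logarithm up to an additive constant. The function name, the representation of the order l as a permutation of locus indices, and the toy data are assumptions made for illustration. The sketch also requires every recombination rate to lie strictly between 0 and 1.

```python
import math

def log_likelihood(G, order, h):
    """Log of (4), up to a constant.

    G: list of genotype tuples (0/1 codes), one per individual.
    order: permutation of locus indices (assumed encoding of the order l).
    h: recombination rates between adjacent loci in that order (length m - 1).
    """
    if len(h) != len(order) - 1:
        raise ValueError("need one recombination rate per adjacent pair of loci")
    ll = 0.0
    for g in G:
        for j in range(len(order) - 1):
            d = abs(g[order[j]] - g[order[j + 1]])     # 1 if recombinant in interval j
            ll += math.log(h[j]) if d else math.log(1 - h[j])
    return ll

G = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0)]       # toy data, three loci
print(log_likelihood(G, (0, 1, 2), [0.2, 0.2]))
print(log_likelihood(G, (0, 2, 1), [0.2, 0.2]))
```

With these toy data, the order (0, 1, 2) implies fewer recombinations than (0, 2, 1) and therefore has the higher log-likelihood when both intervals share the same rate below 0.5.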
