Presented at RECOMB 2000, extended version appeared in Journal of Computational Biology
Universal DNA Tag Systems: A Combinatorial Design Scheme
Amir Ben-Dor∗, Richard Karp†, Benno Schwikowski‡, Zohar Yakhini§

∗ Department of Computer Science & Engineering, University of Washington. Supported by the Program in Mathematics and Molecular Biology (amirbd@cs.washington.edu).
† International Computer Science Institute and Mathematical Sciences Research Institute, University of California at Berkeley. Supported by NSF grant at the University of Washington (karp@icsi.berkeley.edu).
‡ Department of Computer Science & Engineering, University of Washington. Supported by the German Academic Exchange Service (DAAD) (benno@cs.washington.edu).
§ Chemical and Biological Systems Department, Agilent Laboratories, a Hewlett-Packard Subsidiary (zohary@hpl.hp.com).

Abstract

Custom-designed DNA arrays offer the possibility of simultaneously monitoring thousands of hybridization reactions. These arrays show great potential for many medical and scientific applications such as polymorphism analysis and genotyping. Relatively high costs are associated with the need to specifically design and synthesize problem-specific arrays. Recently, an alternative approach was suggested that utilizes fixed, universal arrays. This approach presents an interesting design problem: the arrays should contain as many probes as possible, while minimizing experimental errors caused by cross-hybridization. We use a simple thermodynamic model to cast this design problem in a formal mathematical framework. Employing new combinatorial ideas, we derive an efficient construction for the design problem, and prove that our construction is near-optimal.

1 Introduction

Oligonucleotides are short single-stranded pieces of DNA (typically 15-50 nucleotides) made by chemical synthesis. In solution, oligonucleotides tend to specifically hybridize with their Watson-Crick complements ([21]), and form a stable DNA duplex. This specificity is exploited in molecular hybridization assays, in which oligonucleotides are used as probes to identify any complementary (or near-complementary) DNA from a complex mixture of target DNA.

Array-based hybridization assays, introduced in the late 1980s [6, 13, 15, 3, 5, 8], offer the possibility of simultaneously monitoring a multitude (currently up to tens of thousands) of hybridization reactions. In such an assay, a target-specific set of oligonucleotides is synthesized on a solid support surface (e.g., silicon or glass). A fluorescently labeled target sample mixture of DNA or RNA fragments is then brought in contact with the treated surface, and allowed to hybridize with the synthesized oligonucleotides. Scanning the fluorescent labels of the fragments attached to the array reveals information about the content of the sample mixture. Theoretically, the assay conditions are such that hybridization only occurs in sites on the surface that are Watson-Crick complements to some substring in the target. In practice, cross-hybridization is a main source of cross-signal contamination in any array-based hybridization assay.

Array-based hybridization assays show great potential for many different applications such as SNP genotyping [12], gene expression profiling [4], and resequencing DNA [14, 12]. Recently, S. Brenner and others [1, 2] suggested an alternative approach based on universal arrays containing oligonucleotides called antitags. The Watson-Crick complement of each antitag is called a tag. The tag–antitag pairs are designed so that each tag hybridizes strongly to its complementary antitag, but not to any other antitag. In this approach, the analysis of a DNA sample consists of two steps: solution-phase hybridization followed by solid-phase hybridization. In the first step, hybridization takes place between the target DNA in solution and a set of oligonucleotide precursors called reporter molecules. Each reporter molecule consists of a target-specific part ligated to a unique tag. Reporter/target hybridization events are registered (e.g., by an enzymatic reaction). In the second step the modified precursors are introduced to the array. Tags form duplexes with the corresponding antitags. Thus, the reporter molecules are sorted into different locations on the array and hybridization events can be determined. This approach has several advantages:

• Complicated array manufacturing processes are required only for the fixed, universal component of the assay. These universal components can therefore be mass-produced, significantly reducing manufacturing costs.

• The assay components that need to be designed for a specific target are involved in solution phase processes. The underlying nucleic acid chemistry and thermodynamics are better understood than the same aspects of surface-based processes. Therefore a more efficient and effective design process is facilitated.

As an example, we describe a multiplexed SNP genotyping assay. SNPs (single nucleotide polymorphisms) are differences, across the population, in a single base, within an otherwise conserved genomic sequence [9]. Genotyping is a process that determines the variants present in a given sample, over a set of SNPs. This assay uses off-the-shelf universal components: a universal set of oligonucleotide tags and a universal array of antitags. The antitags, immobilized on the array, are Watson-Crick complements of the tags in the mixture. The whole system will be called a DNA Tag/AntiTag system and in short a DNA TAT system. Consider a set of SNPs to be genotyped. The assay is performed as follows (see Fig. 1):

1. A set of reporter molecules (one for each SNP) is synthesized in solution. Each reporter molecule consists of two parts that are ligated (in string language: concatenated) together. The first part is the Watson-Crick complement of the upstream sequence that immediately precedes the polymorphic site of the SNP. The second part of each reporter molecule is a unique tag from the universal set of tags.

2. When an individual is to be genotyped, a sample is prepared that contains the sequences flanking each of the SNP loci. The sample is mixed with the reporter molecules. Solution-phase hybridization then takes place. Assuming that specificity is perfect, this results in the flanking sequences of the SNPs paired only with the appropriate reporter molecule.

3. Single nucleotides, A, C, T, G, fluorescently labeled with four distinct colors, are added to the mixture. These labeled nucleotides hybridize to the polymorphic site of each SNP and are ligated to the corresponding reporter molecule. That is, each reporter molecule is extended by exactly one labeled nucleotide.

4. The extended reporter molecules are separated from the sample fragments, and brought into contact with the universal array. Assuming that specificity is perfect, the tag part of each reporter molecule will only hybridize to its complementary antitag on the array. Thus the extended reporter molecules sort into the array sites where the corresponding antitag is present.

5. For each site of the array, the fluorescent colors present at that site are detected. The colors indicate which bases were used for the extension at the corresponding SNP site, and thus reveal the SNP variations present in the individual.

The design problem for a DNA TAT system presents a tradeoff. Clearly, it is desirable to have as many tags as possible, in order to maximize the number of SNPs that can be genotyped in parallel. On the other hand, if too many tags are used, similar tags will necessarily entail cross-hybridization events (where tags hybridize to foreign antitags), reducing the accuracy of the assay.

This design problem was identified in previous work and several formulations and solutions were proposed [10, 1, 2, 18, 11]. These papers differ both in the way hybridization is modeled, and in the algorithmic approach employed to find a good DNA TAT system. In [10] a TAT system is described as a part of a strategy for surface-based DNA computing. The authors take a coding theory approach and choose to model cross-hybridization constraints as general Hamming distance conditions. A set of 108 8-mers, with a 50% G/C content, which differ in at least 4 bases from each other, is

Fragments spanning the
polymorphism sites for all the SNPs in the set are extracted. The different shapes denote different variants.

Oligonucleotides complementary to the sequences immediately preceding the polymorphism sites are tagged by DNA tags, designed to specifically hybridize to their complements on the array.
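
The tagging just described can be illustrated with a simplified string model. The sketch below is not the authors' procedure. It assumes every sequence is written 5'→3', treats the Watson-Crick complement as the reverse complement, and ignores the strand-orientation details of real primer extension. The function names, tags and upstream sequences are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Base pairing used for Watson-Crick complements.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick_complement(seq: str) -> str:
    """Reverse complement of seq (assumption: strands are written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_reporter(upstream: str, tag: str) -> str:
    """Concatenate the complement of the upstream sequence with a tag.

    The order (target-specific part, then tag) follows the textual
    description only; it is not a statement about the real chemistry.
    """
    return watson_crick_complement(upstream) + tag


def array_site(reporter: str, antitags: Sequence[str], tag_len: int) -> int:
    """Index of the antitag that is complementary to the reporter's tag.

    Perfect specificity is assumed: a tag pairs only with its exact complement.
    """
    tag = reporter[-tag_len:]
    return list(antitags).index(watson_crick_complement(tag))


# Hypothetical universal tags and the matching antitags on the array.
tags = ["ACCTGAGT", "GTCAATCG"]
antitags = [watson_crick_complement(t) for t in tags]

# Hypothetical upstream sequences, one per SNP.
upstreams = ["GATTACA", "CCGTAAG"]
reporters = [make_reporter(u, t) for u, t in zip(upstreams, tags)]

for reporter in reporters:
    site = array_site(reporter, antitags, tag_len=8)
    print(reporter, "-> array site", site)
```

In this toy model each reporter is routed to the array site that holds the complement of its tag, which is the sorting behaviour the universal array relies on. Cross-hybridization between similar tags is deliberately not modeled here.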
Extension reactions take place in solution phase, in the presence of a mixture of all four dideoxy nucleotides (differentially fluorescently labeled) and an appropriate enzyme. For each SNP the extending base is the one complementary to the base present in the sample sequence. After separation (the whole process can be performed at high temperature) a mixture of reporter molecules is formed. This
mixt