
UNIVERSITA’ DEGLI STUDI DI ROMA ’TOR VERGATA’
FACOLTA’ DI SCIENZE MATEMATICHE FISICHE E NATURALI
Statistical Mechanics of Unzipping:
Bayesian Inference of DNA Sequence
PhD Thesis in Physics
Candidate: Valentina Baldazzi
Supervisor: Prof. Luca Biferale
Co-supervisors: Simona Cocco (ENS, Paris), Hugues Dreysse (ULP, Strasbourg)
PhD programme coordinator: Prof. Piergiorgio Picozza
Academic Year 2004-2005

Contents
Introduction
1 Introduction to DNA
1.1 Chemical Properties
1.1.1 Double helix
1.2 DNA mechanics
1.2.1 Single molecule experiments
1.2.2 Stretching DNA
1.3 Strand Separation
1.3.1 DNA denaturation
1.3.2 DNA replication
1.3.3 DNA sequencing
1.3.4 Single Molecule Studies
2 Theoretical models for DNA elasticity and unzipping
2.1 Modelling DNA elasticity
2.1.1 Free Jointed Chain
2.1.2 The Kratky-Porod model
2.1.3 Worm Like Chain
2.2 Models of DNA unzipping
2.2.1 Static model
2.2.2 Dynamical models
2.3 MonteCarlo procedure
2.3.1 Constant force MC
2.3.2 Constant velocity MC
3 Sequence reconstruction
3.1 Bayesian Inference
3.2 Sequence Inference: the ideal case
3.2.1 Constructing P(x|S)
3.2.2 Normalisation check
3.2.3 Optimisation
3.3 Numerical results
3.3.1 Reconstruction program
3.3.2 Quality indicators
3.3.3 Single unzipping
3.3.4 Repeated unzippings
3.3.5 Finite temperature analysis
4 Analytical study of inference performances
4.1 High force theory
4.1.1 A simple approximation: no stacking interaction
4.1.2 The case of stacking interactions
4.2 Finite force theory
4.3 Numerical check
4.3.1 High force
4.3.2 Finite force
5 Towards ’real’ data
5.1 Spatial and temporal resolution limits
5.2 Multi-steps inference
5.3 Numerical implementation
5.3.1 Finite temporal resolution generator
5.3.2 Multi-steps reconstruction algorithm
5.4 Numerical results
5.4.1 Preliminary study: jump probability distribution
5.4.2 Single unzipping
5.4.3 Repeated unzippings
List of Figures
List of Tables
Bibliography
Introduction
DNA molecules carry the genetic information. Specific sequences, called genes,
code for proteins that perform most life functions and even make up the majority of cellular
structures.
When genes are altered, the encoded proteins can be unable to carry out their normal functions
and genetic disorders can result. It has been shown that almost all diseases have a genetic
component, whether inherited or resulting from the body’s response to environmental stresses,
like viruses or toxins. In some cases, like cystic fibrosis or haemophilia, the disease results from
the mutation of a single gene, whereas in other cases, as with cholesterol, small genetic
variations become a real disease only in connection with external stimuli.
Knowledge of the DNA sequence is therefore of central importance as both a diagnostic
and a therapeutic tool. Over the last fifteen years, large efforts have been made to sequence
genomes, and in particular the human one.
The ambitious goal proposed by the Human Genome Project has attracted the attention
of several groups all around the world, providing the right incentive for large improvements
in understanding biological processes and conceiving better technical devices. Renewed interest
has been devoted to understanding the function of each gene and the role played by
faulty ones in disease causation.
Currently, gene tests are available to detect mutated sequences. Some tests are used to clarify a
diagnosis and direct a physician towards appropriate treatments, while others allow families to
identify people at high risk for conditions that may be preventable.
Genes themselves can be used to treat diseases. Specific DNA sequences, coding for known
genes, can be introduced into cells in order to replace or supplement a defective gene, or to induce
the secretion of a protein that has a putative therapeutic function.
In principle, any disease may be a candidate for gene therapy. For the moment, research and
clinical trials have mainly addressed inherited diseases, such as cystic fibrosis [2] or haemophilia
[3], and cancer, with the aim of destroying cancerous cells or stopping tumor growth by suppressing
their proliferation [4]. Genes can also induce the regeneration of damaged tissues, reduce
rejection in organ transplantation [5], or offer new treatments for AIDS [6], alone or in conjunction
with conventional drugs.
The large interest raised by genomics has been accompanied by a parallel improvement
in methods for DNA sequencing and gene expression analysis. The Human Genome Project
itself would not have been possible without huge technological efforts. It was clear, in fact, that
its important targets could not be achieved manually: large improvements in DNA
sequencing performance and automated procedures were necessary. For the first time experts
in engineering, physics, chemistry and computer science were brought into close contact: the
existing DNA sequencing technology has been improved and alternative approaches conceived.
The traditional strategy is based on the so-called Sanger method: the DNA molecule is divided
into fragments (~500 base pairs) and for each one a set of copies of different sizes is
synthesised. Each copy has a common extremity and a base-specific fluorescent label on the other
end. The entire population is separated by length using gel electrophoresis and the sequence is
reconstructed. This method is now fully automated and correctly predicts 99.9% of the bases.
Nevertheless, the quest for alternative (faster or cheaper) sequencing methods is still an active
field of research.
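
The read-out step of the Sanger method described above can be illustrated with a minimal sketch. It is an idealisation written for this summary, not code from the thesis: it assumes one error-free copy of every length, ignores the fact that real copies are complementary to the template, and represents the fluorescent label simply by the letter of the last base. The function names make_labelled_fragments and read_sequence are hypothetical.

```python
import random


def make_labelled_fragments(template, seed=0):
    """Return (length, label) pairs, one per prefix of `template`, in random order.

    Idealisation: every copy starts at the same end, stops after k bases,
    and carries a label naming its last base.
    """
    fragments = [(k, template[k - 1]) for k in range(1, len(template) + 1)]
    random.Random(seed).shuffle(fragments)  # the copies come as a mixed population
    return fragments


def read_sequence(fragments):
    """Sort the copies by length (the role of gel electrophoresis) and read the labels."""
    return "".join(label for _, label in sorted(fragments))


template = "GATTACA"  # hypothetical short fragment
fragments = make_labelled_fragments(template)
print(read_sequence(fragments))
```

Sorting by length is enough here because each length occurs exactly once; real data contain many noisy copies per length, which is one source of the residual error rate quoted above.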
Recently, various single molecule experiments have been carried out, allowing a direct
investigation of DNA mechanics and protein-DNA interaction. In contrast to more traditional ones,
these new experiments can give access to dynamical information usually hidden by ensemble
averaging, such as intermediate metastable states or fluctuations at the scale of the individual
molecule. Remarkably, sequence content strongly affects the kinetics. Signatures of sequence
dependence have been found in several biological processes, among which the digestion of a DNA
molecule by an exonuclease [7, 8], translocation through nanopores [9, 11], DNA
polymerization [12] and mechanical unzipping [13, 14].
The question whether they can be used as an alternative sequencing metho
