Bio++ Tutorial & Cookbook

Julien Dutheil
Sylvain Gaillard
March 31, 2010
This tutorial aims to help you get started with the Bio++ libraries. It introduces some of the fundamental Bio++ classes so that you can become familiar with the philosophy of the libraries. For an exhaustive list of available classes and methods, or for a detailed description of what each class can do, one should look at the class documentation.
Each section in this tutorial works through a detailed example. Each newly instantiated object is introduced briefly, and all methods are described.
Prerequisite: Users must have a basic knowledge of the C(++) language. Experience with object-oriented programming will help, but is not absolutely required for the first part of this tutorial.
Organization of this manual: This manual starts with a general introduction to Bio++ and with technical notes on how to install and use the Bio++ libraries on various operating systems. The first part of the manual then covers the basic use of Bio++. It is split into thematic chapters. Each chapter begins with a working example, followed by a more detailed explanation of the classes introduced.
The second part of the manual will consist of several recipes showing how to perform various kinds of analyses.
Contents
1 Introduction
  1.1 You probably want to know...
    1.1.1 Whether Bio++ is a C++ port of BioPerl, BioJava or BioPython?
    1.1.2 What are the legal limitations when using Bio++?
    1.1.3 How you can participate in Bio++ development?
    1.1.4 Whether Bio++ is highly maintained and enhanced?
  1.2 The starting point: installing and using Bio++
    1.2.1 What you need to use Bio++
    1.2.2 Installing pre-compiled version of Bio++ (currently Linux only)
    1.2.3 Installing Bio++ from source files: the automatic way
    1.2.4 Installing Bio++ from source files: the manual way
    1.2.5 Usage
  1.3 Installation under Windows (not using CygWin)
  1.4 Developing with Bio++ and Eclipse
    1.4.1 Prerequisites
    1.4.2 Creating projects for the libraries
    1.4.3 Create your own project
    1.4.4 Using Eclipse under Windows
  1.5 Portability and specificity

I Bio++ tutorial

2 Managing sequences
  2.1 A working example: compute the GC content of each sequence in a Fasta file
    2.1.1 Reading sequences from a file
    2.1.2 Working on a sequence container
  2.2 More on sequences
    2.2.1 Alphabets
    2.2.2 Working on sequences
    2.2.3 Containers
    2.2.4 Reading and writing sequences from/to a file
    2.2.5 Working with alignments
3 Trees
  3.1 Data structure
  3.2 Tools
  3.3 Parsimony reconstruction
4 Models (1): basic concepts
  4.1 A working example: building a NJ tree and re-estimating parameters with ML
    4.1.1 Retrieving sequences from a Phylip file
    4.1.2 Building an evolutionary model
    4.1.3 Estimating distances from sequences
    4.1.4 Building a Neighbor-joining tree
    4.1.5 Estimating parameters using maximum likelihood
    4.1.6 Writing the tree to a file
  4.2 Models interfaces and classes
5 Models (2): data simulation
6 Models (3): parameter estimation
7 Models (4): phylogenetic inference
  7.1 Maximum likelihood methods
  7.2 Distance based methods
8 Models (5): empirical Bayesian approach
  8.1 Site specific substitution rate estimation
  8.2 Ancestral state reconstruction
  8.3 Substitution mapping
9 Models (6): more complex models
  9.1 Heterotachous models
  9.2 Non-homogeneous models
10 Using the Population Genetics library
  10.1 Data-sets management
  10.2 Statistics
    10.2.1 Sequence data
    10.2.2 Allelic data

II Bio++ cookbook
List of Figures
2.1 Class hierarchy diagram of alphabet classes
Chapter 1 Introduction
Contents
  1.1 You probably want to know...
    1.1.1 Whether Bio++ is a C++ port of BioPerl, BioJava or BioPython?
    1.1.2 What are the legal limitations when using Bio++?
    1.1.3 How you can participate in Bio++ development?
    1.1.4 Whether Bio++ is highly maintained and enhanced?
  1.2 The starting point: installing and using Bio++
    1.2.1 What you need to use Bio++
    1.2.2 Installing pre-compiled version of Bio++ (currently Linux only)
    1.2.3 Installing Bio++ from source files: the automatic way
    1.2.4 Installing Bio++ from source files: the manual way
    1.2.5 Usage
  1.3 Installation under Windows (not using CygWin)
  1.4 Developing with Bio++ and Eclipse
    1.4.1 Prerequisites
    1.4.2 Creating projects for the libraries
    1.4.3 Create your own project
    1.4.4 Using Eclipse under Windows
  1.5 Portability and specificity
1.1 You probably want to know...
1.1.1 Whether Bio++ is a C++ port of BioPerl, BioJava or BioPython?
No! The main goal of Bio++ is to provide reusable code in order to help the development of new methods for Bioinformatics. Such methods usually deal with sequence alignments,