Cours-Licence4

Bioinformatics, Licence L3: Lecture 4
P. Derreumaux
General-purpose data banks and databases
The first issue of each year of Nucleic Acids Research is devoted to articles on biological databases.
http://nar.oupjournals.org/
Why do we need electronic databases?
1. Data explosion
2. Data distribution
What is a database?
1. Computerized storehouse of data (records)
2. Allows user-defined queries
3. Allows extraction of specified records
4. Allows adding, changing, removing, and merging of records
5. Uses standardized formats
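The five properties above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the helper functions are assumptions made for this illustration; only the accession codes, organisms, and lengths come from the GenBank records shown later in this lecture.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory store holding two records from this lecture."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per record, keyed by accession code.
    conn.execute(
        "CREATE TABLE records ("
        "accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("U49845", "Saccharomyces cerevisiae", 5028),
            ("AAK30599", "Dolomedes tenebrosus", 691),
        ],
    )
    return conn

def find_by_organism(conn, organism):
    """User-defined query: extract the accessions for one organism."""
    rows = conn.execute(
        "SELECT accession FROM records WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in rows]

def remove_record(conn, accession):
    """Remove a record; adding and changing work the same way via SQL."""
    conn.execute("DELETE FROM records WHERE accession = ?", (accession,))
```

Calling find_by_organism(build_demo_db(), "Dolomedes tenebrosus") extracts the single matching record. Real sequence banks use far richer schemas; the point here is only the query, extract, and modify cycle.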
Structure of databases
1. 'Structural' aspects (Id, ref, organism, links to other banks, function, phylogeny)
2. 'Expression' aspects (coding seq, exons, promoters, 3D structure, motifs, domains)
3. Reduction of redundancy, e.g., NR Protein
4. Quality criteria vary: automatic or human control
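Point 3 can be sketched as follows: a non-redundant set keeps each distinct sequence once and attaches to it every identifier that carried it. This is a minimal illustration of the idea, not the procedure used to build NR; the function name and the toy identifiers and sequences in the usage note are hypothetical.

```python
def collapse_identical(entries):
    """Merge entries whose sequences are identical.

    entries: iterable of (identifier, sequence) pairs.
    Returns a list of (identifiers, sequence) pairs, one per distinct
    sequence, in order of first appearance.
    """
    merged = {}
    for identifier, sequence in entries:
        # Group identifiers under the exact sequence string they share.
        merged.setdefault(sequence, []).append(identifier)
    return [(ids, seq) for seq, ids in merged.items()]
```

For example, three toy entries ("a", "MKV"), ("b", "MKV"), and ("c", "MKL") collapse to two records, the first carrying both identifiers "a" and "b".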
'Classic' structure of a protein-coding gene
[Figure: schematic of a gene. Labels: CAAT, TATA, 5' UTR, AUG, CDS (exon, intron, exon, intron, exon), T, 3' UTR, AAUAA]
But a 'classic' gene does not exist...
GenBank database record
[Slide callouts: Start, Identifiers, Accession code, Qualifiers, End of record]

LOCUS       AF350270_1   691 aa   linear   INV 09-APR-2001
DEFINITION  fibroin 2 [Dolomedes tenebrosus].
ACCESSION   AAK30599
PID         g13561992
VERSION     AAK30599.1  GI:13561992
DBSOURCE    locus AF350270 accession AF350270.1
KEYWORDS    .
SOURCE      Dolomedes tenebrosus.
  ORGANISM  Dolomedes tenebrosus
            Eukaryota; Metazoa; Arthropoda; Chelicerata; Arachnida; Araneae;
            Araneomorphae; Entelegynae; Lycosoidea; Pisauridae; Dolomedes.
REFERENCE   1  (residues 1 to 691)
  AUTHORS   Gatesy,J., Hayashi,C., Motriuk,D., Woods,J. and Lewis,R.
  TITLE     Extreme diversity, conservation, and convergence of spider silk
            fibroin sequences
  JOURNAL   Science 291 (5513), 2603-2605 (2001)
  MEDLINE   21179804
COMMENT     Method: conceptual translation supplied by author.
FEATURES             Location/Qualifiers
     source          1..691
                     /organism="Dolomedes tenebrosus"
                     /db_xref="taxon:156846"
     Protein         <1..691
                     /product="fibroin 2"
                     /name="fibroin"
     CDS             1..691
                     /coded_by="AF350270.1:<1..2078"
ORIGIN
        1 ygqgsgagaa aaaaaaggag qsgsgpygas ylssttytts sqgagggvgg ygqgsgtgsa
       61 aaaagaagag qggqggygqg agqgglggyg qgggagaaaa aaaaaggags gqggyggqgg
//
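Because the record uses a standardized layout (keyword in the first 12 columns, value from column 13 onward, and // as the terminator), its header can be read with a few lines of code. The sketch below is illustrative only: the function name and the shortened sample are assumptions, repeated keywords overwrite each other, and the FEATURES table is skipped. A production parser would handle many more cases.

```python
SAMPLE_RECORD = """\
LOCUS       AF350270_1   691 aa   linear   INV 09-APR-2001
DEFINITION  fibroin 2 [Dolomedes tenebrosus].
ACCESSION   AAK30599
VERSION     AAK30599.1  GI:13561992
KEYWORDS    .
SOURCE      Dolomedes tenebrosus.
  ORGANISM  Dolomedes tenebrosus
            Eukaryota; Metazoa; Arthropoda; Chelicerata; Arachnida; Araneae;
//
"""

def parse_header(record_text):
    """Map each header keyword of a GenBank-style record to its value."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        # Stop at the feature table, the sequence, or the end of the record.
        if line.startswith(("//", "FEATURES", "ORIGIN")):
            break
        key, value = line[:12].strip(), line[12:].strip()
        if key:
            # A keyword in columns 1-12 starts a new field.
            current = key
            fields[current] = value
        elif current is not None and value:
            # Blank keyword columns mean the previous field continues.
            fields[current] += " " + value
    return fields
```

With this sketch, parse_header(SAMPLE_RECORD)["ACCESSION"] gives the accession code of the spider fibroin record.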
How to search for database records?
1. Identifiers
- unique string of letters and digits, often interpretable in a meaningful way by humans
- in GenBank: LOCUS (e.g. SCU32124)
- in SwissProt: ENTRY NAME (YNT2_YEAST)
- can change !!!
*** OTHER IDENTIFIERS ***
- VERSION = extension increases after every change
- GI = may change completely when seq. is altered

GenBank Record
[Slide callouts: Identifier, Version, GenInfo Identifier]

LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      baker's yeast.
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Hemiascomycetes; Saccharomycetales;
            Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
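The VERSION line combines two of the identifiers discussed on this slide: the accession with its version extension, and the GI number. A small sketch of how the line could be split is given below; the function name is an assumption, and the input is the value of the VERSION field exactly as it appears in the record.

```python
def split_version(version_value):
    """Split a VERSION value such as 'U49845.1  GI:1293613'.

    Returns (accession, version_number, gi), where gi is None
    if the line carries no GI token.
    """
    tokens = version_value.split()
    # 'U49845.1' -> accession 'U49845' and extension '1'.
    accession, _, extension = tokens[0].partition(".")
    gi = None
    for token in tokens[1:]:
        if token.startswith("GI:"):
            gi = token[len("GI:"):]
    return accession, int(extension), gi
```

For the yeast record, split_version("U49845.1  GI:1293613") separates the stable accession U49845 from the version extension 1 and the GI 1293613.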
How to search for database records?
2. Accession numbers
- in GenBank: 1 letter + 5 digits (U12345) or 2 letters + 6 digits (AF123456)
- in SwissProt: 1 letter + 5 digits (P04049)
- stable, does not change
- will get transferred to new record if records are merged
*** USE ACCESSION NO. FOR SEARCHES ***
GenBank Record
[Slide callouts: Identifier, ACCESSION, Version, GenInfo Identifier]

LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      baker's yeast.
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Hemiascomycetes; Saccharomycetales;
            Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
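The accession number formats listed on this slide can be checked with regular expressions. The sketch below encodes only the two GenBank patterns and the one SwissProt pattern stated above; the function names are assumptions, and the databases also accept other formats that this lecture does not cover (the protein accession AAK30599 seen earlier, for instance, matches neither GenBank pattern).

```python
import re

# 1 letter + 5 digits (U12345) or 2 letters + 6 digits (AF123456).
GENBANK_ACCESSION = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")
# 1 letter + 5 digits (P04049), as stated on the slide.
SWISSPROT_ACCESSION = re.compile(r"[A-Z]\d{5}")

def looks_like_genbank(accession):
    """True if the whole string matches one of the slide's GenBank formats."""
    return GENBANK_ACCESSION.fullmatch(accession) is not None

def looks_like_swissprot(accession):
    """True if the whole string matches the slide's SwissProt format."""
    return SWISSPROT_ACCESSION.fullmatch(accession) is not None
```

Note that a string such as U49845 satisfies both the short GenBank pattern and the SwissProt pattern, so the format alone does not tell which database a one-letter accession belongs to.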
Nucleic acid sequence databases