Protein sequence and structure comparison based on vectorial representations [Electronic resource] / by Florian Teichert

technischen_universitat_darmstadt

105 pages

English

Subjects

Biology

Information

Published by	technischen_universitat_darmstadt
Published on	1 January 2009
Number of reads	12
Language	English
File size	5 MB

Excerpt

Protein Sequence and Structure
Comparison based on
Vectorial Representations
Dissertation approved by the Department of Physics
of the Technische Universität Darmstadt
for the degree of
Doctor of Natural Sciences
(Dr. rer. nat.)
by
Dipl.-Phys. Florian Teichert
born in Frankfurt am Main
Darmstadt 2009
D17
1st referee: Prof. Dr. rer. nat. Markus Porto
2nd referee: Prof. Dr. rer. nat. Barbara Drossel
Date of submission: 2 December 2008
Date of examination: 16 February 2009

Abstract
Proteins are very complex physical objects consisting of thousands of atoms and
hundreds of amino acids, with complicated local and global interactions on length
scales ranging from the microscopic neighbourhood of atoms to the macroscopic size
of organisms. The spatial configuration is nevertheless encoded with one single
character per amino acid using a twenty-character alphabet, an apparent
contradiction that is not fully understood to date.
This thesis is concerned with problems of protein structure and the relationship
between protein sequence and structure. It attempts to integrate two different
approaches: the one typically taken by physicists in the field, who investigate very
simplified model systems, e.g. single α-helices, and the bioinformatics approach of
building powerful analysis tools. The first approach often leads to oversimplified
systems that do not describe native proteins as a whole, while the second can be too
heuristic and too involved to answer fundamental questions.
We start by defining vectorial descriptions of protein structure, similar in form to
sequence descriptions, and first use them to compare protein structures, i.e. to
perform structure alignments, and discuss several measures of structural similarity.
From these we derive a statistical structural similarity score for pairs of protein
structures based on their spatial superimposition.
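
As a rough illustration of what aligning two vectorial profiles can look like, the following sketch runs a standard global dynamic-programming alignment (Needleman-Wunsch) over two lists of real numbers. The scoring function `match_bonus - |a - b|`, the linear gap penalty, and all names are assumptions made for this example; they are not taken from the thesis.

```python
def align_profiles(p, q, gap=1.0, match_bonus=1.0):
    """Globally align two real-valued profiles; return (score, pairs).

    pairs lists the (i, j) index pairs of aligned positions.
    The similarity match_bonus - |p[i] - q[j]| is a hypothetical choice.
    """
    n, m = len(p), len(q)
    # score[i][j]: best score for aligning p[:i] with q[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = match_bonus - abs(p[i - 1] - q[j - 1])
            score[i][j] = max(score[i - 1][j - 1] + sim,  # align i with j
                              score[i - 1][j] - gap,      # gap in q
                              score[i][j - 1] - gap)      # gap in p
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sim = match_bonus - abs(p[i - 1] - q[j - 1])
        if score[i][j] == score[i - 1][j - 1] + sim:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Because the routine only sees numeric profiles, the same function could be applied to profiles computed from structures and to profiles predicted from sequences.
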
We then use a previously known ansatz that exploits the sequence-to-structure
correlation to predict vectorial structure descriptions from protein sequence.
These predicted profiles are then used within the same alignment framework to align
protein sequences. For these alignments a basic evolutionary similarity measure
between protein sequences is derived.
A large part of this thesis is dedicated to the objective assessment of alignment
methods, including the new method presented here and a number of established programs.
A commonly used measure of structural similarity, the Percentage of Structural
Identity (PSI), is discussed and generalized to cover an internal degree of freedom
in structure that was previously ignored. The improvement is achieved by very simple
but powerful reasoning. The resulting scheme is also applicable to the detection of
hinges in protein structures.
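
To make the idea of a percentage of structural identity concrete, here is a minimal sketch. It assumes that the two structures have already been superimposed and aligned, that each residue is represented by one coordinate triple (for example its Cα atom), and that an aligned pair counts as structurally identical when the two positions lie within a cutoff distance. The 4 Å default cutoff, the normalization by the number of aligned pairs, and the function name are assumptions for illustration; the exact definition used in the thesis and its generalization are not reproduced here.

```python
import math

def percentage_structural_identity(coords_a, coords_b, pairs, cutoff=4.0):
    """Percentage of aligned residue pairs closer than `cutoff`.

    coords_a, coords_b: lists of (x, y, z) tuples, already superimposed.
    pairs: list of (i, j) index pairs taken from an alignment.
    Normalizing by len(pairs) is an assumption of this sketch.
    """
    if not pairs:
        return 0.0
    close = sum(
        1 for i, j in pairs
        if math.dist(coords_a[i], coords_b[j]) < cutoff
    )
    return 100.0 * close / len(pairs)
```

A single rigid superposition, as assumed here, cannot account for relative motion between parts of a structure, which is the kind of internal degree of freedom the generalized measure is meant to cover.
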
In conclusion, we state that protein structure, despite its complexity, is indeed to a
large extent one-dimensional. The unification of structure and sequence alignments
under a single formalism gives some insight into the relation between sequence and
structure in proteins.