Analysis of optimal differential gene expression [electronic resource] / by Wolfram Liebermeister
171 pages
English

Published on 1 January 2004

Excerpt

Analysis of optimal differential gene expression
DISSERTATION
submitted for the academic degree of
doctor rerum naturalium
(Dr. rer. nat.)
in the subject of Biophysics/Theoretical Biophysics
submitted to the
Mathematisch-Naturwissenschaftliche Fakultät I
Humboldt-Universität zu Berlin
by
Dipl.-Phys. Wolfram Liebermeister
born on 25 July 1972 in Tübingen
President of the Humboldt-Universität zu Berlin:
Prof. Dr. Jürgen Mlynek
Dean of the Mathematisch-Naturwissenschaftliche Fakultät I:
Prof. Dr. Michael Linscheid
Reviewers:
1. Prof. Dr. Reinhart Heinrich
2. Prof. Dr. Thomas Höfer
3. Prof. Dr. Martin Vingron
Submitted on: 21 May 2003
Date of the oral examination: 13 January 2004

Abstract
This thesis is concerned with the observation that coregulation patterns in gene expression
data often reflect functional structures of the cell. First, simulated gene expression data
and expression data from yeast experiments are studied with independent component
analysis (ICA) and with related factor models. Then, in a more theoretical approach,
relations between gene expression patterns and the biological function of the genes are
derived from an optimality principle.
Linear factor models such as ICA decompose gene expression matrices into statistical
components. The coefficients with respect to the components can be interpreted as profiles of
hidden variables (called “expression modes”) that assume different values in the different
samples. In contrast to clusterings, such factor models account for a superposition of
effects and for individual responses of the different genes: each gene profile consists of a
superposition of the expression modes, which thereby account for the common variation
of many genes. The components are estimated blindly from the data, that is, without
further biological knowledge, and most of the methods considered here can reconstruct
almost sparse components. Thresholding a component reveals genes that respond strongly
to the corresponding mode, in comparison to genes showing differential expression among
individual samples.
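
As a rough illustration of such a factor model, the following sketch simulates an expression matrix as a superposition of sparse components and recovers them blindly. It is a minimal example under stated assumptions, not the procedure used in the thesis: the names `fast_ica`, `S_true`, `M_true` and all numerical settings are hypothetical, and a plain symmetric FastICA with a tanh nonlinearity stands in for whichever ICA variants the thesis actually employs.

```python
import numpy as np

def fast_ica(X, k, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity); rows of X are observations (genes)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # PCA whitening: keep k directions and scale them to unit covariance
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * np.sqrt(X.shape[0])
    # Random orthogonal starting point for the unmixing matrix
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]
    for _ in range(n_iter):
        Y = Z @ W.T
        G = np.tanh(Y)
        # Fixed-point update: E[g(y) z] - E[g'(y)] w, one row per component
        W_new = (G.T @ Z) / Z.shape[0] - np.diag((1.0 - G**2).mean(axis=0)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        u, _, vt = np.linalg.svd(W_new)
        W = u @ vt
    return Z @ W.T  # estimated components, one column each, unit variance

# Hypothetical simulated data: genes x samples
rng = np.random.default_rng(1)
n_genes, n_samples, k = 2000, 12, 3
# Sparse components: each gene responds to a given mode with probability 0.1
S_true = rng.standard_normal((n_genes, k)) * (rng.random((n_genes, k)) < 0.1)
M_true = rng.standard_normal((k, n_samples))  # expression modes across samples
X = S_true @ M_true + 0.05 * rng.standard_normal((n_genes, n_samples))

S_est = fast_ica(X, k)

# Components are recovered only up to order and sign, so compare by |correlation|
corr = np.abs(np.corrcoef(S_true.T, S_est.T)[:k, k:])
best = corr.max(axis=1)

# Thresholding the first estimated component singles out strongly responding genes
strong_genes = np.flatnonzero(np.abs(S_est[:, 0]) > 2.0)
```

Here `best` holds, for each true component, the largest absolute correlation with any estimated component; values close to 1 indicate a successful blind reconstruction. The threshold of 2.0 standard deviations is an arbitrary choice for illustration.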
In this work, different factor models are applied to simulated and experimental expres-
sion data. To simulate expression data, it is assumed that gene expression depends on
several unobserved variables (“biological expression modes”) which characterise the cell
state and that the genes respond to them according to nonlinear functions called “gene
programs”. Is there a chance to reconstruct such expression modes with a blind data
analysis? The tests in this work show that the modes can be found with ICA even if the
data are noisy or weakly nonlinear, or if the numbers of true and estimated components
do not match. Regression models are fitted to the profiles of single genes to explain their
expression by expression modes from factor models or by the expression of single tran-
scription factors. Nonlinear gene programs are estimated by nonlinear ICA: such effective
gene programs may be used for describing gene expression in large cell models. ICA and
similar methods are applied to expression data from cell-cycle experiments: besides bi-
ologically interpretable modes, experimental artefacts, probably caused by hybridisation
effects and contamination of the samples, are identified. It is shown for single components
that the coregulated genes share biological functions and the corresponding enzymes are
concentrated in particular regions of the metabolic network.

Thus the expression machinery seems to portray, as an outcome of evolution, functional
relationships between the genes: regarding the economy of resources, it would probably
be inefficient if cooperating genes were not coregulated. To formalise this teleological
view on gene expression, a mathematical model for the analysis of optimal differential
expression (ANODE) is proposed in this work: the model describes regulators, such as
genes or enzymes, and output variables, such as metabolic fluxes. The system's behaviour
is evaluated by a fitness function, which, for instance, rates some of the metabolic
fluxes in the cell and which is supposed to be optimised. This optimality principle defines
an optimal response of regulators to small external perturbations. For calculating the
optimal regulation patterns, the system to be controlled needs to be known only par-
tially: it suffices to predefine its possible behaviour around the optimal state and the
local shape of the fitness function. The method is extended to time-dependent perturba-
tions: to describe the response of metabolic systems to small oscillatory perturbations,
frequency-dependent control coefficients are defined and characterised by summation and
connectivity theorems. For testing the predicted relation between expression and function,
control coefficients are simulated for a large-scale metabolic network and their statistical
properties are studied: the structure of the control coefficients matrix portrays the net-
work topology, that is, chemical reactions tend to have little control on distant parts of
the network. Furthermore, control coefficients within subnetworks depend only weakly
on the modelling of the surrounding network.
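
The idea that only the local shape of the fitness function is needed can be sketched with a second-order expansion around the optimal state. The example below is a minimal illustration under assumptions of my own: the quadratic form, the matrices `H` and `B`, and the function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

# Hypothetical local model around the optimum (all deviations are small):
#   F(x, a) = -0.5 * x^T H x + x^T B a
# x: deviations of the regulators, a: deviations of external parameters.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # negative Hessian with respect to x, positive definite
B = np.array([[1.0],
              [-0.5]])       # mixed second derivatives between x and a

def fitness(x, a):
    """Second-order approximation of the fitness near the optimal state."""
    return -0.5 * x @ H @ x + x @ B @ a

def optimal_response(a):
    """Regulator change that restores optimality: solve -H x + B a = 0."""
    return np.linalg.solve(H, B @ a)

a = np.array([0.1])               # small external perturbation
x_opt = optimal_response(a)
grad = -H @ x_opt + B @ a         # fitness gradient at the new optimum
```

In this sketch the optimal response is linear in the perturbation, and `grad` vanishes at `x_opt` up to rounding error. Because `H` is symmetric, the response matrix obtained from it inherits that symmetry.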
Several plausible assumptions about appropriate expression patterns can be formally de-
rived from the optimality principle: the main result is a general relation between the
behaviour of regulators and their biological functions, which implies, for example, the
coregulation of enzymes acting in complexes or functional modules. In this context, the
functions of genes are quantified by their linear influences (called “response coefficients”)
on fitness-relevant cell variables. For enzymes controlling metabolism, the theorems of
metabolic control theory lead to sum rules that relate the expression patterns to the structure
of the metabolic network. Further predictions concern a symmetric compensation for
gene deletions and a relation between gene expression and the fitness loss caused by gene
deletions. If optimal regulation is realised by feedback signals between the cell variables
and the regulators, then functional relations are also portrayed in the linear feedback
coefficients, so genes of similar function may be expected to share inputs from the same
signalling cascades. According to the model of optimal regulation, expression profiles
are linear combinations of response coefficient profiles: tests with experimental expres-
sion profiles and simulated control coefficients support this hypothesis, and the common
components which are estimated from both kinds of data provide a vivid picture of the
metabolic adaptations that are required in different environments.
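
The kind of sum rule referred to above can be illustrated with the classical static flux summation theorem of metabolic control theory, which states that the flux control coefficients of all enzymes sum to one. The sketch below checks this numerically for a toy two-enzyme chain; the pathway, its rate laws and all parameter values are assumptions made for illustration and do not come from the thesis, which treats large-scale networks and frequency-dependent coefficients.

```python
import numpy as np

def steady_state_flux(e, x0=1.0, k1=2.0, k1r=1.0, k2=3.0):
    """Steady-state flux of the toy chain X0 -> S -> X1.

    Assumed rate laws: v1 = e1*(k1*X0 - k1r*S), v2 = e2*k2*S.
    """
    e1, e2 = e
    s = e1 * k1 * x0 / (e1 * k1r + e2 * k2)  # concentration of S where v1 = v2
    return e2 * k2 * s

def flux_control_coefficients(e, h=1e-4):
    """Scaled sensitivities d ln J / d ln e_i, estimated by central differences."""
    e = np.asarray(e, dtype=float)
    coeffs = []
    for i in range(e.size):
        up, down = e.copy(), e.copy()
        up[i] *= 1.0 + h
        down[i] *= 1.0 - h
        d_ln_j = np.log(steady_state_flux(up)) - np.log(steady_state_flux(down))
        d_ln_e = np.log(1.0 + h) - np.log(1.0 - h)
        coeffs.append(d_ln_j / d_ln_e)
    return np.array(coeffs)

C = flux_control_coefficients([1.0, 0.5])
print(C, C.sum())  # approximately [0.6 0.4] and 1.0
```

For these parameter values the analytical coefficients are 0.6 and 0.4, so the first enzyme holds more of the flux control, and the two values add up to one as the theorem requires.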
To summarise, empirical relations between gene expression and function have been con-
firmed in this work. Furthermore, such relations have been predicted on theoretical
grounds. A main aim is to clarify teleological assertions about gene expression by
deriving them from explicit assumptions, and thus to provide a theoretical framework for
the integration of expression data and functional annotations. While other authors have
compared expression to functional gene categories or topologically defined metabolic
pathways, I propose to relate it to the response coefficients. A main result of this work is that
general relations are predicted between a gene's function, its optimal expression behaviour,
and its regulatory program. Where the assumption of optimality is valid, the model justi-
fies the use of expression data for functional annotation and pathway reconstruction, and
it provides a function-related interpretation for the linear components behind expression
data. The methods from this work are not limited to gene expression data: the factor
models are applicable to protein and metabolite data as well, and the optimality principle
may also apply to other regulatory mechanisms, such as the allosteric control of enzymes.
Keywords:
differential expression, optimal control, metabolic control theory, gene function

Summary

This doctoral thesis deals with the observation that coregulation patterns in gene expression
data often reflect functional structures of the cell. First, simulated gene expression data
and expression data from yeast experiments are examined using independent component
analysis (ICA) and related factor models. In a more theoretical approach, relations between
the expression patterns and the biological function of the genes are then derived from an
optimality principle.

Linear factor models, for example ICA, decompose gene expression matrices into statistical
components: the coefficients with respect to the components can be interpreted as profiles of
hidden variables (“expression modes”) whose values vary between the samples. In contrast to
clustering methods, such factor models describe a superposition of biological effects and the
individual responses of the single genes: each gene profile consists of a superposition of the
expression modes, which thus explain the common variations of many genes. The linear
components are estimated blindly from the data, that is, without additional biological
knowledge, and most of the methods considered here make it possible to reconstruct nearly
sparse components. When a component is thresholded, genes become visible,
