Assessment of data processing to improve reliability of microarray experiments using genomic DNA reference

BMC Genomics (BioMed Central)
Research, Open Access
Yunfeng Yang* (1), Mengxia Zhu (2), Liyou Wu (1,3) and Jizhong Zhou (1,3)
Address: (1) Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; (2) Computer Science Department, Southern Illinois University, Carbondale, IL 62901, USA; and (3) Institute for Environmental Genomics, and Department of Botany and Microbiology, University of Oklahoma, Norman, OK 73019, USA
Email: Yunfeng Yang* - yang@ornl.gov
* Corresponding author
From the IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School, Boston, MA, USA. 14–17 October 2007
Published: 16 September 2008
BMC Genomics 2008, 9(Suppl 2):S5 doi:10.1186/1471-2164-9-S2-S5
Supplement: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School. Editors: Mary Qu Yang, Jack Y Yang, Hamid R Arabnia and Youping Deng. Research. http://www.biomedcentral.com/content/pdf/1471-2164-9-S2-info.pdf
This article is available from: http://www.biomedcentral.com/1471-2164/9/S2/S5
© 2008 Yang et al; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Using genomic DNA as a common reference in microarray experiments has recently been tested by different laboratories. Conflicting results have been reported with regard to the reliability of microarray results using this method. To explain this discrepancy, we hypothesize that data processing is a critical element that impacts data quality.
Results: Microarray experiments were performed in the γ-proteobacterium Shewanella oneidensis. Pairwise comparisons of three experimental conditions were obtained either with two labeled cDNA samples co-hybridized to the same array, or by employing Shewanella genomic DNA as a standard reference. Various data processing techniques were applied to reduce the inconsistency between the two methods, and the results were assessed. We discovered that data quality was significantly improved by imposing a minimal number of replicates and by applying logarithmic transformation and random error analyses.
Conclusion: These findings demonstrate that data processing significantly influences data quality, which provides an explanation for the conflicting evaluations in the literature. This work could serve as a guideline for microarray data analysis using genomic DNA as a standard reference.
Background
DNA microarray technology has been quickly adopted by mainstream laboratories to explore gene expression profiles of part or all of an organism's genome [1,2]. A number of microarray studies use an experimental design in which experimental and reference RNA samples are transcribed into cDNA molecules, labeled with different fluorescent dyes (typically Cy5 and Cy3) and co-hybridized to the same microarray slide [3]. This approach, sometimes called the type 1 approach [4], is very costly and tedious for a large number of samples, for which comparisons across all samples are often desired. Pairing all possible pairs of n samples results in a total of n*(n-1)/2 combinations. This quadratically increasing number could become unmanageable for an individual laboratory when n is large.
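
The growth of the pair count can be checked with a few values of n. The sketch below is illustrative only: the function names are hypothetical, and it assumes one array per hybridization with no technical replicates or dye swaps, which a real design would add. For comparison it also prints n, the number of arrays a common-reference design needs under the same assumptions.

```python
from itertools import combinations


def direct_design_arrays(n: int) -> int:
    """Arrays needed to co-hybridize every pair of n samples (assumed: one array per pair)."""
    return n * (n - 1) // 2


def reference_design_arrays(n: int) -> int:
    """Arrays needed when each sample is hybridized once against a common reference."""
    return n


if __name__ == "__main__":
    for n in (3, 10, 50):
        # Cross-check the closed form against an explicit enumeration of pairs.
        assert direct_design_arrays(n) == len(list(combinations(range(n), 2)))
        print(f"n={n}: pairwise={direct_design_arrays(n)}, "
              f"common reference={reference_design_arrays(n)}")
```

With three samples the two designs need the same number of arrays; the difference only becomes substantial as n grows.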
In addition, it is nearly impossible to compare data across experiments, since the composition of the cDNA reference sample depends on the experimental design and is hence not universal. It has long been desirable to develop novel strategies to integrate data across multiple, initially unrelated studies between laboratories or over a long period of time to promote data sharing and integration. Lastly, this approach provides no information on absolute mRNA abundance, which is often useful since it has been established that global transcriptional levels in microorganisms correlate strongly with global protein levels and gene essentiality [4-6].

A conceptually sound solution to the problems in type 1 experiments is to use a "reference design", which requires co-hybridization of a common reference with all of the samples on the microarrays. Typically, the ratio (γ1) of one cDNA sample to the common reference is compared to the ratio (γ2) of another cDNA sample to the common reference. The computed "ratio of ratios" (γ1/γ2) is considered to be equivalent to a direct cDNA:cDNA comparison. In contrast to the type 1 approach, this "reference design" approach is called the type 2 approach [4], in which only n microarrays are needed to calculate the ratios of any possible pairs of n samples. Clearly, this strategy greatly reduces the costs and time incurred in type 1 experiments. In addition, the absolute mRNA abundance of each gene could be deduced from γ1 and γ2 when the copy number of each gene is known for the common reference.

An ideal reference should fulfill the criteria of universality, reproducibility and uniformity, meaning that it should be universal across diverse microarrays, reproducible over a long time frame and in different laboratories, and represent each gene at a uniform level. One such reference is a common RNA pool assembled from a number of different cell lines, tissues and conditions. Commercial universal RNA references are now available for mouse and human samples (Stratagene). However, RNA references fall well short of the aforementioned criteria. Although RNA pools are more comprehensive than a single source of RNA, they still only partially represent the whole genome; there is inherent biological variability among different RNA samples; and RNA can degrade over time. Therefore, data quality across multiple studies is inevitably compromised. To address these issues, genomic DNA has been proposed as a replacement for the universal RNA reference [7]. It is easy and economical to prepare genomic DNA in large amounts with low variation between different laboratories. Furthermore, genomic DNA is stable and can be stored over a long period of time. It is independent of variations from one preparation to another, which is a desirable feature of a universal reference. In addition, genomic DNA represents [...] presented in single or double copies in the genomes. It is especially useful for microbial functional genomics because of the low representation of repetitive sequences and intergenic regions in the genome. In addition, this feature makes it easy to profile absolute mRNA levels. Several recent studies have shown that a genomic DNA reference is indeed very effective and faithful for gene expression profiling [8-14]. Furthermore, a comparative study between genomic DNA reference and universal RNA reference concluded that genomic DNA is superior for routine use [11].

Nevertheless, adopting genomic DNA as a reference also creates new challenges. Although this strategy enables the integration of disparate studies, it conceivably introduces new sources of variation. For example, spots with low signal intensity from labeled genomic DNA are prone to high standard errors of measurement, and spots with high intensity considerably interfere with the hybridization of cDNA samples to the probes, leading to low fidelity in the ratio of cDNA to genomic DNA. For quality control purposes, it is critical to identify these variances and remove ambiguous values by data analysis. However, to the best of our knowledge, this problem has not been unequivocally tackled, and there is no consensus in the scientific community on the analysis of microarray data using a genomic DNA reference. For instance, some researchers conducted array-to-array comparisons with little data processing except for background subtraction and removal of poor or negative spots [15,16], while others employed extensive techniques involving complicated statistical models [8,9,13,14]. It is thus necessary to appraise the performance of different data processing techniques.

In this study, we address this need by conducting a comparative study of type 1 and type 2 experiments in the γ-proteobacterium Shewanella oneidensis, which is capable of respiring with oxygen, fumarate, trimethylamine-N-oxide (TMAO), manganese (IV) oxides and ferric oxides as terminal electron acceptors [17-19]. Gene expression profiles of S. oneidensis were generated under three growth conditions: aerobic growth, or anaerobic growth with fumarate or ferric citrate as the electron acceptor. Variations among gene expression profiles were compared, and we concluded that data processing techniques, including setting a minimal number of replicates, logarithmic (log) transformation and random error analyses, appeared to be valuable for improving data quality.

Results
As indicated in the introduction, type 2 experiments using genomic DNA reference could add an additional layer of
