CONTRIBUTED RESEARCH ARTICLES





Online Reproducible Research: An Application to Multivariate Analysis of Bacterial DNA Fingerprint Data
by Jean Thioulouse, Claire Valiente-Moro and Lionel Zenner
The R Journal Vol. 2/1, June 2010
ISSN 2073-4859

Abstract
This paper presents an example of online reproducible multivariate data analysis. This example is based on a web page providing an online computing facility on a server. HTML forms contain editable R code snippets that can be executed in any web browser thanks to the Rweb software. The example is based on the multivariate analysis of DNA fingerprints of the internal bacterial flora of the poultry red mite Dermanyssus gallinae. Several multivariate data analysis methods from the ade4 package are used to compare the fingerprints of mite pools coming from various poultry farms. All the computations and graphical displays can be redone interactively and further explored online, using only a web browser. Statistical methods are detailed in the duality diagram framework, and a discussion about online reproducibility is initiated.

Introduction
Reproducible research has gained much attention recently (see particularly http://reproducibleresearch.net/ and the references therein). In the area of Statistics, the availability of Sweave (Leisch (2002), http://www.stat.uni-muenchen.de/~leisch/Sweave/) has proved extremely useful and Sweave documents are now widely used. Sweave offers the possibility to have text, R code, and outputs of this code in the same document. This makes reproducibility of a scientific paper written in Sweave very straightforward.

However, using Sweave documents implies a good knowledge of R, of Sweave and of LaTeX. It also requires having R installed on one's computer. The installed version of R must be compatible with the R code in the Sweave document, and all the needed packages must also be installed. This may be a problem for some scientists, for example for many biologists who are not familiar with R and LaTeX, or who do not use them on a regular basis.

In this paper, we demonstrate an online reproducibility system that helps circumvent these problems. It is based on a web page that can be used with any web browser. It does not require R to be installed locally, nor does it need a thorough knowledge of R and LaTeX. Nevertheless it allows the user to present and comment R code snippets, to redo all the computations and to draw all the graphical displays of the original paper.

The example presented here relates to multivariate data analysis. Reproducibility of multivariate data analyses is particularly important because there is a large number of different methods, with poorly defined names, making it often difficult to know what has been done exactly. Many methods have several different names, according to the community where they were invented (or re-invented), and where they are used. Translation of technical terms into languages other than English is also difficult, as many are ambiguous and have "false friends". Moreover, data analysis methods make intensive use of graphical displays, with a large number of graphical parameters which can be changed to produce a wide variety of displays.

This situation is similar to what was described by Buckheit and Donoho (1995) in the area of wavelet research. Their solution was to publish the Matlab code used to produce the figures in their research articles. Rather than do this we decided to set up a simple computer environment using R to offer online reproducibility. In 2002, we installed an updated version of the Rweb system (Banfield, 1999) on our department server (see http://pbil.univ-lyon1.fr/Rweb/Rweb.general.html), and we implemented several computational web services in the field of comparative genomics (Perrière et al. (2003), see for example http://pbil.univ-lyon1.fr/mva/coa.php).

This server now combines the computational power of R with simple HTML forms (as suggested by de Leeuw (2001) for Xlisp-Stat) and the ability to search online molecular databases with the seqinr package (Charif and Lobry, 2007). It is also used by several researchers to provide an online reproducibility service for scientific papers (see for example http://pbil.univ-lyon1.fr/members/lobry/).

The present paper provides an example of an application of this server to the analysis of DNA fingerprints by multivariate analysis methods, using the ade4 package (Chessel et al., 2004; Dray and Dufour, 2007). We have shown recently (Valiente-Moro et al., 2009) that multivariate analysis techniques can be used to analyse bacterial DNA fingerprints and that they make it possible to draw useful conclusions about the composition of the bacterial communities from which they originate. We also demonstrate the effectiveness of principal component analysis, between-group analysis and within-group analysis [PCA, BGA and WGA, Benzécri (1983); Dolédec and Chessel (1987)] to show differences in diversity between bacterial communities of various origins.

In summary, we show here that it is easy to set up a software environment offering full online reproducibility of computations and graphical displays of multivariate data analysis methods, even for users who are not familiar with R, Sweave or LaTeX.
Data and methods
In this section, we first describe the biological material used in the example data set. Then we present the statistical methods (PCA, BGA and WGA), in the framework of the duality diagram (Escoufier, 1987; Holmes, 2006). We also detail the software environment that was used.
Biological material
The poultry red mite, Dermanyssus gallinae, is a haematophagous mite frequently present in breeding facilities and especially in laying hen facilities (Chauve, 1998). This arthropod can be responsible for anemia, dermatitis, weight loss and a decrease in egg production (Kirkwood, 1967). It has also been involved in the transmission of many pathogenic agents responsible for serious diseases in both animals and humans (Valiente Moro et al., 2005; Valiente-Moro et al., 2007). The poultry red mite is therefore an emerging problem that must be studied to maintain good conditions in commercial egg production facilities. Nothing is known about its associated non-pathogenic bacterial community and how the diversity of the microflora within mites may influence the transmission of pathogens.
Most studies on insect microflora are based on isolation and culture of the constituent microorganisms. However, comparison of culture-based and molecular methods reveals that only 20-50% of gut microbes can be detected by cultivation (Suau et al., 1999). Molecular methods have been developed to analyse the bacterial community in complex environments. Among these methods, Denaturing Gradient and Temporal Temperature Gel Electrophoresis (DGGE and TTGE) (Muyzer, 1998) have already been used successfully.
Full details of the example data set used in this paper (mite sampling, DNA extraction, PCR amplification of 16S rDNA fragments and acquisition of the TTGE banding patterns) are given in Valiente-Moro et al. (2009). Briefly, 13 poultry farms were selected in the Bretagne region in France, and in each farm 15 single mites, five pools of 10 mites and one pool of 50 mites were collected. The results for single mites and mite pools were analysed separately, but only the results of mite pools are presented in this study, as they gave the most illustrative analysis. Banding patterns can be analysed as quantitative variables (intensity of bands), or as binary indicators (presence/absence of bands), but for the same reason, only the presence/absence data were used in this paper.
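To make the difference between the two codings concrete, here is a small Python sketch that turns a table of band intensities into presence/absence indicators. The intensities and the detection threshold (`threshold`, defaulting to 0) are hypothetical, and so is the helper name `to_presence_absence`. The band-scoring procedure actually used is described in Valiente-Moro et al. (2009) and is not reproduced here.

```python
# Hypothetical band intensities: three mite pools (rows) by four TTGE bands (columns).
# These numbers are invented for illustration; they are not data from the study.
intensities = [
    [0.0, 12.5, 3.1, 0.0],
    [7.2, 0.0, 0.0, 1.4],
    [0.0, 9.8, 2.2, 0.0],
]


def to_presence_absence(table, threshold=0.0):
    """Code a band as 1 (present) if its intensity exceeds `threshold`, else 0 (absent)."""
    return [[1 if value > threshold else 0 for value in row] for row in table]


binary = to_presence_absence(intensities)
for row in binary:
    print(row)
```

Raising the assumed threshold drops faint bands. With `threshold=3.0`, the 1.4 and 2.2 readings become absences, so the binary table depends on how bands are called.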
TTGE banding patterns were collected in a data table, with bands in columns (55 columns) and mite pools in rows (73 rows). This table was first subjected to a principal component analysis (PCA) to get an overall idea of the structure of the banding patterns. Between-group analysis (BGA) was then applied to study the differences between poultry farms, and finally within-group analysis (WGA) was used to eliminate the farm effect and obtain the main characteristics of the common bacterial flora associated with D. gallinae in standard breeding conditions. A good knowledge of these characteristics would allow comparisons of the standard bacterial flora among various situations, such as geographic regions, type and location of breeding facilities (particularly organic farms), or developmental stages of D. gallinae.
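BGA and WGA both rest on splitting every row of the table into a between-farm part and a within-farm part. In the usual formulation, BGA is a PCA of the table in which each mite pool is replaced by the mean profile of its farm. WGA is a PCA of the deviations of each pool from its farm mean. The total inertia of the table is the sum of the inertias of these two tables. The minimal Python sketch below checks that decomposition on an invented table of six pools from two hypothetical farms, `A` and `B`. It is an illustration of the idea, not the ade4 implementation.

```python
from collections import defaultdict


def group_means(table, groups):
    """Return {group label: list of column means} computed over the rows of each group."""
    p = len(table[0])
    sums = defaultdict(lambda: [0.0] * p)
    counts = defaultdict(int)
    for row, g in zip(table, groups):
        counts[g] += 1
        for j, value in enumerate(row):
            sums[g][j] += value
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def between_within(table, groups):
    """Split the table into a group-mean part (analysed by BGA) and a within-group part (WGA)."""
    means = group_means(table, groups)
    between = [list(means[g]) for g in groups]  # each row replaced by its group mean profile
    within = [[x - m for x, m in zip(row, means[g])] for row, g in zip(table, groups)]
    return between, within


def total_ss(table):
    """Sum of squared deviations from the column means (n times the inertia of the centred table)."""
    n, p = len(table), len(table[0])
    col_means = [sum(row[j] for row in table) / n for j in range(p)]
    return sum((row[j] - col_means[j]) ** 2 for row in table for j in range(p))


# Hypothetical presence/absence table: six mite pools (rows) from two farms, three bands.
X = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
farms = ["A", "A", "A", "B", "B", "B"]

B, W = between_within(X, farms)
print("total  :", total_ss(X))
print("between:", total_ss(B))
print("within :", total_ss(W))
```

With this toy table, the between-farm sum of squares is 7/3 out of a total of 13/3 (about 54%). The within-farm part is the remaining 2. BGA looks for axes that best separate farm means, and WGA looks for structure once those means have been removed.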

Duality diagram of principal component analysis
Let $X = [x_{ij}]_{(n,p)}$ be the TTGE data table with $n$ rows (individuals = mite pools) and $p$ columns (variables = TTGE bands). Variables have mean $\bar{x}_j = \frac{1}{n}\sum_i x_{ij}$ and variance $\sigma_j^2 = \frac{1}{n}\sum_i (x_{ij} - \bar{x}_j)^2$. Individuals belong to $g$ groups (or classes), namely $G_1, \ldots, G_g$, with group counts $n_1, \ldots, n_g$, and $\sum_k n_k = n$.
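These definitions use the divisor $n$, not $n-1$, for the variance. The short Python sketch below works on a hypothetical six-pool, three-band table with invented farm labels. It computes $\bar{x}_j$, $\sigma_j^2$ and the group counts $n_k$ directly from the formulas. It then checks them against the population functions `statistics.fmean` and `statistics.pvariance` from Python's standard library.

```python
from collections import Counter
from statistics import fmean, pvariance

# Hypothetical presence/absence table: n = 6 mite pools (rows), p = 3 TTGE bands (columns).
X = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
# Hypothetical farm labels defining the groups G_1, ..., G_g.
farms = ["A", "A", "A", "B", "B", "B"]

n, p = len(X), len(X[0])
columns = [[row[j] for row in X] for j in range(p)]

# Column means: x̄_j = (1/n) Σ_i x_ij
means = [sum(col) / n for col in columns]
# Column variances with divisor n: σ_j² = (1/n) Σ_i (x_ij − x̄_j)²
variances = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)]

# The explicit formulas agree with the population (divisor-n) functions of `statistics`.
assert all(abs(m - fmean(col)) < 1e-12 for m, col in zip(means, columns))
assert all(abs(v - pvariance(col)) < 1e-12 for v, col in zip(variances, columns))

# Group counts n_k, which must sum to n.
counts = Counter(farms)
assert sum(counts.values()) == n

print("means:", means)
print("variances:", variances)
print("group counts:", dict(counts))
```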
Using duality diagram theory and triplet notation, the PCA of $X$ is the analysis of a triplet $(X_0, D_p, D_n)$. $X_0$ is the table of standardized values:

$$X_0 = [\tilde{x}_{ij}]_{(n,p)} \quad \text{with} \quad \tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}$$
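The triplet can be computed in a few lines. The text above defines only $X_0$. The sketch below additionally assumes uniform row weights $D_n = \frac{1}{n} I_n$ and unit column weights $D_p = I_p$, the usual choice for a normed PCA. Under these weights, $X_0^\top D_n X_0 D_p$ is the correlation matrix of the bands, and its eigenvalues are the inertias of the principal axes. The table is invented, and power iteration (the hypothetical helper `leading_eigenpair`) stands in for a full eigendecomposition.

```python
from math import sqrt

# Hypothetical presence/absence table: n = 6 mite pools (rows), p = 3 TTGE bands (columns).
X = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
n, p = len(X), len(X[0])

means = [sum(row[j] for row in X) / n for j in range(p)]
sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]

# X_0 = [x̃_ij] with x̃_ij = (x_ij − x̄_j) / σ_j  (assumes every σ_j > 0, i.e. no constant band)
X0 = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

# Assumed weights D_n = (1/n) I_n and D_p = I_p: X_0^T D_n X_0 D_p is then the correlation matrix.
R = [[sum(X0[i][a] * X0[i][b] for i in range(n)) / n for b in range(p)] for a in range(p)]


def leading_eigenpair(M, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    size = len(M)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(M[a][b] * v[b] for b in range(size)) for a in range(size)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[a][b] * v[b] for b in range(size)) for a in range(size)]
    return sum(x * y for x, y in zip(v, Mv)), v  # Rayleigh quotient v^T M v


lam1, axis1 = leading_eigenpair(R)
print("diag(R) =", [round(R[j][j], 6) for j in range(p)])
print("first eigenvalue =", round(lam1, 4), "out of a total inertia of", p)
```

For this toy table, the leading eigenvalue is $1 + \sqrt{11/18} \approx 1.78$. That is about 59% of the total inertia $p = 3$, which equals the trace of the correlation matrix.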
