CONTRIBUTED RESEARCH ARTICLES
Online Reproducible Research: An Application to Multivariate Analysis of Bacterial DNA Fingerprint Data
by Jean Thioulouse, Claire Valiente-Moro and Lionel Zenner
The R Journal Vol. 2/1, June 2010, ISSN 2073-4859

Abstract

This paper presents an example of online reproducible multivariate data analysis. This example is based on a web page providing an online computing facility on a server. HTML forms contain editable R code snippets that can be executed in any web browser thanks to the Rweb software. The example is based on the multivariate analysis of DNA fingerprints of the internal bacterial flora of the poultry red mite Dermanyssus gallinae. Several multivariate data analysis methods from the ade4 package are used to compare the fingerprints of mite pools coming from various poultry farms. All the computations and graphical displays can be redone interactively and further explored online, using only a web browser. Statistical methods are detailed in the duality diagram framework, and a discussion about online reproducibility is initiated.

Introduction

Reproducible research has gained much attention recently (see particularly http://reproducibleresearch.net/ and the references therein). In the area of Statistics, the availability of Sweave (Leisch (2002), http://www.stat.uni-muenchen.de/~leisch/Sweave/) has proved extremely useful and Sweave documents are now widely used. Sweave offers the possibility to have text, R code, and outputs of this code in the same document. This makes reproducibility of a scientific paper written in Sweave very straightforward.

However, using Sweave documents implies a good knowledge of R, of Sweave and of LaTeX. It also requires having R installed on one's computer. The installed version of R must be compatible with the R code in the Sweave document, and all the needed packages must also be installed. This may be a problem for some scientists, for example for many biologists who are not familiar with R and LaTeX, or who do not use them on a regular basis.

In this paper, we demonstrate an online reproducibility system that helps circumvent these problems. It is based on a web page that can be used with any web browser. It does not require R to be installed locally, nor does it need a thorough knowledge of R and LaTeX. Nevertheless it allows the user to present and comment R code snippets, to redo all the computations and to draw all the graphical displays of the original paper.

The example presented here relates to multivariate data analysis. Reproducibility of multivariate data analyses is particularly important because there is a large number of different methods, with poorly defined names, making it often difficult to know what has been done exactly. Many methods have several different names, according to the community where they were invented (or re-invented), and where they are used. Translation of technical terms into languages other than English is also difficult, as many are ambiguous and have “false friends”. Moreover, data analysis methods make intensive use of graphical displays, with a large number of graphical parameters which can be changed to produce a wide variety of displays.

This situation is similar to what was described by Buckheit and Donoho (1995) in the area of wavelet research. Their solution was to publish the Matlab code used to produce the figures in their research articles. Rather than do this we decided to set up a simple computer environment using R to offer online reproducibility. In 2002, we installed an updated version of the Rweb system (Banfield, 1999) on our department server (see http://pbil.univ-lyon1.fr/Rweb/Rweb.general.html), and we implemented several computational web services in the field of comparative genomics (Perrière et al. (2003), see for example http://pbil.univ-lyon1.fr/mva/coa.php).

This server now combines the computational power of R with simple HTML forms (as suggested by de Leeuw (2001) for Xlisp-Stat) and the ability to search online molecular databases with the seqinr package (Charif and Lobry, 2007). It is also used by several researchers to provide an online reproducibility service for scientific papers (see for example http://pbil.univ-lyon1.fr/members/lobry/).

The present paper provides an example of an application of this server to the analysis of DNA fingerprints by multivariate analysis methods, using the ade4 package (Chessel et al., 2004; Dray and Dufour, 2007). We have shown recently (Valiente-Moro et al., 2009) that multivariate analysis techniques can be used to analyse bacterial DNA fingerprints and that they make it possible to draw useful conclusions about the composition of the bacterial communities from which they originate. We also demonstrate the effectiveness of principal component analysis, between-group analysis and within-group analysis [PCA, BGA and WGA, Benzécri (1983); Dolédec and Chessel (1987)] to show differences in diversity between bacterial communities of various origins.

In summary, we show here that it is easy to set up a software environment offering full online reproducibility of computations and graphical displays of multivariate data analysis methods, even for users who are not familiar with R, Sweave or LaTeX.

Data and methods

In this section, we first describe the biological material used in the example data set. Then we present the statistical methods (PCA, BGA and WGA), in the framework of the duality diagram (Escoufier, 1987; Holmes, 2006). We also detail the software environment that was used.

Biological material

The poultry red mite, Dermanyssus gallinae, is a haematophagous mite frequently present in breeding facilities and especially in laying hen facilities (Chauve, 1998). This arthropod can be responsible for anemia, dermatitis, weight loss and a decrease in egg production (Kirkwood, 1967). It has also been involved in the transmission of many pathogenic agents responsible for serious diseases in both animals and humans (Valiente Moro et al., 2005; Valiente-Moro et al., 2007). The poultry red mite is therefore an emerging problem that must be studied to maintain good conditions in commercial egg production facilities. Nothing is known about its associated non-pathogenic bacterial community and how the diversity of the microflora within mites may influence the transmission of pathogens.

Most studies on insect microflora are based on isolation and culture of the constituent micro-organisms. However, comparison of culture-based and molecular methods reveals that only 20-50% of gut microbes can be detected by cultivation (Suau et al., 1999). Molecular methods have been developed to analyse the bacterial community in complex environments. Among these methods, Denaturing Gradient and Temporal Temperature Gel Electrophoresis (DGGE and TTGE) (Muyzer, 1998) have already been used successfully.

Full details of the example data set used in this paper (mite sampling, DNA extraction, PCR amplification of 16S rDNA fragments and acquisition of TTGE banding patterns) are given in Valiente-Moro et al. (2009). Briefly, 13 poultry farms were selected in the Bretagne region in France, and in each farm 15 single mites, five pools of 10 mites and one pool of 50 mites were collected. The results for single mites and mite pools were analysed separately, but only the results of mite pools are presented in this study, as they gave the most illustrative analysis. Banding patterns can be analysed as quantitative variables (intensity of bands), or as binary indicators (presence/absence of bands), but for the same reason, only the presence/absence data were used in this paper.
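
As an illustration of the two codings mentioned above, the following minimal sketch converts band intensities into presence/absence indicators. The intensity values, the table size and the zero threshold are assumptions made only for this example; they do not come from the data set of the paper, and the authors' own coding procedure is not described here.

```python
# Toy band intensities for three hypothetical mite pools (rows) and four
# hypothetical TTGE band positions (columns). All values are made up.
intensities = [
    [0.0, 12.5, 3.1, 0.0],
    [7.4, 0.0, 0.0, 0.0],
    [0.0, 9.8, 0.0, 2.2],
]


def to_presence_absence(table, threshold=0.0):
    """Return a 0/1 table: 1 where the intensity exceeds the assumed threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in table]


presence = to_presence_absence(intensities)
for row in presence:
    print(row)
```

A binary coding of this kind discards the intensity information and keeps only which bands were observed in each pool.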

TTGE banding patterns were collected in a data table, with bands in columns (55 columns) and mite pools in rows (73 rows). This table was first subjected to a principal component analysis (PCA) to get an overall idea of the structure of the banding patterns. Between-group analysis (BGA) was then applied to study the differences between poultry farms, and finally within-group analysis (WGA) was used to eliminate the farm effect and obtain the main characteristics of the common bacterial flora associated with D. gallinae in standard breeding conditions. A good knowledge of these characteristics would allow comparisons of the standard bacterial flora among various situations, such as geographic regions, type and location of breeding facilities (particularly organic farms), or developmental stages of D. gallinae.
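
The two group-based steps described above can be pictured through the tables they work on. The sketch below is only an illustration in plain Python, not the ade4 implementation. It assumes the usual reading in which the between-group step looks at the table of group means and the within-group step looks at the table after each group's mean has been subtracted. The toy table, the farm labels and the function names are hypothetical, and the PCA step itself is not shown.

```python
from collections import defaultdict


def group_means(table, groups):
    """Column means of `table`, computed separately for each group label."""
    rows_by_group = defaultdict(list)
    for row, label in zip(table, groups):
        rows_by_group[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in rows_by_group.items()
    }


def within_group_residuals(table, groups):
    """Subtract from each row the mean row of its own group."""
    means = group_means(table, groups)
    return [
        [value - mean for value, mean in zip(row, means[label])]
        for row, label in zip(table, groups)
    ]


# Hypothetical presence/absence table: 4 mite pools x 3 bands, 2 farms.
table = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
farms = ["farm_A", "farm_A", "farm_B", "farm_B"]

between = group_means(table, farms)            # one row of means per farm
within = within_group_residuals(table, farms)  # farm effect removed
print(between)
print(within)
```

In this toy table the first band separates the two farms completely, so it appears only in the table of farm means and vanishes from the residuals, while the second band varies only inside farms and survives in the residuals.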

Duality diagram of principal component analysis

Let $X = [x_{ij}]_{(n,p)}$ be the TTGE data table with $n$ rows (individuals = mite pools) and $p$ columns (variables = TTGE bands). Variables have mean $\bar{x}_j = \frac{1}{n}\sum_i x_{ij}$ and variance $\sigma_j^2 = \frac{1}{n}\sum_i (x_{ij} - \bar{x}_j)^2$. Individuals belong to $g$ groups (or classes), namely $G_1, \dots, G_g$, with group counts $n_1, \dots, n_g$, and $\sum n_k = n$.
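
The quantities defined above can be checked on a small example. The sketch below is a plain Python illustration under stated assumptions: the toy table and the farm labels are hypothetical, and the function names are chosen only for this example. Note that the variance uses the divisor $n$, as in the definition, and not $n - 1$.

```python
from collections import Counter


def column_means(table):
    """Mean of each column (variable) over the n rows."""
    n = len(table)
    return [sum(col) / n for col in zip(*table)]


def column_variances(table):
    """Variance of each column with divisor n (not n - 1)."""
    n = len(table)
    means = column_means(table)
    return [
        sum((x - m) ** 2 for x in col) / n
        for col, m in zip(zip(*table), means)
    ]


# Hypothetical presence/absence table: n = 4 mite pools, p = 3 bands.
table = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
# Hypothetical group labels (g = 2 farms), one per row.
farms = ["farm_A", "farm_A", "farm_B", "farm_B"]

means = column_means(table)
variances = column_variances(table)
group_counts = Counter(farms)

print(means)
print(variances)
print(dict(group_counts), sum(group_counts.values()) == len(table))
```

A band that is present in every pool, like the third column here, has zero variance.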

Using duality diagram theory and triplet notation, the PCA of $X$ is the analysis of a triplet $(X_0, D_p, D_n)$. $X_0$ is the table of standardized values: $X_0 = [\tilde{x}_{ij}]_{(n,p)}$
with $\tilde{x}_{ij} = \dfrac{x_{ij} - \bar{x}_j}{\sigma_j}$