False discovery rate and asymptotics [electronic resource] / submitted by Thorsten-Ingo Dickhaus

Published: Tuesday, 1 January 2008
Source: DOCSERV.UNI-DUESSELDORF.DE/SERVLETS/DERIVATESERVLET/DERIVATE-6873/DISSERTATION_A1B.PDF
Number of pages: 144

False Discovery Rate and Asymptotics
Inaugural Dissertation
for the attainment of the doctoral degree
of the Mathematisch-Naturwissenschaftliche Fakultät
of the Heinrich-Heine-Universität Düsseldorf
submitted by
Thorsten-Ingo Dickhaus
from Berlin-Kreuzberg
January 2008
From the Institute of Biometrics and Epidemiology of the
German Diabetes Center, Leibniz Institute at the
Heinrich-Heine-Universität Düsseldorf
Printed with the permission of the
Mathematisch-Naturwissenschaftliche Fakultät of the
Heinrich-Heine-Universität Düsseldorf
Referee: Prof. Dr. Arnold Janssen
Co-referee: PD Dr. Helmut Finner
Date of the oral examination: 15 January 2008
Contents
Overview
1 Introduction
1.1 Multiple testing and False Discovery Rate
1.2 The concept of p-values
1.2.1 p-value adjustment for multiplicity
2 FDR control with Simes' critical values
2.1 General theoretical framework in the exchangeable setup
2.1.1 Two models with exchangeable test statistics
2.1.2 Largest crossing points and computation of EER and FDR
2.1.3 All LCPs greater than zero
2.1.4 Some LCPs equal to zero
2.2 Exchangeable exponentially distributed variables
2.3 … normally distributed variables
2.3.1 The special case $\zeta = 1$
2.3.2 The general case $\zeta < 1$
2.4 Exchangeable studentized normal variables
2.4.1 The special case $\nu = 1$
2.4.2 The general case $\nu > 1$ and $\zeta < 1$
2.5 Conclusions
3 A new rejection curve
3.1 Notation and preliminaries
3.2 Motivation and heuristic derivation
3.3 Procedures based on the new rejection curve
3.4 LFC results and upper FDR bounds
3.5 Asymptotic FDR control for procedures based on the AORC
3.6 … optimality of the AORC
3.7 FDR control for a fixed number of hypotheses
3.7.1 Simultaneous $\beta$-adjustment
3.7.2 Multivariate optimization problem
3.8 Connection to Storey's approach
4 Power study for some FDR-controlling test procedures
4.1 Simple hypotheses case
4.2 Composite hypotheses case
4.3 Summary
5 Concluding remarks and outlook
A Numerical simulations and calculations
A.1 Simulations for FDR under dependency
A.2 Adjusted procedures based on the AORC
A.2.1 SUD procedure, Example 3.5
A.2.2 SU procedure based on $f^{(1)}_{0.05,\kappa_1}$, Example 3.7
A.2.3 … based on $f^{(2)}_{0.05,\kappa_2}$, Example 3.7
A.2.4 SU procedure with truncated curve, Example 3.8
B Concepts of positive dependency
List of Tables
List of Figures
Bibliography
List of Abbreviations and Symbols
AORC  Asymptotically Optimal Rejection Curve
$B(p,q)$  Beta function, $B(p,q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$
BP  Boundary Point
$\lceil x \rceil$  Smallest integer greater than or equal to $x$
$\chi^2_\nu$  Chi-square distribution with $\nu$ degrees of freedom
$\complement M$  Complement of the set $M$
CP  Crossing Point
cdf.  Cumulative distribution function
$\delta_{i,j}$  Kronecker symbol
ecdf.  Empirical cumulative distribution function
$\varepsilon_a$  Dirac measure in point $a$
$\stackrel{d}{=}$  Equality in distribution
EER  Expected Error Rate
$F_X$  Cumulative distribution function of a real-valued random variable $X$
FDR  False Discovery Rate
FWER  Family Wise Error Rate
$\lfloor x \rfloor$  Largest integer less than or equal to $x$
$\Gamma(\cdot)$  Gamma function, $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, $x > 0$
$\mathrm{im}(X)$  Image of the random entity $X$
i.i.d.  independent and identically distributed
$\mathbf{1}_M$  Indicator function of the set $M$
LCP  Largest Crossing Point
$\mathcal{L}(X)$  Law (distribution) of the random variable $X$
LFC  Least Favorable Configuration
$\lambda$  Lebesgue measure
MTP$_2$  Multivariate total positivity of order 2
$N_n$  $\{1,\ldots,n\}$
$N(\mu,\sigma^2)$  Normal distribution with parameters $\mu$ and $\sigma^2$
$\Phi$  Cumulative distribution function of the $N(0,1)$ distribution
$\varphi(\cdot)$  Probability density function of the $N(0,1)$ distribution
PRD  Positive regression dependency
PRDS  Positive regression dependency on subsets
pdf.  Probability density function
pmf.  Probability mass function
SD  Step-down
SU  Step-up
SUD  Step-up-down
UNI[a,b]  Uniform distribution on the interval [a,b]
Overview
The False Discovery Rate (FDR) is a rather young paradigm for controlling the errors of a multiple test procedure. Especially in the context of genetics and microarray analyses, the FDR has become a very popular error control criterion over the last decade, because it is less restrictive than the classical Family Wise Error Rate (FWER). This matters in several of today's application fields, such as genome-wide association (GWA) studies, where tens of thousands or even several hundred thousand hypotheses sometimes have to be tested simultaneously. In addition, the analyses there are mainly explorative, at least at a first stage, so at that stage one is often more interested in obtaining some significances than in avoiding a few false ones. Instead of controlling the probability of making at least one false rejection, the FDR controls the expected proportion of falsely rejected (true) null hypotheses among all rejections. Because some current applications involve massive multiplicity, asymptotic considerations are becoming increasingly relevant. Therefore, this work focuses on the asymptotic behaviour of the False Discovery Rate as the number n of hypotheses tends to infinity. Other applications include astronomy (cf., e.g., [176]) and proteomics, cf. Application 2.4.
The remainder of this work is organized as follows. Chapter 1 presents some theoretical foundations, including a formal definition of the FDR, and fixes some notation. Most of the results in that chapter are already known, so it mainly recapitulates existing material.
Chapter 2 then deals with a popular FDR-controlling multiple test procedure, namely the linear step-up procedure based on Simes' critical values, introduced in the pioneering article by Benjamini and Hochberg from 1995, see [13]. This method is well known to control the FDR when the test statistics are positively dependent. We therefore study its asymptotic conservativeness in some special distributional situations.
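As a concrete illustration, here is a minimal Python sketch of the linear step-up procedure (the Benjamini-Hochberg procedure) with Simes' critical values iα/n. It assumes the p-values are given as a plain list. The function name `linear_step_up` is hypothetical and not taken from this work. The procedure itself is formally treated in Chapter 2.

```python
def linear_step_up(pvals, alpha=0.05):
    """Sketch of the linear step-up (Benjamini-Hochberg) procedure.

    Let p_(1) <= ... <= p_(n) be the ordered p-values and
    k = max{i : p_(i) <= i * alpha / n} (k = 0 if no such i exists).
    The hypotheses belonging to p_(1), ..., p_(k) are rejected.
    Returns one boolean per input p-value, in the original order.
    """
    n = len(pvals)
    # indices of the p-values, sorted by increasing p-value
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # compare p_(rank) with Simes' critical value rank * alpha / n
        if pvals[idx] <= rank * alpha / n:
            k = rank  # step-up: remember the LARGEST crossing index
    reject = [False] * n
    for idx in order[:k]:
        reject[idx] = True
    return reject


print(linear_step_up([0.03, 0.001, 0.035, 0.7], alpha=0.05))
# [True, True, True, False]
```

In this example the p-value 0.03 exceeds its own critical value 2α/4 = 0.025. It is still rejected, because the larger p-value 0.035 lies below 3α/4 = 0.0375; this is the step-up feature. A Bonferroni threshold of α/n = 0.0125 would reject only the smallest p-value.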
In Chapter 3 we present and investigate a new rejection curve designed to asymptotically exhaust
the whole FDR level α under some extreme parameter configurations.
Besides these theoretical considerations, we will apply some of the test procedures presented in Chapters 2 and 3 to real-life data and investigate FDR "at work".
Chapter 4 contains a systematic (numerical) comparison of some recently developed test procedures which aim at improving the linear step-up procedure. Under various distributional settings, we investigate their behaviour with respect to type I error and power. This allows us to discuss the assets and drawbacks of each of the considered procedures.
In Chapter 5, finally, we summarize our results and give an outlook on some further issues.
Some numerical computations and computer simulations referring to the theoretical results in
Chapters 2 and 3 are presented in the Appendix. Moreover, we briefly discuss some notions of
positive dependency there.
The research that led to this work was part of the first period of a research project sponsored by the Deutsche Forschungsgemeinschaft (DFG), grant No. FI 524/3-1, under the responsibility of my advisor Helmut Finner and of Prof. Guido Giani. The aims of Chapters 2 and 3 had already been formulated in the application for this grant, and parts of the elaborations in these chapters are joint work with Helmut Finner and Markus Roters. The main results of Chapter 2 have been pre-published in [86] and [88]. An article containing the main results of Chapter 3 has been accepted for publication, see [87]. I am grateful to the DFG for financing my tenure at the German Diabetes Center from July 2005 to April 2007, and to Helmut Finner for providing me with these interesting topics and for some valuable preliminary notes from his treasure chest.
Chapter 1
Introduction
1.1 Multiple testing and False Discovery Rate
The goal of multiple testing is to test $n > 1$ hypotheses simultaneously while controlling some kind of overall error rate. The most conservative and highly intuitive criterion is the Family Wise Error Rate (FWER) in the strong sense. The Family Wise Error is defined as the event that at least one false rejection is made among the $n$ individual tests. The FWER (in the strong sense) of a multiple test procedure $\varphi = (\varphi_1, \ldots, \varphi_n)$ is the probability of this event, so it can loosely be defined as
$$\mathrm{FWER}_n(\varphi) = P\left(\exists\, 1 \le i \le n : \{\varphi_i = 1 \text{ and } H_i \text{ is true}\}\right). \tag{1.1}$$
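The following minimal Python sketch illustrates definition (1.1) by estimating the FWER through simulation. It assumes that all $n$ null hypotheses are true, that the p-values are independent and UNI[0,1]-distributed, and that every test is carried out at the unadjusted level α. These assumptions and the function name `simulated_fwer` are illustrative choices, not part of the text. Under independence the exact value is $1 - (1-\alpha)^n$, which the simulation should reproduce approximately. It shows how quickly unadjusted testing loses FWER control as $n$ grows.

```python
import random


def simulated_fwer(n, alpha, reps=10000, seed=1):
    """Monte Carlo estimate of FWER_n in (1.1) for unadjusted tests.

    Illustrative assumptions: all n null hypotheses are true,
    the p-values are i.i.d. UNI[0,1], and phi_i = 1 iff p_i <= alpha.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(reps):
        # family-wise error: at least one true hypothesis is rejected
        if any(rng.random() <= alpha for _ in range(n)):
            errors += 1
    return errors / reps


for n in (1, 10, 100):
    exact = 1 - (1 - 0.05) ** n  # closed form under independence
    print(n, round(simulated_fwer(n, 0.05), 3), round(exact, 3))
```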
There is also a definition of the FWER in the weak sense, which aims at error control under the global hypothesis that all $n$ null hypotheses are true. However, we only consider the FWER in the strong sense here. A rather simple and naive method for controlling the FWER is the Bonferroni procedure, in which each individual test $\varphi_i$ is carried out at level $\alpha_i = \alpha/n$. By the subadditivity of the probability measure, the Bonferroni method immediately controls the FWER, because
$$\mathrm{FWER}_n(\varphi) \le \sum_{k=1}^{n} \alpha_k,$$
with $\alpha_k$ denoting the individual level of $\varphi_k$. For $\alpha_k = \alpha/n$ the right-hand side equals $\alpha$. The disadvantage of the Bonferroni method is that these individual levels become extremely small for a large number $n$ of hypotheses, which results in very low power for large $n$. Therefore, many improvements of the Bonferroni method have been developed. Perhaps the most advanced method for constructing a multiple level-$\alpha$ test is the so-called partitioning principle developed by Finner and Straßburger, see [94].
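Below is a minimal Python sketch of the Bonferroni procedure. It assumes the individual tests are expressed through p-values, so that $\varphi_i = 1$ iff $p_i \le \alpha/n$; p-values are only formally introduced in Section 1.2. The function names are hypothetical. The simulation assumes independent UNI[0,1] p-values with all hypotheses true. In that case the FWER equals $1 - (1 - \alpha/n)^n$, which stays below α as the subadditivity bound above guarantees.

```python
import random


def bonferroni(pvals, alpha=0.05):
    """Bonferroni procedure: phi_i = 1 iff p_i <= alpha / n."""
    n = len(pvals)
    return [p <= alpha / n for p in pvals]


def bonferroni_fwer(n, alpha=0.05, reps=10000, seed=2):
    """Monte Carlo FWER of the Bonferroni procedure when all n hypotheses are
    true and the p-values are i.i.d. UNI[0,1] (an illustrative assumption)."""
    rng = random.Random(seed)
    errors = sum(
        any(bonferroni([rng.random() for _ in range(n)], alpha))
        for _ in range(reps)
    )
    return errors / reps


print(bonferroni([0.001, 0.02, 0.3], alpha=0.05))  # [True, False, False]
print(bonferroni_fwer(100), 1 - (1 - 0.05 / 100) ** 100)
```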
A multiple test procedure $\varphi = (\varphi_1, \ldots, \varphi_n)$ that controls the FWER at a pre-specified level $\alpha$ can also be used to perform a level-$\alpha$ test of the global intersection hypothesis $H_0 = \bigcap_{i=1}^{n} H_i$ (assuming that $H_0$ is not empty). We simply reject $H_0$ iff there exists an index $1 \le k \le n$ with $\varphi_k = 1$. This test controls the type I error because, under $H_0$, all $H_i$ are true, so the probability of rejecting $H_0$ coincides with the right-hand side of (1.1), which is bounded by $\alpha$. If $\varphi$ is constructed according to the Bonferroni method, the corresponding intersection hypothesis test $\psi$ (say) simply becomes $\psi = \mathbf{1}_{\{p_{1:n} \le \alpha/n\}}$, where $p_{1:n}$ denotes the smallest p-value, cf. Section 1.2. An improvement with respect to power has been developed by Simes, cf. [264], for independent p-values. We mention it here because its critical values will be used in a different context later. Simes' method is described in Algorithm 2.1 at the beginning of Chapter 2.
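The two intersection tests mentioned above can be sketched in a few lines of Python, assuming the p-values are given as a list. The Bonferroni-based test is $\psi = \mathbf{1}_{\{p_{1:n} \le \alpha/n\}}$ as stated in the text. For Simes' test, the sketch uses the well-known formulation that rejects $H_0$ iff $p_{i:n} \le i\alpha/n$ for at least one $i$, where $p_{1:n} \le \dots \le p_{n:n}$ are the ordered p-values. The precise algorithm used in this work is Algorithm 2.1 in Chapter 2, and the function names here are hypothetical.

```python
def bonferroni_global(pvals, alpha=0.05):
    """psi = 1{p_{1:n} <= alpha / n}: reject H_0 iff the smallest p-value
    does not exceed alpha / n."""
    return min(pvals) <= alpha / len(pvals)


def simes_global(pvals, alpha=0.05):
    """Simes' global test: reject H_0 iff p_{i:n} <= i * alpha / n for some i,
    where p_{1:n} <= ... <= p_{n:n} are the ordered p-values."""
    n = len(pvals)
    ordered = sorted(pvals)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))


pvals = [0.02, 0.024, 0.3, 0.6]
print(bonferroni_global(pvals), simes_global(pvals))  # False True
```

For $i = 1$, Simes' critical value equals $\alpha/n$, so Simes' test rejects whenever the Bonferroni test does. In the example it additionally rejects, because $p_{2:4} = 0.024 \le 2\alpha/4 = 0.025$.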
A more radical way to gain power in a multiple testing problem is to relax the underlying error measure. Especially for large values of $n$, controlling the FWER may be far too conservative a goal. This holds in particular for screening experiments, where obtaining some significances matters more than avoiding a few false ones. A more liberal error measure, now widely used in such situations, is the False Discovery Rate (FDR). Unlike the FWER, the FDR does not control the probability of making at least one false rejection. Instead, it controls the expected proportion of falsely rejected hypotheses among all rejected hypotheses. In order to formalize this, we need some notation.
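Definition 1.1
Let $(\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$ denote a statistical experiment and $N_n = \{1, \ldots, n\} \subset \mathbb{N}$. Let $\varphi = (\varphi_1, \ldots, \varphi_n)$ be a multiple test procedure for the family $(H_1, \ldots, H_n)$ of non-empty hypotheses with $H_i \subset \Theta$ for all $i \in N_n$. A hypothesis $H_k$, $k \in N_n$, is called true if $\vartheta \in H_k$ and false otherwise. Then we define
$$R_n(\varphi) = |\{i \in N_n : \varphi_i = 1\}|, \tag{1.2}$$
$$V_n(\varphi) = |\{i \in N_n : \varphi_i = 1 \text{ and } H_i \text{ is true}\}|, \tag{1.3}$$
$$\mathrm{FDR}_n(\varphi) = E_\vartheta\left[\frac{V_n(\varphi)}{R_n(\varphi) \vee 1}\right], \tag{1.4}$$
and say that $\varphi$ controls the FDR at a pre-chosen level of significance $\alpha \in (0,1)$ iff
$$\sup_{\vartheta \in \Theta} \mathrm{FDR}_n(\varphi) \le \alpha.$$
The ratio $V_n(\varphi)/[R_n(\varphi) \vee 1]$ is called the false discovery proportion (FDP).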
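The quantities in Definition 1.1 translate directly into code. The minimal Python sketch below computes $R_n$, $V_n$ and the FDP for a given vector of test decisions and a vector indicating which hypotheses are true. It then estimates the FDR (1.4) and the FWER (1.1) of the Bonferroni procedure by simulation. The simulation setting and all function names are illustrative assumptions: independent one-sided z-tests, $n = 20$ hypotheses of which $n_0 = 15$ are true, and a shift of 3 under the alternatives. The FDP never exceeds the indicator of the event $\{V_n \ge 1\}$, so the FDR is always bounded by the FWER, and the estimates reflect this.

```python
import random
from statistics import NormalDist


def r_n(reject):
    """R_n in (1.2): number of rejected hypotheses."""
    return sum(reject)


def v_n(reject, is_true):
    """V_n in (1.3): number of rejected hypotheses that are true."""
    return sum(r and t for r, t in zip(reject, is_true))


def fdp(reject, is_true):
    """False discovery proportion V_n / (R_n v 1)."""
    return v_n(reject, is_true) / max(r_n(reject), 1)


def estimate_fdr_fwer(n=20, n0=15, shift=3.0, alpha=0.05, reps=5000, seed=3):
    """Monte Carlo estimates of FDR_n and FWER_n for the Bonferroni procedure.

    Illustrative model: Z_i ~ N(0, 1) for the n0 true hypotheses and
    Z_i ~ N(shift, 1) otherwise, independent; one-sided p_i = 1 - Phi(Z_i).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # Phi = std_normal.cdf
    is_true = [i < n0 for i in range(n)]
    fdr_sum = 0.0
    fwer_sum = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) + (0.0 if t else shift) for t in is_true]
        pvals = [1.0 - std_normal.cdf(x) for x in z]
        reject = [p <= alpha / n for p in pvals]  # Bonferroni decisions
        fdr_sum += fdp(reject, is_true)
        fwer_sum += v_n(reject, is_true) >= 1  # indicator of {V_n >= 1}
    return fdr_sum / reps, fwer_sum / reps


# V_n = 1 and R_n = 3 here, so the FDP is 1/3
print(fdp([True, True, False, True], [True, False, False, False]))
print(estimate_fdr_fwer())
```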
If it is clear which procedure $\varphi$ is investigated, the argument $\varphi$ is often dropped and we simply write $V_n = V_n(\varphi)$ and $R_n = R_n(\varphi)$. The meaning of the quantities $V_n$ and $R_n$ is illustrated in the following table.
