La lecture à portée de main
Description
Sujets
Informations
Publié par | profil-zyak-2012 |
Nombre de lectures | 28 |
Langue | English |
Exrait
SubmittedtotheAnnalsofStatistics
arXiv:
math.PR/0000000
NEEDLESANDSTRAWSINAHAYSTACK:POSTERIOR
CONCENTRATIONFORPOSSIBLYSPARSESEQUENCES
∗ByIsmae¨lCastilloandAadvanderVaart
WeconsiderfullBayesianinferenceinthemultivariatenormal
meanmodelinthesituationthatthemeanvectorissparse.Theprior
distributiononthevectorofmeansisconstructedhierarchicallyby
firstchoosingacollectionofnonzeromeansandnextaprioronthe
nonzerovalues.Weconsidertheposteriordistributioninthefrequen-
tistset-upthattheobservationsaregeneratedaccordingtoafixed
meanvector,andareinterestedintheposteriordistributionofthe
numberofnonzerocomponentsandthecontractionoftheposterior
distributiontothetruemeanvector.Wefindvariouscombinations
ofpriorsonthenumberofnonzerocoefficientsandonthesecoeffi-
cientsthatgivedesirableperformance.Wealsofindpriorsthatgive
suboptimalconvergence,forinstanceGaussianpriorsonthenonzero
coefficients.Weillustratetheresultsbysimulations.
1.Introduction.
Supposethatweobserveavector
X
=(
X
1
,...,X
n
)
in
R
n
suchthat
(1.1)
X
i
=
θ
i
+
ε
i
,i
=1
,...,n,
forindependentstandardnormalrandomvariables
ε
i
andanunknownvector
ofmeans
θ
=(
θ
1
,...,θ
n
).WeareinterestedinBayesianinferenceon
θ
,in
thesituationthatthisvectorispossibly
sparse
.
Non-Bayesianapproachestothisproblemhaverecentlybeenconsidered
bymanyauthors.Golubev[13]obtainedresultsformodelselectionmethods
andthresholdestimatorsforthemean-squaredrisk.Birge´andMassart[4]
treatedthemodelwithintheirgeneralcontextofmodelselectionbypenal-
izedleastsquares.Abramovichetal.in[1]studiedtheperformanceofthe
FalseDiscoveryRatemethod.TheearlierworkbyDonohoandJohnstone
[10]canbeviewedasstudyingtheproblemwithinan
`
r
context.Manyau-
thors(seee.g.[3],[22],[21]andreferencescitedthere)haveinvestigatedthe
connectiontotheLASSOorsimilarmethods.
MethodswithaBayesianconnectionwerestudiedbyGeorgeandFoster
[12],Zhang[20],JohnstoneandSilverman[16,17],Abramovich,Grinshtein
∗
WorkpartlysupportedbyaPostdoctoralfellowshipfromtheVUUniversityAmster-
madAMS2000subjectclassifications:
Primary62G05,62G20
Keywordsandphrases:
Bayesianestimators,Sparsity,Gaussiansequencemodel,Mix-
turepriors,Asymptotics,Contraction.
1imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
2
I.CASTILLOANDA.W.VANDERVAART
andPensky[2],andJiangandZhang[15].Thepapers[12]and[16]con-
sideredanempiricalBayesmethod,consistingofmodellingtheparameters
θ
1
,...,θ
n
a-prioriasindependentlydrawnfromamixtureofaDiracmea-
sureat0andacontinuousdistribution,determininganappropriatemixing
weightbythemethodof(restricted)marginalmaximumlikelihood,and
finallyemployingtheposteriormedianormean.Thesecondpaper[2]moti-
vatedpenalties,appliedinapenalizedminimumcontrastscheme,byprior
distributionsontheparameters,andderivedestimatorsforthenumberof
nonzero
θ
i
andthe
θ
i
itself.Thefirstisaposteriormode,buttheestimator
for
θ
,called“Bayesiantestimation”,doesnotseemitselfBayesian.(Infact,
theGaussianpriorforthenon-zeroparametersin[2]willbeseentoperform
suboptimallyinourfullyBayesianset-up.)Thepapers[20]and[15]obtain
sharpresultson(nonparametric)empiricalBayesestimators.
Otherrelatedpapersinclude[19],[6],[7],[14],[15],[5].
Apenalizedminimumcontrastestimatorcanoftenbeviewedasthemode
oftheposteriordistribution,anditishelpfultointerpretepenaltiesaccord-
ingly.However,theBayesianapproachyieldsafullposteriordistribution,
whichisarandomprobabilitydistributionontheparameterspace.Ithas
bothalocationandaspread,andcanbemarginalizedtogiveposterior
distributionsforanyfunctionsoftheparametervectorofinterest.Itisthis
objectthatwestudyinthispaper.SuchfullBayesianinferencewasrecently
consideredbyScottandBerger[18],whodiscussedvariousaspectsnotcov-
eredinthepresentpaper,butnoconcentrationresults.Oneexampleofour
resultsisthatthebeta-binomialpriorsin[18],combinedwithmoderately
toheavytailedpriorsonthenonzeromeans,yieldoptimalrecovery.
Sparsity
canbedefinedinvariousways.Perhapsthemostnaturaldefini-
tionistheclassof
nearlyblack
vectors,definedas
`
0
[
p
n
]=
{
θ
∈
R
n
:#(1
≤
i
≤
n
:
θ
i
6
=0)
≤
p
n
}
.
Here
p
n
isagivennumber,whichintheoreticalinvestigationsistypically
assumedtobe
o
(
n
),as
n
→∞
.Sparsitymayalsomeanthatmanymeans
aresmall,butpossiblynotexactlyzero.Definitionsthatmakethisprecise
use
strong
or
weak
`
s
-balls
,typicallyfor
s
∈
(0
,
2).Thesearedefinedas,with
θ
[1]
≥
θ
[2]
≥∙∙∙≥
θ
[
n
]
thenonincreasingpermutationofthecoordinatesof
θ
=(
θ
1
,...,θ
n
),
nX
n
o
`
s
[
p
n
]=
θ
∈
R
n
:1
|
θ
i
|
s
≤
p
ns
nn1=in
n
1
s
p
n
s
o
m
s
[
p
n
]=
θ
∈
R
:
n
1
≤
m
i
a
≤
x
n
i
|
θ
[
i
]
|≤
n.
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
SPARSITYANDBAYESPOSTERIORMEASURE
3
Becausethenonzerocoefficientsin
`
0
[
p
n
]arenotquantitativelyrestricted,
thereisnoinclusionrelationshipbetweenthisspaceandtheweakandstrong
balls,althoughresultsforthelattercanbeobtainedbyprojectingthem
into
`
0
[
p
n
].Ontheotherhand,forany
s>
0wehavetheinclusion
`
s
[
p
n
]
⊂
m
s
[
p
n
].
Theextentofthesparsity,measuredbytheconstant
p
n
,isassumedun-
known.OurBayesianapproachstartsbyputtingaprior
π
n
onthisnumber,
agivenprobabilitymeasureontheset
{
0
,
1
,
2
,...,n
}
.Nextwecomplete
thistoaprioronthesetofallpossiblesequences
θ
=(
θ
1
,...,θ
n
)in
R
n
,
bygivenadraw
p
from
π
n
choosingarandomsubset
S
⊂{
1
,...,n
}
of
cardinality
p
,andchoosingthecorrespondingcoordinates(
θ
i
:
i
∈
S
)from
adensity
g
S
on
R
S
andsettingtheremainingcoordinates(
θ
i
:
i
∈
S
c
)equal
tozero.Giventhisprior,Bayes’ruleyieldstheposteriordistributionof
θ
asusual.Weinvestigatethepropertiesofthisposteriordistribution,inits
dependenceonthepriorsonthedimensionandonthenonzerocoefficients,
inthenonBayesianset-upwhere
X
follows(1.1)with
θ
equaltoafixed,
“true”parameter
θ
0
.
Ifthetrueparametervector
θ
0
belongsto
`
0
[
p
n
],thenitisdesirablethat
theposteriordistributionconcentratesmostofitsmassonnearlyblack
vectors.Onemainresultofthepaperisthatthisisthecaseprovidedthe
priorprobabilities
π
n
{
p
}
decreaseexponentiallyfastwiththedimension
p
.
Thequalityofthereconstructionofthefullvector
θ
canbemeasuredby
variousdistances.AnaturaloneistheEuclideandistance,withsquare
nXk
θ
−
θ
0
k
2
=(
θ
i
−
θ
i
0
)
2
.
1=iIftheindicesofthe
p
n
nonzerocoordinatesofavectorinthemodel
`
0
[
p
n
]
wereknowna-priori,thenthevectorcouldbeestimatedwithmeansquare
erroroftheorder
p
n
.In[11]itisshownthat,as
n,p
n
→∞
with
p
n
=
o
(
n
),
2in
ˆ
fsup
P
n,θ
k
θ
ˆ
−
θ
k
=2
p
n
log(
n/p
n
)1+
o
(1)
.
θθ
∈
`
0
[
p
n
]
Heretheinfimumistakenoverallestimators
θ
ˆ=
θ
ˆ(
X
)and
P
n,θ
denotes
takingtheexpectationundertheassumptionthat
X
is
N
n
(
θ,I
)-distributed.
Inotherwords,thesquareminimaxrateover
`
0
[
p
n
]is
p
n
log(
n/p
n
),meaning
thattheunknownidentityofthenonzeromeansneedstoleadonlytoa
logarithmicloss.
TheBayesianapproachispresumablyadoptedfortheintuitionprovided
bypriormodelling,andisnotnecessarilydirectedatattainingminimax
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
4
I.CASTILLOANDA.W.VANDERVAART
rates.However,fortheoreticalinvestigationitisnaturaltotakethemin-
imaxrateasabenchmark,anditisofparticularinteresttoknowwhich
priorsyieldaposteriordistributionthatconcentratesmostofitsmasson
ballsaround
θ
0
ofsquareradiusoforder
p
n
log(
p
n
/n
),orcloserelativesas
p
n
(log
n
)
r
thatloose(only)alogarithmicfactor.Asecondmainresultof
thepaperisthattheminimaxrateisattainedformanycombinationsof
priors.Itsufficesthatthepriors
π
n
decreaseexponentiallywithdimension,
andgivesufficientweighttothetruelevelofsparsity:forsome
c>
0,
(1.2)
π
n
(
p
n
)
&
exp
−
cp
n
log(
n/p
n
)
.
Furthermore,thepriorsonthenonzerocoordinatesshouldhavetailsthat
arenotlighterthanLaplace,andsatisfyanumberofothertechnicalprop-
erties.Ifinequality(1.2)failsthentherateofcontr
actionm
aybeslower
thanminimax;weshowthatitisnotslowerthanlog1
/π
n
(
p
n
).(Theword
“contraction”isinlinewithotherliteratureonnonparametricBayesianpro-
cedures;withthepresentchoiceofmetrics(whichgrowwith
n
)therates
actuallyincreasetoinfinity.)
Moregenerally,weconsiderreconstructionrelativetothe
`
q
metricfor
0
<q
≤
2,defined(without
q
th-root)by
nX(1.3)
d
q
(
θ,θ
0
)=
|
θ
i
−
θ
0
i
|
q
.
1=iFor
q<
2this“metric”ismoresensitivetosmallvariationsinthecoordi-
natesthanthesquareEuclideanmetric,whichis
d
2
.(For
q
≤
1thedefinition
givesatruemetric
d
q
;for1
<q
≤
2itdoesnot.)From[11]theminimax
rateover
`
0
[
p
n
]for
d
q
isknowntobeoftheorder
(1.4)
r
n
∗
,q
=
p
n
log
q/
2
(
n/p
n
)
.
Weshowthattheposterior“contraction”rateattainsthisorderundercon-
ditionsasintheprecedingparagraph,andmoregenerallycharacterizethe
rateintermsoflog(1
/π
n
(
p
n
)).
Besidesnearlyblackvectorsweconsiderratesofcontractionif
θ
0
isina
weak
`
s
-ball.Theminimaxrateover
m
s
[
p
n
]relativeto
d
q
isgivenby(see
)]01[sp(1.5)
µ
n
∗
,s,q
=
n
n
log
(
q
−
s
)
/
2
(
n/p
n
)
.
nThisisshowntobealsotherateofposteriorcontractionunderslightly
strongerconditionsonthepriorsthanbefore:thepriorondimensionmust
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
SPARSITYANDBAYESPOSTERIORMEASURE
5
decreaseslightlyfasterthanexponential.Underthesameconditionswe
alsoshowthattheposteriordistributionhasexponentialconcentration,and
thereforecontractsalsointhestrongersenseof(any,Euclidean)moments.
Asummaryoftheseresultsisthatgoodpriorsforthedimensiondecrease
atexponentialor,perhapsbetter,slightlyfasterrate,andgoodpriorson
thenonzeromeanshavetailsthatareheavierthanLaplace.Wealsoshow
thatpriorswithlightertails,suchastheGaussian,attainsignificantlylower
contractionratesattrueparametervectors
θ
0
thatarenotclosetothe
origin.
Thestructureofthearticleisasfollows.InSection2westatethemain
concentrationresults.Apracticalalgorithm,simulationsandsomepictures
arepresentedinSection3.Proofsaregatheredattheendofthepaperand
inthesupplementaryAppendix[9].
1.1.
Notation.
Wedenoteby
a
∧
b
and
a
∨
b
theminimumandmaximum
oftworealnumbers
a,b
,andwrite
a
.
b
if
a
≤
Cb
forauniversalconstant
C
.Thenotation
,
means“equalbydefinitionto”.Wecall
support
ofa
vector
θ
=(
θ
1
,...,θ
n
)
∈
R
n
thesetofindicesofnonzerocoordinates,and
denotethisby
S
θ
=
{
i
∈{
1
,...,n
}
:
θ
i
6
=0
}
.Weset
θ
S
=(
θ
i
:
i
∈
S
),and
let
|
S
|
bethecardinalityofaset
S
⊂{
1
,...,n
}
.
2.Mainresults.
ThroughoutthepaperweconsiderapriorΠ
n
on
R
n
constructedinthreesteps:
(P1)A
dimension
p
ischosenaccordingtoapriorprobabilitymeasure
π
n
ontheset
{
0
,
1
,
2
,...,n
}
.
(P2)Given
p
asubset
S
⊂{
1
,...,n
}
ofsize
|
S
|
=
p
ischosenuniformlyat
nrandomfromthe
p
subsetsofsize
p
.
(P3)Given(
p,S
)avector
θ
S
=(
θ
i
:
i
∈
S
)ischosenfromaprobability
distributionwithLebesguedensity
g
S
on
R
p
(if
p
≥
1)andthisis
extendedto
θ
∈
R
n
bysettingtheremainingcoordinates
θ
S
c
equalto
.0Forsimplicityweusethesamedensity
g
S
foreverysetofagivendimension
|
S
|
,andwilldenotethisalsoby
g
|
S
|
.
GiventhepriorΠ
n
,Bayes’ruleyieldsthe
posteriordistribution
B
7→
Π
n
(
B
|
X
),theconditionaldistributionof
θ
given
X
iftheconditionaldistri-
butionof
X
given
θ
istakenequaltothenormaldistribution
N
n
(
θ,I
).The
probabilityΠ
n
(
B
|
X
)ofaBorelset
B
⊂
R
n
undertheposteriordistribution
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
6
I.CASTILLOANDA.W.VANDERVAART
canbewritten
X
n
π
(
p
)
XZYY
nn
φ
(
X
i
−
θ
i
)
φ
(
X
i
)
g
S
(
θ
S
)
dθ
S
p
=0
p
|
S
|
=
p
(
θ
S
,
0)
∈
Bi
∈
Si
∈
/S
(2.1)
X
n
π
(
p
)
XZYY
.
nn
φ
(
X
i
−
θ
i
)
φ
(
X
i
)
g
S
(
θ
S
)
dθ
S
p
=0
p
|
S
|
=
pi
∈
Si
∈
/S
Here(
θ
S
,
0)isthevectorin
R
n
formedbyaddingcoordinates
θ
i
=0to
θ
S
=(
θ
i
:
i
∈
S
),atthepositionsleftopenby
S
⊂{
1
,...,n
}
(inthecorrect
orderofthecoordinatesandnotattheend,asthenotationsuggests).This
expressionissomewhatunwieldy;weconsidercomputationinSection3.
Theposteriordistributioncanbemarginalizedtoobtaintheposteriordis-
tributionofanarbitrarytransformation
f
(
θ
)bychoosingtheset
B
equal
to
B
=
{
θ
:
f
(
θ
)
∈
B
0
}
.
Theposteriordistributionisarandomprobabilitydistributionon
R
n
,
whichwestudyundertheassumptionthatthevector
X
=(
X
1
,...,X
n
)is
distributedaccordingtoamultivariatenormaldistributionwithmeanvector
θ
0
andcovariancematrixtheidentity.Welet
P
n,θ
0
T
denotetheexpected
valueofafunction
T
=
T
(
X
)underthisdistribution.
Weshallbeinterestedintwoaspectsoftheposteriordistribution:its
dimensionalityanditsabilitytorecoverthemeanvector
θ
.Becausethe
conditionsaresimplerinthecasethatthenonzerocoordinatesareindepen-
dentundertheprior,inthefirsttworesultsweassumethatthedensities
g
S
in(P3)areofproductform.Concreteexamplesofpriorsasin(P1)and(P3)
thatsatisfytheconditionsimposedinthetheoremsaregiveninSection2.5.
2.1.
Dimensionality.
Inthecontextof
`
0
[
p
n
]-classes,wesaythatthe
prior
π
n
ondimensionhas
exponentialdecrease
if,forsomeconstants
C>
0
and
D<
1,
(2.2)
π
n
(
p
)
≤
Dπ
n
(
p
−
1)
,p>Cp
n
.
Theorem
2.1(Dimension)
.
If
π
n
hasexponentialdecrease
(2.2)
and
g
S
isaproductof
|
S
|
copiesofaunivariatedensitywithmeanzeroandfinite
secondmoment,thenthereexists
M
suchthat,as
p
n
,n
→∞
,
sup
P
n,θ
0
Π
n
θ
:
|
S
θ
|
>Mp
n
|
X
→
0
.
θ
0
∈
`
0
[
p
n
]
Forreasonablepriors,wemayhopethattheposteriordistributionspreads
massinthe
p
n
-dimensionalsubspacethatsupportsatruemeanvector
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011
SPARSITYANDBAYESPOSTERIORMEASURE
7
θ
0
∈
`
0
[
p
n
].Thetheoremshowsthattheposteriordistribution“overshoots”
thisspacebysubspacesofdimensionatmostamultipleof
p
n
.Becausethe
overshootcanhavearandomdirection,thisdoesnotmeanthattheposte-
riordistributionconcentratesoverallonafixed
Mp
n
-dimensionalsubspace.
Thetheoremshowsthatitconcentratesalong
Mp
n
-dimensionalcoordinate
planes,butitssupportwillbefarfromconvex.
Obviouslytheposteriordistributionwillconcentrateonlow-dimensional
subspacesifthehigherdimensionalspacesreceivelittlemassundertheprior
π
n
.Bythetheoremexponentialdecreaseissufficient.Thenextstepisto
showthatexponentialdecreaseisnottooharsh:itiscompatiblewithgood
reconstructionofthefullmeanvector
θ
.Thisthenofcourserequiresalower
boundonthepriormassgiventothespacesof“correct”dimension;for
instance(1.2).
2.2.
Recovery.
Goodrecoveryrequiresalsoappropriatepriordensities
g
S
onthenonzerocoordinates.Becausethestatisticalproblemofrecovering
θ
froma
N
p
(
θ,I
)distributedobservationisequivariantin
θ
,wemayhope
thatthelocationofthenonzerocoordinatesof
θ
0
doesnotplayaroleinits
recoveryrate.Thenon-Bayesianproceduresconsideredinforinstance[13]
indeedfulfillthisexpectation.However,aBayesianprocedure(withproper
priors)necessarilyfavourscertainregionsoftheparameterspace.Depending
onthechoiceofpriors
g
S
in(P3)thismayleadtoashrinkageeffect,even
inthe“average”recoveryoftheparameteras
n
→∞
,yieldingsuboptimal
behaviourfortrueparameters
θ
0
thatarefarfromtheorigin.Thisshrinkage
effectcanbepreventedbychoosingpriors
g
S
withsufficientlyheavytails.
Againwefirstconsiderthecaseofindependentcoordinates.Inthefol-
lowingtheoremweassumethat
g
S
isaproductof
|
S
|
densitiesoftheform
e
h
,forafunction
h
:
R
→
R
satisfying
(2.3)
h
(
x
)
−
h
(
y
)
Un accès à la bibliothèque YouScribe est nécessaire pour lire intégralement cet ouvrage.
Découvrez nos offres adaptées à tous les besoins !