Submitted to the Annals of Statistics
arXiv: math.PR/0000000

NEEDLES AND STRAWS IN A HAYSTACK: POSTERIOR CONCENTRATION FOR POSSIBLY SPARSE SEQUENCES

By Ismaël Castillo∗ and Aad van der Vaart

We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.

1. Introduction.

Suppose that we observe a vector $X = (X_1, \ldots, X_n)$ in $\mathbb{R}^n$ such that

$$X_i = \theta_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

for independent standard normal random variables $\varepsilon_i$ and an unknown vector of means $\theta = (\theta_1, \ldots, \theta_n)$. We are interested in Bayesian inference on $\theta$, in the situation that this vector is possibly sparse.
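
As an illustration of model (1.1), the following minimal sketch simulates one observation vector for a sparse mean vector. The function name `simulate_sparse_means`, the common signal size `signal`, and the uniform choice of the support are assumptions made for this example only; they are not part of the paper.

```python
import random


def simulate_sparse_means(n, p, signal=5.0, seed=0):
    """Return (theta, x) with x_i = theta_i + eps_i and eps_i ~ N(0, 1).

    Exactly p coordinates of theta are set to `signal` (a hypothetical
    value chosen for illustration); the remaining n - p are zero.
    """
    rng = random.Random(seed)
    support = rng.sample(range(n), p)  # indices of the nonzero means
    theta = [0.0] * n
    for i in support:
        theta[i] = signal
    # Add independent standard normal noise to every coordinate.
    x = [t + rng.gauss(0.0, 1.0) for t in theta]
    return theta, x


theta, x = simulate_sparse_means(n=100, p=5)
nonzero = sum(1 for t in theta if t != 0.0)
```

Here `theta` plays the role of the unknown mean vector and `x` that of the observed data; a statistical procedure would see only `x`.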

Non-Bayesian approaches to this problem have recently been considered by many authors. Golubev [13] obtained results for model selection methods and threshold estimators for the mean-squared risk. Birgé and Massart [4] treated the model within their general context of model selection by penalized least squares. Abramovich et al. in [1] studied the performance of the False Discovery Rate method. The earlier work by Donoho and Johnstone [10] can be viewed as studying the problem within an $\ell_r$ context. Many authors (see e.g. [3], [22], [21] and references cited there) have investigated the connection to the LASSO or similar methods.

∗ Work partly supported by a Postdoctoral fellowship from the VU University Amsterdam.
AMS 2000 subject classifications: Primary 62G05, 62G20.
Keywords and phrases: Bayesian estimators, Sparsity, Gaussian sequence model, Mixture priors, Asymptotics, Contraction.
imsart-aos ver. 2007/12/10 file: spa-revised.tex date: November 8, 2011

Methods with a Bayesian connection were studied by George and Foster [12], Zhang [20], Johnstone and Silverman [16, 17], Abramovich, Grinshtein and Pensky [2], and Jiang and Zhang [15]. The papers [12] and [16] considered an empirical Bayes method, consisting of modelling the parameters $\theta_1, \ldots, \theta_n$ a-priori as independently drawn from a mixture of a Dirac measure at 0 and a continuous distribution, determining an appropriate mixing weight by the method of (restricted) marginal maximum likelihood, and finally employing the posterior median or mean. The second paper [2] motivated penalties, applied in a penalized minimum contrast scheme, by prior distributions on the parameters, and derived estimators for the number of nonzero $\theta_i$ and the $\theta_i$ itself. The first is a posterior mode, but the estimator for $\theta$, called “Bayesian testimation”, does not seem itself Bayesian. (In fact, the Gaussian prior for the non-zero parameters in [2] will be seen to perform suboptimally in our fully Bayesian set-up.) The papers [20] and [15] obtain sharp results on (nonparametric) empirical Bayes estimators.

Other related papers include [19], [6], [7], [14], [15], [5].

A penalized minimum contrast estimator can often be viewed as the mode of the posterior distribution, and it is helpful to interpret penalties accordingly. However, the Bayesian approach yields a full posterior distribution, which is a random probability distribution on the parameter space. It has both a location and a spread, and can be marginalized to give posterior distributions for any functions of the parameter vector of interest. It is this object that we study in this paper. Such full Bayesian inference was recently considered by Scott and Berger [18], who discussed various aspects not covered in the present paper, but no concentration results. One example of our results is that the beta-binomial priors in [18], combined with moderately to heavy tailed priors on the nonzero means, yield optimal recovery.

Sparsity can be defined in various ways. Perhaps the most natural definition is the class of nearly black vectors, defined as

$$\ell_0[p_n] = \{\theta \in \mathbb{R}^n : \#(1 \le i \le n : \theta_i \ne 0) \le p_n\}.$$

Here $p_n$ is a given number, which in theoretical investigations is typically assumed to be $o(n)$, as $n \to \infty$. Sparsity may also mean that many means are small, but possibly not exactly zero. Definitions that make this precise use strong or weak $\ell_s$-balls, typically for $s \in (0, 2)$. These are defined as, with $\theta_{[1]} \ge \theta_{[2]} \ge \cdots \ge \theta_{[n]}$ the nonincreasing permutation of the coordinates of $\theta = (\theta_1, \ldots, \theta_n)$,

$$\ell_s[p_n] = \Big\{\theta \in \mathbb{R}^n : \frac{1}{n} \sum_{i=1}^n |\theta_i|^s \le \Big(\frac{p_n}{n}\Big)^s\Big\},$$

$$m_s[p_n] = \Big\{\theta \in \mathbb{R}^n : \frac{1}{n} \max_{1 \le i \le n} i\,|\theta_{[i]}|^s \le \Big(\frac{p_n}{n}\Big)^s\Big\}.$$
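
To make the three sparsity classes concrete, the sketch below checks membership of a given vector in the nearly black class, the strong ball and the weak ball. It reads the strong ball as the constraint $n^{-1}\sum_i |\theta_i|^s \le (p_n/n)^s$ and the weak ball as $n^{-1}\max_i i\,|\theta_{[i]}|^s \le (p_n/n)^s$. The function names are hypothetical, and the coordinates are ordered by absolute value, which is an assumption of this example.

```python
def in_l0(theta, p):
    """Nearly black class: at most p nonzero coordinates."""
    return sum(1 for t in theta if t != 0.0) <= p


def in_strong_ball(theta, p, s):
    """Strong l_s ball: (1/n) * sum_i |theta_i|^s <= (p/n)^s."""
    n = len(theta)
    return sum(abs(t) ** s for t in theta) / n <= (p / n) ** s


def in_weak_ball(theta, p, s):
    """Weak l_s ball: (1/n) * max_i i * |theta_[i]|^s <= (p/n)^s.

    Coordinates are sorted by decreasing absolute value (assumption).
    """
    n = len(theta)
    ordered = sorted((abs(t) for t in theta), reverse=True)
    worst = max(i * v ** s for i, v in enumerate(ordered, start=1))
    return worst / n <= (p / n) ** s


# Ten coordinates, two of them nonzero.
theta = [0.0] * 8 + [3.0, -4.0]
checks = (
    in_l0(theta, 2),
    in_strong_ball(theta, 8, 1.0),
    in_weak_ball(theta, 6.5, 1.0),
    in_strong_ball(theta, 6.5, 1.0),
)
```

For this vector and $s = 1$ the weak-ball constraint holds with radius parameter 6.5 while the strong-ball constraint does not, which shows that the weak ball can be strictly larger.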

Because the nonzero coefficients in $\ell_0[p_n]$ are not quantitatively restricted, there is no inclusion relationship between this space and the weak and strong balls, although results for the latter can be obtained by projecting them into $\ell_0[p_n]$. On the other hand, for any $s > 0$ we have the inclusion $\ell_s[p_n] \subset m_s[p_n]$.
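
The inclusion of the strong ball in the weak ball can be verified in one line. The sketch below assumes that the strong ball is the constraint $n^{-1}\sum_i |\theta_i|^s \le (p_n/n)^s$, that the weak ball is the constraint $n^{-1}\max_i i\,|\theta_{[i]}|^s \le (p_n/n)^s$, and that the ordered coordinates are sorted by decreasing absolute value.

```latex
% Assumptions: s > 0 and |\theta_{[1]}| \ge \cdots \ge |\theta_{[n]}|.
% The i largest terms each dominate the i-th one, hence
\[
  i\,|\theta_{[i]}|^s \;\le\; \sum_{j=1}^{i} |\theta_{[j]}|^s
  \;\le\; \sum_{j=1}^{n} |\theta_j|^s ,
  \qquad 1 \le i \le n .
\]
% Dividing by n and maximizing over i gives, for \theta in the strong ball,
\[
  \frac{1}{n} \max_{1 \le i \le n} i\,|\theta_{[i]}|^s
  \;\le\; \frac{1}{n} \sum_{j=1}^{n} |\theta_j|^s
  \;\le\; \Big(\frac{p_n}{n}\Big)^s .
\]
```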

The extent of the sparsity, measured by the constant $p_n$, is assumed unknown. Our Bayesian approach starts by putting a prior $\pi_n$ on this number, a given probability measure on the set $\{0, 1, 2, \ldots, n\}$. Next we complete this to a prior on the set of all possible sequences $\theta = (\theta_1, \ldots, \theta_n)$ in $\mathbb{R}^n$: given a draw $p$ from $\pi_n$, we choose a random subset $S \subset \{1, \ldots, n\}$ of cardinality $p$, choose the corresponding coordinates $(\theta_i : i \in S)$ from a density $g_S$ on $\mathbb{R}^S$, and set the remaining coordinates $(\theta_i : i \in S^c)$ equal to zero. Given this prior, Bayes' rule yields the posterior distribution of $\theta$ as usual. We investigate the properties of this posterior distribution, in its dependence on the priors on the dimension and on the nonzero coefficients, in the non-Bayesian set-up where $X$ follows (1.1) with $\theta$ equal to a fixed, “true” parameter $\theta_0$.
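
The hierarchical construction of the prior can be sketched as a three-step sampler: draw the dimension, draw the support, draw the nonzero values. The code below is only an illustration under stated assumptions: the subset is drawn uniformly among subsets of the given size, the density on the nonzero coordinates is taken to be a product of standard Laplace densities, and the dimension prior has weights proportional to $e^{-p}$. All of these choices, and the names `sample_prior` and `laplace_draw`, are hypothetical.

```python
import math
import random


def laplace_draw(rng):
    """One standard Laplace draw, as a difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def sample_prior(n, dim_weights, draw_nonzero, rng):
    """Draw theta from the hierarchical prior.

    dim_weights[p] is proportional to the prior mass of dimension p,
    for p = 0, ..., n (they need not sum to one).
    """
    # Step 1: draw the number of nonzero coordinates.
    p = rng.choices(range(n + 1), weights=dim_weights, k=1)[0]
    # Step 2: draw a subset of size p, uniformly (assumption).
    support = rng.sample(range(n), p)
    # Step 3: fill the support with independent draws, zero elsewhere.
    theta = [0.0] * n
    for i in support:
        theta[i] = draw_nonzero(rng)
    return theta


n = 50
# Hypothetical exponentially decreasing prior on the dimension.
weights = [math.exp(-p) for p in range(n + 1)]
theta = sample_prior(n, weights, laplace_draw, random.Random(0))
```

With exponentially decreasing weights most draws have very few nonzero coordinates, so the prior itself favours nearly black vectors.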

If the true parameter vector $\theta_0$ belongs to $\ell_0[p_n]$, then it is desirable that the posterior distribution concentrates most of its mass on nearly black vectors. One main result of the paper is that this is the case provided the prior probabilities $\pi_n\{p\}$ decrease exponentially fast with the dimension $p$.

The quality of the reconstruction of the full vector $\theta$ can be measured by various distances. A natural one is the Euclidean distance, with square

$$\|\theta - \theta_0\|^2 = \sum_{i=1}^n (\theta_i - \theta_{0,i})^2.$$

If the indices of the $p_n$ nonzero coordinates of a vector in the model $\ell_0[p_n]$ were known a-priori, then the vector could be estimated with mean square error of the order $p_n$. In [11] it is shown that, as $n, p_n \to \infty$ with $p_n = o(n)$,

$$\inf_{\hat\theta} \sup_{\theta \in \ell_0[p_n]} P_{n,\theta} \|\hat\theta - \theta\|^2 = 2 p_n \log(n/p_n) \big(1 + o(1)\big).$$

Here the infimum is taken over all estimators $\hat\theta = \hat\theta(X)$ and $P_{n,\theta}$ denotes taking the expectation under the assumption that $X$ is $N_n(\theta, I)$-distributed. In other words, the square minimax rate over $\ell_0[p_n]$ is $p_n \log(n/p_n)$, meaning that the unknown identity of the nonzero means needs to lead only to a logarithmic loss.
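
To get a feeling for the size of the logarithmic loss, the following sketch evaluates the leading term $2 p_n \log(n/p_n)$ of the minimax risk and compares it with the order $p_n$ attainable when the support is known. It ignores the $1 + o(1)$ factor, so the numbers are indicative only; the function name `minimax_leading_term` is hypothetical.

```python
import math


def minimax_leading_term(n, p):
    """Leading term 2 * p * log(n / p) of the minimax risk over l_0[p]."""
    return 2.0 * p * math.log(n / p)


# Ratio of the leading term to the oracle order p, for a few (n, p) pairs.
ratios = {
    (n, p): minimax_leading_term(n, p) / p
    for n, p in [(1000, 10), (10000, 10), (10000, 100)]
}
```

For $n = 1000$ and $p_n = 10$ the leading term is about 92.1, roughly nine times the oracle order of 10; the ratio equals $2\log(n/p_n)$ and so depends on $n$ and $p_n$ only through $n/p_n$.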

The Bayesian approach is presumably adopted for the intuition provided by prior modelling, and is not necessarily directed at attaining minimax