Submitted to the Annals of Statistics arXiv: math PR

profil-zyak-2012


36 pages

English


Description

Level: Higher education, Doctorate, Bac+8
Submitted to the Annals of Statistics. arXiv: math.PR/0000000

NEEDLES AND STRAWS IN A HAYSTACK: POSTERIOR CONCENTRATION FOR POSSIBLY SPARSE SEQUENCES

By Ismaël Castillo* and Aad van der Vaart

We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.

1. Introduction. Suppose that we observe a vector $X = (X_1, \ldots, X_n)$ in $\mathbb{R}^n$ such that (1.1) $X_i = \theta_i + \varepsilon_i$, $i = 1, \ldots, n$, for independent standard normal random variables $\varepsilon_i$ and an unknown vector of means $\theta = (\theta_1, \ldots, \theta_n)$. We are interested in Bayesian inference on $\theta$, in the situation that this vector is possibly sparse.


Extract

Submitted to the Annals of Statistics
arXiv: math.PR/0000000

NEEDLES AND STRAWS IN A HAYSTACK: POSTERIOR CONCENTRATION FOR POSSIBLY SPARSE SEQUENCES

By Ismaël Castillo* and Aad van der Vaart

We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.

1. Introduction. Suppose that we observe a vector $X = (X_1, \ldots, X_n)$ in $\mathbb{R}^n$ such that

\[ X_i = \theta_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1.1} \]

for independent standard normal random variables $\varepsilon_i$ and an unknown vector of means $\theta = (\theta_1, \ldots, \theta_n)$. We are interested in Bayesian inference on $\theta$, in the situation that this vector is possibly sparse.
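As a concrete illustration of model (1.1), the following minimal sketch simulates one observation vector from a sparse mean vector. The dimension, the positions and sizes of the nonzero means, and the function name `simulate_observations` are illustrative assumptions, not taken from the paper.

```python
import random


def simulate_observations(theta, seed=None):
    """Draw X_i = theta_i + eps_i with independent standard normal eps_i, as in (1.1)."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 1.0) for t in theta]


# Hypothetical sparse mean vector: n = 20 coordinates, 3 of them nonzero.
n = 20
theta = [0.0] * n
for index, value in [(2, 5.0), (7, -4.0), (15, 6.0)]:
    theta[index] = value

x = simulate_observations(theta, seed=0)
print([round(v, 2) for v in x])
```

Most coordinates of `x` are pure noise around zero; only the three coordinates with nonzero mean carry signal, and their positions are unknown to the statistician.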
Non-Bayesian approaches to this problem have recently been considered by many authors. Golubev [13] obtained results for model selection methods and threshold estimators for the mean-squared risk. Birgé and Massart [4] treated the model within their general context of model selection by penalized least squares. Abramovich et al. [1] studied the performance of the False Discovery Rate method. The earlier work by Donoho and Johnstone [10] can be viewed as studying the problem within an $\ell_r$ context. Many authors (see e.g. [3], [22], [21] and references cited there) have investigated the connection to the LASSO or similar methods.

Methods with a Bayesian connection were studied by George and Foster [12], Zhang [20], Johnstone and Silverman [16, 17], Abramovich, Grinshtein and Pensky [2], and Jiang and Zhang [15]. The papers [12] and [16] considered an empirical Bayes method, consisting of modelling the parameters $\theta_1, \ldots, \theta_n$ a-priori as independently drawn from a mixture of a Dirac measure at 0 and a continuous distribution, determining an appropriate mixing weight by the method of (restricted) marginal maximum likelihood, and finally employing the posterior median or mean. The second paper [2] motivated penalties, applied in a penalized minimum contrast scheme, by prior distributions on the parameters, and derived estimators for the number of nonzero $\theta_i$ and the $\theta_i$ themselves. The first is a posterior mode, but the estimator for $\theta$, called “Bayesian testimation”, does not seem itself Bayesian. (In fact, the Gaussian prior for the non-zero parameters in [2] will be seen to perform suboptimally in our fully Bayesian set-up.) The papers [20] and [15] obtain sharp results on (nonparametric) empirical Bayes estimators.

Other related papers include [19], [6], [7], [14], [15], [5].

* Work partly supported by a Postdoctoral fellowship from the VU University Amsterdam.
AMS 2000 subject classifications: Primary 62G05, 62G20.
Keywords and phrases: Bayesian estimators, Sparsity, Gaussian sequence model, Mixture priors, Asymptotics, Contraction.
imsart-aos ver. 2007/12/10 file: spa-revised.tex date: November 8, 2011
A penalized minimum contrast estimator can often be viewed as the mode of the posterior distribution, and it is helpful to interpret penalties accordingly. However, the Bayesian approach yields a full posterior distribution, which is a random probability distribution on the parameter space. It has both a location and a spread, and can be marginalized to give posterior distributions for any functions of the parameter vector of interest. It is this object that we study in this paper. Such full Bayesian inference was recently considered by Scott and Berger [18], who discussed various aspects not covered in the present paper, but no concentration results. One example of our results is that the beta-binomial priors in [18], combined with moderately to heavy tailed priors on the nonzero means, yield optimal recovery.
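Sparsity can be defined in various ways. Perhaps the most natural definition is the class of nearly black vectors, defined as

\[ \ell_0[p_n] = \bigl\{ \theta \in \mathbb{R}^n : \#(1 \le i \le n : \theta_i \ne 0) \le p_n \bigr\}. \]

Here $p_n$ is a given number, which in theoretical investigations is typically assumed to be $o(n)$, as $n \to \infty$. Sparsity may also mean that many means are small, but possibly not exactly zero. Definitions that make this precise use strong or weak $\ell_s$-balls, typically for $s \in (0, 2)$. These are defined as, with $\theta_{[1]} \ge \theta_{[2]} \ge \cdots \ge \theta_{[n]}$ the nonincreasing permutation of the coordinates of $\theta = (\theta_1, \ldots, \theta_n)$,

\[ \ell_s[p_n] = \Bigl\{ \theta \in \mathbb{R}^n : \frac{1}{n} \sum_{i=1}^n |\theta_i|^s \le \Bigl( \frac{p_n}{n} \Bigr)^s \Bigr\}, \]

\[ m_s[p_n] = \Bigl\{ \theta \in \mathbb{R}^n : \frac{1}{n} \max_{1 \le i \le n} i\, |\theta_{[i]}|^s \le \Bigl( \frac{p_n}{n} \Bigr)^s \Bigr\}. \]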
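The three sparsity classes can be compared on small examples with a sketch like the one below. The function names are hypothetical, and the weak-ball check orders the coordinates by absolute value, which is an assumption about how the nonincreasing permutation is meant to be applied.

```python
def in_nearly_black(theta, p_n):
    """theta in l_0[p_n]: at most p_n coordinates are nonzero."""
    return sum(1 for t in theta if t != 0) <= p_n


def in_strong_ball(theta, p_n, s):
    """theta in l_s[p_n]: (1/n) * sum |theta_i|^s <= (p_n / n)^s."""
    n = len(theta)
    return sum(abs(t) ** s for t in theta) / n <= (p_n / n) ** s


def in_weak_ball(theta, p_n, s):
    """theta in m_s[p_n]: (1/n) * max_i i * |theta_[i]|^s <= (p_n / n)^s."""
    n = len(theta)
    # Assumption: coordinates are ranked by decreasing absolute value.
    ordered = sorted((abs(t) for t in theta), reverse=True)
    lhs = max(i * a ** s for i, a in enumerate(ordered, start=1)) / n
    return lhs <= (p_n / n) ** s


one_large = [10.0, 0.0, 0.0, 0.0]    # a single, unrestricted nonzero mean
many_small = [0.05, 0.05, 0.05, 0.05]  # every mean nonzero but small

for name, vec in [("one_large", one_large), ("many_small", many_small)]:
    print(name, in_nearly_black(vec, 1), in_strong_ball(vec, 1, 1.0), in_weak_ball(vec, 1, 1.0))
```

With $p_n = 1$ and $s = 1$, the first vector is nearly black but lies in neither ball, while the second lies in both balls without being nearly black.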
Because the nonzero coefficients in $\ell_0[p_n]$ are not quantitatively restricted, there is no inclusion relationship between this space and the weak and strong balls, although results for the latter can be obtained by projecting them into $\ell_0[p_n]$. On the other hand, for any $s > 0$ we have the inclusion $\ell_s[p_n] \subset m_s[p_n]$.
The extent of the sparsity, measured by the constant $p_n$, is assumed unknown. Our Bayesian approach starts by putting a prior $\pi_n$ on this number, a given probability measure on the set $\{0, 1, 2, \ldots, n\}$. Next we complete this to a prior on the set of all possible sequences $\theta = (\theta_1, \ldots, \theta_n)$ in $\mathbb{R}^n$ by, given a draw $p$ from $\pi_n$, choosing a random subset $S \subset \{1, \ldots, n\}$ of cardinality $p$, choosing the corresponding coordinates $(\theta_i : i \in S)$ from a density $g_S$ on $\mathbb{R}^S$, and setting the remaining coordinates $(\theta_i : i \in S^c)$ equal to zero. Given this prior, Bayes' rule yields the posterior distribution of $\theta$ as usual. We investigate the properties of this posterior distribution, in its dependence on the priors on the dimension and on the nonzero coefficients, in the non-Bayesian set-up where $X$ follows (1.1) with $\theta$ equal to a fixed, “true” parameter $\theta_0$.
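The hierarchical construction of the prior can be sketched as a three-step sampler. Several choices below are assumptions made only for illustration: the subset $S$ is drawn uniformly among subsets of size $p$, the density $g_S$ is a product of standard Laplace densities, the dimension weights are proportional to $e^{-p}$, and all function names are hypothetical.

```python
import math
import random


def laplace_draw(rng):
    """One draw from the standard Laplace density (assumed slab for the nonzero means)."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_prior(n, dimension_weights, slab_sampler, rng):
    """Draw theta from the hierarchical prior: p ~ pi_n, then S, then theta_S ~ g_S."""
    # Step 1: draw the number of nonzero means p from pi_n on {0, ..., n}.
    p = rng.choices(range(n + 1), weights=dimension_weights, k=1)[0]
    # Step 2: draw a subset S of cardinality p (uniformly, by assumption).
    support = sorted(rng.sample(range(n), p))
    # Step 3: draw the coordinates in S from the slab; the rest stay zero.
    theta = [0.0] * n
    for i in support:
        theta[i] = slab_sampler(rng)
    return p, support, theta


n = 50
weights = [math.exp(-p) for p in range(n + 1)]  # illustrative pi_n, up to normalization
rng = random.Random(1)
p, support, theta = sample_prior(n, weights, laplace_draw, rng)
print(p, support)
```

Repeated draws show the effect of the prior on the dimension: with these weights most sampled vectors have only a handful of nonzero coordinates.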
If the true parameter vector $\theta_0$ belongs to $\ell_0[p_n]$, then it is desirable that the posterior distribution concentrates most of its mass on nearly black vectors. One main result of the paper is that this is the case provided the prior probabilities $\pi_n\{p\}$ decrease exponentially fast with the dimension $p$.
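To make "decrease exponentially fast" tangible, the sketch below builds one possible prior on the dimension, with $\pi_n\{p\}$ proportional to $e^{-ap}$. This particular form and the rate parameter `rate` are assumptions for illustration; the excerpt does not specify which exponentially decreasing priors are used.

```python
import math


def exponential_dimension_prior(n, rate=1.0):
    """Normalized weights pi_n{p} proportional to exp(-rate * p), p = 0, ..., n."""
    raw = [math.exp(-rate * p) for p in range(n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


pi_n = exponential_dimension_prior(n=100, rate=1.0)

# Each extra nonzero coordinate costs a constant factor exp(-rate) in prior mass.
ratio = pi_n[1] / pi_n[0]

# Prior mass assigned to models with more than 10 nonzero means.
tail_mass = sum(pi_n[11:])
print(ratio, tail_mass)
```

Under such a prior, large models receive very little prior mass, which is the mechanism that discourages the posterior from spreading over vectors with many nonzero coordinates.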
The quality of the reconstruction of the full vector $\theta$ can be measured by various distances. A natural one is the Euclidean distance, with square

\[ \|\theta - \theta_0\|^2 = \sum_{i=1}^n (\theta_i - \theta_{0,i})^2. \]

If the indices of the $p_n$ nonzero coordinates of a vector in the model $\ell_0[p_n]$ were known a-priori, then the vector could be estimated with mean square error of the order $p_n$. In [11] it is shown that, as $n, p_n \to \infty$ with $p_n = o(n)$,

\[ \inf_{\hat\theta} \sup_{\theta \in \ell_0[p_n]} P_{n,\theta} \|\hat\theta - \theta\|^2 = 2 p_n \log(n/p_n) \bigl(1 + o(1)\bigr). \]

Here the infimum is taken over all estimators $\hat\theta = \hat\theta(X)$ and $P_{n,\theta}$ denotes taking the expectation under the assumption that $X$ is $N_n(\theta, I)$-distributed. In other words, the square minimax rate over $\ell_0[p_n]$ is $p_n \log(n/p_n)$, meaning that the unknown identity of the nonzero means needs to lead only to a logarithmic loss.
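The size of this logarithmic loss can be checked numerically. The sketch below compares the order $p_n$ of the risk with known support to the leading term $2 p_n \log(n/p_n)$ of the minimax risk; the $(n, p_n)$ pairs are arbitrary illustrative values, the function name is hypothetical, and the $1 + o(1)$ factor is ignored.

```python
import math


def minimax_leading_term(n, p_n):
    """Leading term 2 * p_n * log(n / p_n) of the minimax risk over l_0[p_n]."""
    return 2.0 * p_n * math.log(n / p_n)


for n, p_n in [(1000, 10), (100000, 100), (1000000, 1000)]:
    oracle_order = float(p_n)  # order of the risk when the support is known
    minimax = minimax_leading_term(n, p_n)
    print(n, p_n, round(minimax, 1), round(minimax / oracle_order, 2))
```

The ratio in the last column equals $2 \log(n/p_n)$, so the price of not knowing the support grows only logarithmically in $n/p_n$.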
The Bayesian approach is presumably adopted for the intuition provided by prior modelling, and is not necessarily directed at attaining minimax
