Apprentissage statistique, parcimonie et Langevin Monte Carlo
(Statistical learning, sparsity, and Langevin Monte Carlo)

Alexandre Tsybakov
Laboratoire de Statistique, CREST; Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6; and CMAP, École Polytechnique

Dijon, November 26, 2009

Outline:
- Introduction
- Sparsity oracle inequalities (SOI)
- BIC and LASSO
- Dantzig selector and LASSO for linear regression
- Sparse exponential weighting (SEW)

Nonparametric regression model (fixed design)

Assume that we observe the pairs (X_1, Y_1), ..., (X_n, Y_n) ∈ R^d × R, where

    Y_i = f(X_i) + ξ_i,   i = 1, ..., n.

- The regression function f : R^d → R is unknown.
- The errors ξ_i are independent Gaussian N(0, σ²) random variables.
- The X_i ∈ R^d are arbitrary fixed (non-random) points.

We want to estimate f based on the data (X_1, Y_1), ..., (X_n, Y_n).

Approximating function, dictionary

We assume that there exists a function f_θ(x) (a known function of θ and x) such that

    f ≈ f_θ

for some θ = (θ_1, ..., θ_M). Possibly M ≫ n.
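The observation model above can be sketched in a few lines of NumPy. The specific choices of f, n, σ, and the design points below are purely illustrative assumptions, not from the talk:

```python
import numpy as np

# Minimal simulation of the fixed-design regression model
# Y_i = f(X_i) + xi_i, with illustrative choices of f, n, and sigma.
rng = np.random.default_rng(0)

n = 100
sigma = 0.5
X = np.linspace(0.0, 1.0, n)        # arbitrary fixed (non-random) design points, d = 1

def f(x):
    # the unknown regression function (chosen here only for illustration)
    return np.sin(2 * np.pi * x)

xi = rng.normal(0.0, sigma, size=n)  # independent N(0, sigma^2) errors
Y = f(X) + xi                        # observed responses
```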
Let f_1, ..., f_M be a finite dictionary of functions, f_j : R^d → R. We approximate the regression function f by a linear combination

    f_θ(x) = Σ_{j=1}^M θ_j f_j(x)

with weights θ = (θ_1, ..., θ_M).

We believe that

    f(x) ≈ Σ_{j=1}^M θ_j f_j(x)

for some θ = (θ_1, ..., θ_M).
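The linear combination f_θ over a dictionary can be sketched as follows; the cosine dictionary and the weight vector are illustrative assumptions:

```python
import numpy as np

# A hypothetical dictionary {f_1, ..., f_M} of cosine functions,
# and the linear combination f_theta(x) = sum_j theta_j f_j(x).
M = 5
dictionary = [lambda x, j=j: np.cos(np.pi * j * x) for j in range(1, M + 1)]

def f_theta(x, theta):
    # evaluate the weighted sum of dictionary functions at x
    return sum(t * fj(x) for t, fj in zip(theta, dictionary))

x = np.linspace(0.0, 1.0, 50)
theta = np.array([1.0, 0.0, -0.5, 0.0, 0.0])  # a sparse weight vector
approx = f_theta(x, theta)
```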
Scenarios for linear approximation

(LinReg) Exact equality: there exists θ* ∈ R^M such that f = f_{θ*} = Σ_{j=1}^M θ*_j f_j (linear regression, with possibly M ≫ n parameters);

(NPReg) f_1, ..., f_M are the first M functions of a basis (usually orthonormal) and M ≤ n; there exists θ* such that f − f_{θ*} is small: nonparametric estimation of regression;

(Agg) aggregation of arbitrary estimators: in this case f_1, ..., f_M are preliminary estimators of f based on a training sample independent of the observations (X_1, Y_1), ..., (X_n, Y_n); weak learning, additive models, etc.
Example: nonlinear approximation
We consider the generalized linear (or single-index) model
    f_θ(x) = G(θ^T x)

where G : R → R is a known (or unknown) function.
Thus, we believe that
    f(x) ≈ G(θ^T x)

for some θ = (θ_1, ..., θ_M), possibly with M ≫ n.
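The single-index form f_θ(x) = G(θ^T x) is a one-line computation; the logistic link G and the vector θ below are illustrative assumptions only:

```python
import numpy as np

# Sketch of the single-index model f_theta(x) = G(theta^T x).
# The logistic link G and the weights theta are purely illustrative.
def G(t):
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([0.5, -1.0, 2.0])

def f_theta(x, theta):
    # scalar projection theta^T x, then the link function
    return G(theta @ x)

x = np.array([1.0, 1.0, 1.0])
value = f_theta(x, theta)  # G(0.5 - 1.0 + 2.0) = G(1.5)
```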
Sparsity of a vector
The number of non-zero coordinates of θ:

    M(θ) = Σ_{j=1}^M I{θ_j ≠ 0}.

The value M(θ) characterizes the sparsity of the vector θ ∈ R^M: the smaller M(θ), the "sparser" θ.
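Computing M(θ) is a direct count of the non-zero entries, which can be sketched as:

```python
import numpy as np

# M(theta) = number of non-zero coordinates of theta,
# i.e. the sum of the indicators I{theta_j != 0}.
def M_of(theta):
    return int(np.count_nonzero(theta))

theta = np.array([1.0, 0.0, -0.5, 0.0, 0.0])
M_of(theta)  # -> 2: theta is 2-sparse in R^5
```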
Let θ̂^{OLS} be the ordinary least squares (OLS) estimator and let f_θ be a linear approximation. Elementary result:

    E ||f_{θ̂^{OLS}} − f||_n² ≤ ||f − f_θ||_n² + σ² min(M, n)/n

for any θ ∈ R^M, where ||·||_n is the empirical norm:

    ||f||_n = ( (1/n) Σ_{i=1}^n f²(X_i) )^{1/2}.
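The OLS fit over the dictionary, and its error in the empirical norm, can be checked numerically. All data-generating choices below (cosine dictionary, σ, a θ* realizing the LinReg scenario) are illustrative assumptions:

```python
import numpy as np

# OLS over the dictionary: regress Y on the design matrix A with A[i, j] = f_j(X_i),
# then measure ||f_{theta-hat} - f||_n^2 = (1/n) sum_i (f_{theta-hat}(X_i) - f(X_i))^2.
rng = np.random.default_rng(0)
n, M, sigma = 200, 10, 0.1
X = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.cos(np.pi * j * X) for j in range(1, M + 1)])

theta_star = np.zeros(M)
theta_star[0] = 1.0                                  # f = f_{theta*}: the LinReg scenario
f_values = A @ theta_star
Y = f_values + rng.normal(0.0, sigma, size=n)

theta_ols, *_ = np.linalg.lstsq(A, Y, rcond=None)    # minimizes ||Y - A theta||^2
risk = np.mean((A @ theta_ols - f_values) ** 2)      # ||f_{theta-hat} - f||_n^2
# The bound predicts E risk <= 0 + sigma^2 * min(M, n) / n here.
```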
Sparsity and dimension reduction
For any θ ∈ R^M, the "oracle" OLS that acts only on the relevant M(θ) < n coordinates satisfies

    E ||f_{θ̂^{oracle OLS}} − f||_n² ≤ ||f − f_θ||_n² + σ² M(θ)/n.

This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known.
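The oracle can be mimicked in simulation, where (unlike in practice) the support of θ* is available to us. The setup below is an illustrative assumption:

```python
import numpy as np

# Oracle OLS sketch: if the support J = {j : theta_j != 0} were known,
# run OLS using only those M(theta) columns of the design matrix.
rng = np.random.default_rng(1)
n, M, sigma = 200, 50, 0.1
X = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.cos(np.pi * j * X) for j in range(1, M + 1)])

theta_star = np.zeros(M)
theta_star[[0, 3]] = [1.0, -0.5]                     # M(theta*) = 2, sparse in R^50
f_values = A @ theta_star
Y = f_values + rng.normal(0.0, sigma, size=n)

support = np.flatnonzero(theta_star)                 # known only to the oracle
coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
oracle_risk = np.mean((A[:, support] @ coef - f_values) ** 2)
# Residual risk now scales as sigma^2 * M(theta*) / n instead of sigma^2 * M / n.
```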
Sparsity oracle inequalities
Do there exist true estimators with similar behavior? Basic idea: choose some suitable data-driven weights θ̂ = (θ̂_1, ..., θ̂_M) and estimate f by

    f̂(x) = f_{θ̂}(x) = Σ_{j=1}^M θ̂_j f_j(x).

What to do when the approximation is nonlinear (e.g., G(θ^T x))? Should we also plug in an estimator θ̂?

Can we find θ̃ such that f̃ = f_{θ̃}, or f̃ defined differently, satisfies