Apprentissage statistique, parcimonie et Langevin Monte Carlo
(Statistical learning, sparsity, and Langevin Monte Carlo)

Alexandre Tsybakov
Laboratoire de Statistique, CREST; Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6; and CMAP, École Polytechnique

Dijon, November 26, 2009

Outline:
- Introduction
- Sparsity oracle inequalities (SOI)
- BIC and LASSO
- Dantzig selector and LASSO for linear regression
- Sparse exponential weighting (SEW)

Nonparametric regression model (fixed design)

Assume that we observe the pairs (X_1, Y_1), ..., (X_n, Y_n) ∈ R^d × R, where

    Y_i = f(X_i) + ξ_i,   i = 1, ..., n.

- The regression function f : R^d → R is unknown.
- The errors ξ_i are independent Gaussian N(0, σ²) random variables.
- The X_i ∈ R^d are arbitrary fixed (non-random) points.

We want to estimate f based on the data (X_1, Y_1), ..., (X_n, Y_n).

Approximating function, dictionary

We assume that there exists a function f_θ(x) (a known function of θ and x) such that

    f ≈ f_θ

for some θ = (θ_1, ..., θ_M). Possibly M ≫ n.
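The observation model above can be sketched in a few lines of NumPy. The specific choices of f, n, σ, and the design points below are purely illustrative assumptions, not from the talk:

```python
import numpy as np

# Minimal simulation of the fixed-design regression model
# Y_i = f(X_i) + xi_i, with illustrative choices of f, n, and sigma.
rng = np.random.default_rng(0)

n = 100
sigma = 0.5
X = np.linspace(0.0, 1.0, n)        # arbitrary fixed (non-random) design points, d = 1

def f(x):
    # the unknown regression function (chosen here only for illustration)
    return np.sin(2 * np.pi * x)

xi = rng.normal(0.0, sigma, size=n)  # independent N(0, sigma^2) errors
Y = f(X) + xi                        # observed responses
```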
Let f_1, ..., f_M be a finite dictionary of functions, f_j : R^d → R. We approximate the regression function f by a linear combination

    f_θ(x) = Σ_{j=1}^M θ_j f_j(x)

with weights θ = (θ_1, ..., θ_M).

We believe that

    f(x) ≈ Σ_{j=1}^M θ_j f_j(x)

for some θ = (θ_1, ..., θ_M).
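The linear combination f_θ over a dictionary can be sketched as follows; the cosine dictionary and the weight vector are illustrative assumptions:

```python
import numpy as np

# A hypothetical dictionary {f_1, ..., f_M} of cosine functions,
# and the linear combination f_theta(x) = sum_j theta_j f_j(x).
M = 5
dictionary = [lambda x, j=j: np.cos(np.pi * j * x) for j in range(1, M + 1)]

def f_theta(x, theta):
    # evaluate the weighted sum of dictionary functions at x
    return sum(t * fj(x) for t, fj in zip(theta, dictionary))

x = np.linspace(0.0, 1.0, 50)
theta = np.array([1.0, 0.0, -0.5, 0.0, 0.0])  # a sparse weight vector
approx = f_theta(x, theta)
```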
Scenarios for linear approximation

(LinReg) Exact equality: there exists θ* ∈ R^M such that f = f_{θ*} = Σ_{j=1}^M θ*_j f_j (linear regression, with possibly M ≫ n parameters);

(NPReg) f_1, ..., f_M are the first M functions of a basis (usually orthonormal) and M ≤ n; there exists θ* such that f − f_{θ*} is small: nonparametric estimation of regression;

(Agg) aggregation of arbitrary estimators: in this case f_1, ..., f_M are preliminary estimators of f based on a training sample independent of the observations (X_1, Y_1), ..., (X_n, Y_n); weak learning, additive models, etc.
Example: nonlinear approximation
We consider the generalized linear (or single-index) model
    f_θ(x) = G(θ^T x)

where G : R → R is a known (or unknown) function.
Thus, we believe that
    f(x) ≈ G(θ^T x)

for some θ = (θ_1, ..., θ_M), possibly with M ≫ n.
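The single-index form f_θ(x) = G(θ^T x) is a one-line computation; the logistic link G and the vector θ below are illustrative assumptions only:

```python
import numpy as np

# Sketch of the single-index model f_theta(x) = G(theta^T x).
# The logistic link G and the weights theta are purely illustrative.
def G(t):
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([0.5, -1.0, 2.0])

def f_theta(x, theta):
    # scalar projection theta^T x, then the link function
    return G(theta @ x)

x = np.array([1.0, 1.0, 1.0])
value = f_theta(x, theta)  # G(0.5 - 1.0 + 2.0) = G(1.5)
```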
Sparsity of a vector
The number of non-zero coordinates of θ:

    M(θ) = Σ_{j=1}^M I{θ_j ≠ 0}.

The value M(θ) characterizes the sparsity of the vector θ ∈ R^M: the smaller M(θ), the "sparser" θ.
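Computing M(θ) is a direct count of the non-zero entries, which can be sketched as:

```python
import numpy as np

# M(theta) = number of non-zero coordinates of theta,
# i.e. the sum of the indicators I{theta_j != 0}.
def M_of(theta):
    return int(np.count_nonzero(theta))

theta = np.array([1.0, 0.0, -0.5, 0.0, 0.0])
M_of(theta)  # -> 2: theta is 2-sparse in R^5
```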
Let θ̂^{OLS} be the ordinary least squares (OLS) estimator and let f_θ be a linear approximation. Elementary result:

    E ||f_{θ̂^{OLS}} − f||_n² ≤ ||f − f_θ||_n² + σ² min(M, n)/n

for any θ ∈ R^M, where ||·||_n is the empirical norm:

    ||f||_n = ( (1/n) Σ_{i=1}^n f²(X_i) )^{1/2}.
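The OLS fit over the dictionary, and its error in the empirical norm, can be checked numerically. All data-generating choices below (cosine dictionary, σ, a θ* realizing the LinReg scenario) are illustrative assumptions:

```python
import numpy as np

# OLS over the dictionary: regress Y on the design matrix A with A[i, j] = f_j(X_i),
# then measure ||f_{theta-hat} - f||_n^2 = (1/n) sum_i (f_{theta-hat}(X_i) - f(X_i))^2.
rng = np.random.default_rng(0)
n, M, sigma = 200, 10, 0.1
X = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.cos(np.pi * j * X) for j in range(1, M + 1)])

theta_star = np.zeros(M)
theta_star[0] = 1.0                                  # f = f_{theta*}: the LinReg scenario
f_values = A @ theta_star
Y = f_values + rng.normal(0.0, sigma, size=n)

theta_ols, *_ = np.linalg.lstsq(A, Y, rcond=None)    # minimizes ||Y - A theta||^2
risk = np.mean((A @ theta_ols - f_values) ** 2)      # ||f_{theta-hat} - f||_n^2
# The bound predicts E risk <= 0 + sigma^2 * min(M, n) / n here.
```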
Sparsity and dimension reduction
For any θ ∈ R^M, the "oracle" OLS that acts only on the relevant M(θ) < n coordinates satisfies

    E ||f_{θ̂^{oracle OLS}} − f||_n² ≤ ||f − f_θ||_n² + σ² M(θ)/n.

This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known.
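The oracle can be mimicked in simulation, where (unlike in practice) the support of θ* is available to us. The setup below is an illustrative assumption:

```python
import numpy as np

# Oracle OLS sketch: if the support J = {j : theta_j != 0} were known,
# run OLS using only those M(theta) columns of the design matrix.
rng = np.random.default_rng(1)
n, M, sigma = 200, 50, 0.1
X = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.cos(np.pi * j * X) for j in range(1, M + 1)])

theta_star = np.zeros(M)
theta_star[[0, 3]] = [1.0, -0.5]                     # M(theta*) = 2, sparse in R^50
f_values = A @ theta_star
Y = f_values + rng.normal(0.0, sigma, size=n)

support = np.flatnonzero(theta_star)                 # known only to the oracle
coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
oracle_risk = np.mean((A[:, support] @ coef - f_values) ** 2)
# Residual risk now scales as sigma^2 * M(theta*) / n instead of sigma^2 * M / n.
```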
Sparsity oracle inequalities
Do there exist true estimators with similar behavior? Basic idea: choose some suitable data-driven weights θ̂ = (θ̂_1, ..., θ̂_M) and estimate f by

    f̂(x) = f_{θ̂}(x) = Σ_{j=1}^M θ̂_j f_j(x).

What to do when the approximation is nonlinear (e.g., G(θ^T x))? Should we also plug in an estimator θ̂?

Can we find θ̃ such that f̃ = f_{θ̃}, or f̃ defined differently, satisfies