
Sparse methods for machine learning: Theory and algorithms
Francis Bach, Guillaume Obozinski (Willow project, INRIA - École Normale Supérieure)
ECML - PKDD 2010 - Tutorial
Supervised learning and regularization
Data: x_i ∈ X, y_i ∈ Y, i = 1, …, n
Minimize with respect to a function f: X → Y:

    Σ_{i=1}^n ℓ(y_i, f(x_i)) + (λ/2) ‖f‖²

(error on the data + regularization)
Two theoretical/algorithmic issues:
1. Loss
2. Function space / norm
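As a concrete instance of this objective (our own illustration, not from the slides), the sketch below evaluates it for a linear predictor f(x) = wᵀx with the quadratic loss; all function names are ours:

```python
import numpy as np

def quadratic_loss(y, pred):
    # quadratic cost: l(y, f) = 1/2 * (y - f)^2
    return 0.5 * (y - pred) ** 2

def regularized_objective(w, X, y, lam):
    # sum_i l(y_i, w^T x_i) + (lam / 2) * ||w||^2 for a linear predictor f(x) = w^T x
    pred = X @ w
    return quadratic_loss(y, pred).sum() + 0.5 * lam * np.dot(w, w)

# tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.standard_normal(20)
print(regularized_objective(np.zeros(5), X, y, lam=0.1))
```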
Usual losses
Regression: y ∈ ℝ, prediction ŷ = f(x), quadratic cost ℓ(y, f) = ½(y − ŷ)² = ½(y − f)²
Classification: y ∈ {−1, 1}, prediction ŷ = sign(f(x)), loss of the form ℓ(y, f) = ℓ(yf)
"True" cost: ℓ(yf) = 1_{yf < 0}; usual convex costs: hinge, square, logistic
[Figure: the 0–1 cost and the hinge, square, and logistic losses, plotted as functions of the margin yf]
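These losses are easy to state in code; the following sketch (ours, with common unit scalings that may differ from the slides' plot) evaluates the 0-1 cost and its convex surrogates as functions of the margin u = yf:

```python
import numpy as np

# The 0-1 cost and its usual convex surrogates, as functions of the margin u = y*f
def zero_one(u):
    return (u < 0).astype(float)          # "true" cost 1_{yf < 0}

def hinge(u):
    return np.maximum(0.0, 1.0 - u)       # SVM

def square(u):
    return 0.5 * (1.0 - u) ** 2           # equals 1/2*(y - f)^2 when y is in {-1, 1}

def logistic(u):
    return np.log1p(np.exp(-u))           # logistic regression

u = np.linspace(-2.0, 3.0, 6)
for name, loss in [("0-1", zero_one), ("hinge", hinge),
                   ("square", square), ("logistic", logistic)]:
    print(f"{name:8s}", np.round(loss(u), 2))
```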
Regularizations
Main goal: avoid overfitting
Two main lines of work:
1. Euclidean and Hilbertian norms (i.e., ℓ2-norms)
   - Possibility of non-linear predictors
   - Non-parametric supervised learning and kernel methods
   - Well-developed theory and algorithms (see, e.g., Wahba, 1990; Schölkopf and Smola, 2001; Shawe-Taylor and Cristianini, 2004)
2. Sparsity-inducing norms
   - Usually restricted to linear predictors on vectors: f(x) = wᵀx
   - Main example: ℓ1-norm ‖w‖₁ = Σ_{i=1}^p |wᵢ| (see the comparison sketch below)
   - Performs model selection as well as regularization
   - Theory and algorithms "in the making"
ℓ2 vs. ℓ1 - Gaussian hare vs. Laplacian tortoise
- First-order methods (Fu, 1998; Beck and Teboulle, 2009)
- Homotopy methods (Markowitz, 1956; Efron et al., 2004)
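A minimal sketch of a first-order method in the spirit of ISTA (Beck and Teboulle, 2009) for the Lasso objective ½‖y − Xw‖² + λ‖w‖₁: a gradient step on the smooth part followed by soft thresholding. The step-size choice and all implementation details below are ours:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1: sign(v) * (|v| - t)_+
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    # minimize 0.5 * ||y - X w||^2 + lam * ||w||_1
    L = np.linalg.norm(X, 2) ** 2         # Lipschitz constant of the smooth part's gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)          # gradient step on the quadratic term
        w = soft_threshold(w - grad / L, lam / L)   # then soft thresholding
    return w

# tiny usage example
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(50)
print(np.round(ista(X, y, lam=5.0), 2))   # only the two relevant coefficients survive
```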
Lasso - Two main recent theoretical results
1. Support recovery condition (Zhao and Yu, 2006; Wainwright, 2009; Zou, 2006; Yuan and Lin, 2007): the Lasso is sign-consistent if and only if there are low correlations between relevant and irrelevant variables.
2. Exponentially many irrelevant variables (Zhao and Yu, 2006; Wainwright, 2009; Bickel et al., 2009; Lounici, 2008; Meinshausen and Yu, 2008): under appropriate assumptions, consistency is possible as long as log p = O(n).
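The first result can be illustrated empirically: with an i.i.d. Gaussian design (hence low correlations between variables), the Lasso typically recovers the signs of the true support. A small simulation of our own, with arbitrary problem sizes and regularization strength:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s = 200, 50, 5
X = rng.standard_normal((n, p))           # i.i.d. design => low correlations
w_true = np.zeros(p)
w_true[:s] = rng.choice([-1.0, 1.0], size=s)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_hat = Lasso(alpha=0.05).fit(X, y).coef_
print("true support     :", np.flatnonzero(w_true))
print("estimated support:", np.flatnonzero(np.abs(w_hat) > 1e-8))
print("signs agree      :", np.array_equal(np.sign(w_hat[:s]), np.sign(w_true[:s])))
```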
Going beyond the Lasso
ℓ1-norm for linear feature selection in high dimensions
Lasso usually not applicable directly:
- Non-linearities
- Dealing with structured sets of features
- Sparse learning on matrices
Outline
- Sparse linear estimation with the ℓ1-norm
  - Convex optimization and algorithms
  - Theoretical results
- Groups of features
- Non-linearity: multiple kernel learning
- Sparse methods on matrices
  - Multi-task learning
  - Matrix factorization (low-rank, sparse PCA, dictionary learning)
- Structured sparsity
  - Overlapping groups and hierarchies
Why ℓ1-norms lead to sparsity?
Example 1: quadratic problem in 1D, i.e.,

    min_{x ∈ ℝ} ½ x² − x y + λ |x|

Piecewise quadratic function with a kink at zero.
Derivative at 0+: g₊ = λ − y, and at 0−: g₋ = −λ − y.
- x = 0 is the solution iff g₊ ≥ 0 and g₋ ≤ 0 (i.e., |y| ≤ λ)
- x ≥ 0 is the solution iff g₊ ≤ 0 (i.e., y ≥ λ), giving x = y − λ
- x ≤ 0 is the solution iff g₋ ≥ 0 (i.e., y ≤ −λ), giving x = y + λ
Solution: x = sign(y)(|y| − λ)₊ = soft thresholding.
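The closed form is easy to verify numerically; the sketch below (our own check, not from the slides) compares the soft-thresholding formula against a brute-force grid search over x:

```python
import numpy as np

def soft_threshold_1d(y, lam):
    # closed-form minimizer of 0.5*x^2 - x*y + lam*|x|
    return np.sign(y) * max(abs(y) - lam, 0.0)

# brute-force check of the closed form on a fine grid
lam = 1.0
xs = np.linspace(-5.0, 5.0, 100001)
for y in [-2.5, -0.5, 0.3, 1.7]:
    obj = 0.5 * xs ** 2 - xs * y + lam * np.abs(xs)
    x_grid = xs[np.argmin(obj)]
    print(y, soft_threshold_1d(y, lam), round(float(x_grid), 3))
```

For |y| ≤ λ the grid minimizer sits at 0, exactly as the formula predicts; outside that interval it matches y ∓ λ.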