Machine Learning Summer School, Île de Ré: Learning with sparsity-inducing norms (slides)
131 pages
English

Description

Learning with sparsity-inducing norms. Francis Bach, INRIA - École Normale Supérieure. MLSS 2008, Île de Ré, 2008.

Extract

Learning with sparsity-inducing norms
Francis Bach
INRIA - École Normale Supérieure
MLSS 2008 - Île de Ré, 2008

Supervised learning and regularization
• Data: $x_i \in \mathcal{X}$, $y_i \in \mathcal{Y}$, $i = 1, \dots, n$
• Minimize with respect to the function $f \in \mathcal{F}$:
  $$\sum_{i=1}^n \ell(y_i, f(x_i)) \;+\; \frac{\lambda}{2}\,\|f\|^2$$
  Error on data + Regularization
  Loss & function space?  Norm?
• Two issues:
  – Loss
  – Function space / norm
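As a concrete instance (not in the slides): with a linear function $f(x) = w^\top x$ and the square loss, this objective becomes ridge regression, which a short sketch can minimize in closed form via the normal equations (all names and data below are illustrative):

```python
import numpy as np

# Ridge regression: linear f, square loss, squared-norm regularizer.
# Solves min_w 0.5*||y - X w||^2 + (lam/2)*||w||^2 via the normal equations:
# (X'X + lam*I) w = X'y.
def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(30)
print(np.round(ridge(X, y, lam=1.0), 2))
```

Note that this squared-norm regularizer shrinks the weights toward zero but, unlike the $\ell_1$-norm discussed below, does not set any of them exactly to zero.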

Usual losses [SS01, STC04]
• Regression: $y \in \mathbb{R}$, prediction $\hat{y} = f(x)$
  – quadratic cost $\ell(y, f(x)) = \frac{1}{2}(y - f(x))^2$
• Classification: $y \in \{-1, 1\}$, prediction $\hat{y} = \operatorname{sign}(f(x))$
  – loss of the form $\ell(y, f(x)) = \ell(y f(x))$
  – “True” cost: $\ell(y f(x)) = 1_{y f(x) < 0}$
  – Usual convex costs:
[Figure: the 0-1, hinge, square, and logistic losses plotted as functions of the margin $y f(x)$]
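As an aside (not from the slides), a minimal Python sketch of these four costs as functions of the margin $m = y f(x)$; the exact scalings (the $\frac{1}{2}$ in the square cost, the natural log in the logistic cost) are common conventions assumed here:

```python
import numpy as np

# The four classification costs from the slide, each as a function of the
# margin m = y * f(x). The 0-1 loss is the "true" cost; the other three
# are convex surrogates.
def zero_one(m):
    return (m < 0).astype(float)

def hinge(m):
    return np.maximum(0.0, 1.0 - m)

def square(m):
    # 0.5*(y - f(x))**2 equals 0.5*(1 - m)**2 when y is in {-1, +1}
    return 0.5 * (1.0 - m) ** 2

def logistic(m):
    return np.log(1.0 + np.exp(-m))

m = np.linspace(-3.0, 4.0, 8)
for name, loss in [("0-1", zero_one), ("hinge", hinge),
                   ("square", square), ("logistic", logistic)]:
    print(f"{name:8s}", np.round(loss(m), 2))
```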

Regularizations
• Main goal: control the “capacity” of the learning problem
• Two main lines of work:
  1. Use Hilbertian (RKHS) norms
     – Non-parametric supervised learning and kernel methods
     – Well-developed theory [SS01, STC04, Wah90]
  2. Use “sparsity-inducing” norms
     – Main example: the $\ell_1$-norm $\|w\|_1 = \sum_{i=1}^p |w_i|$
     – Performs model selection as well as regularization
     – Often used heuristically
• Goal of the course: understand how and when to use sparsity-inducing norms

Why do $\ell_1$-norms lead to sparsity?
• Example 1: quadratic problem in 1D, i.e., $\min_{x \in \mathbb{R}} \; \frac{1}{2}x^2 - xy + \lambda|x|$
• Piecewise quadratic function with a kink at zero
  – Derivative at $0^+$: $g_+ = \lambda - y$; at $0^-$: $g_- = -\lambda - y$
  – $x = 0$ is the solution iff $g_+ \ge 0$ and $g_- \le 0$ (i.e., $|y| \le \lambda$)
  – $x > 0$ is the solution iff $g_+ \le 0$ (i.e., $y \ge \lambda$) $\Rightarrow$ $x^\star = y - \lambda$
  – $x < 0$ is the solution iff $g_- \ge 0$ (i.e., $y \le -\lambda$) $\Rightarrow$ $x^\star = y + \lambda$
• Solution: $x^\star = \operatorname{sign}(y)\,(|y| - \lambda)_+$, i.e., soft thresholding
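A minimal sketch (not from the slides) of this closed form, with a brute-force check that it matches the grid minimizer of the 1D objective; all names are illustrative:

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*x**2 - x*y + lam*|x| over x in R."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Brute-force check on a fine grid: the closed form matches the grid
# minimizer, and |y| <= lam yields exactly 0.
lam = 0.7
for y in [-2.0, -0.3, 0.0, 0.5, 1.5]:
    xs = np.linspace(-5.0, 5.0, 200001)
    obj = 0.5 * xs**2 - xs * y + lam * np.abs(xs)
    print(y, soft_threshold(y, lam), round(float(xs[np.argmin(obj)]), 3))
```

Because it is written with np.sign and np.maximum, the same function applies coordinate-wise to vectors, which is exactly what the next example needs.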

Why do $\ell_1$-norms lead to sparsity?
• Example 2: isotropic quadratic problem
  $$\min_{x \in \mathbb{R}^p} \sum_{i=1}^p \Big(\frac{1}{2}x_i^2 - x_i y_i\Big) + \lambda\|x\|_1 \;=\; \min_{x \in \mathbb{R}^p} \frac{1}{2}x^\top x - x^\top y + \lambda\|x\|_1$$
• Solution: $x_i^\star = \operatorname{sign}(y_i)\,(|y_i| - \lambda)_+$
• Decoupled soft thresholding
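Since the objective decouples across coordinates, the soft_threshold helper from the previous sketch solves the $p$-dimensional problem in a single vectorized call (values below are illustrative):

```python
import numpy as np

# soft_threshold as defined in the previous sketch.
def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([2.0, -0.4, 0.1, -1.5])
print(soft_threshold(y, 0.7))  # entries with |y_i| <= 0.7 become exactly 0
```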

Why do $\ell_1$-norms lead to sparsity?
• Example 3: general quadratic problem
  – coupled soft thresholding (see the sketch below)
• Geometric interpretation
  – NB: penalizing is “equivalent” to constraining
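The slides do not spell out an algorithm here; one standard approach to the coupled case is iterative soft thresholding (ISTA, a proximal-gradient method). A minimal sketch for $\min_{x} \frac{1}{2}x^\top Q x - x^\top y + \lambda\|x\|_1$ with $Q$ symmetric positive semidefinite (all names and data are illustrative):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista_quadratic(Q, y, lam, n_iter=500):
    """Iterative soft thresholding for min_x 0.5*x'Qx - x'y + lam*||x||_1."""
    L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient
    x = np.zeros(len(y))
    for _ in range(n_iter):
        grad = Q @ x - y                           # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)  # proximal (soft-threshold) step
    return x

# A coupled (non-diagonal) Q: the solution typically has exactly-zero entries.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 5))
Q = A.T @ A / 10.0
y = rng.standard_normal(5)
print(np.round(ista_quadratic(Q, y, lam=0.3), 3))
```

When $Q$ is the identity, the first iteration already returns the decoupled soft thresholding of Example 2.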

Course Outline
1. $\ell_1$-norm regularization
   • Review of nonsmooth optimization problems and algorithms
   • Algorithms for the Lasso (generic or dedicated)
   • Examples
2. Extensions
   • Group Lasso and multiple kernel learning (MKL) + case study
   • Sparse methods for matrices
   • Sparse PCA
3. Theory: consistency of pattern selection
   • Low- and high-dimensional settings
   • Links with compressed sensing

$\ell_1$-norm regularization
• Data: covariates $x_i \in \mathbb{R}^p$, responses $y_i \in \mathcal{Y}$, $i = 1, \dots, n$, given in the vector $y \in \mathbb{R}^n$ and the matrix $X \in \mathbb{R}^{n \times p}$
• Minimize with respect to the loadings/weights $w \in \mathbb{R}^p$:
  $$\sum_{i=1}^n \ell(y_i, w^\top x_i) + \lambda\|w\|_1$$
  Error on data + Regularization
• Including a constant term $b$?
• Assumptions on the loss:
  – convex and differentiable in the second variable
  – NB: with the square loss $\Rightarrow$ basis pursuit (signal processing) [CDS01], Lasso (statistics/machine learning) [Tib96]
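To make the objective concrete (not from the slides): with the square loss this is the Lasso, which the ISTA sketch from Example 3 solves directly once the gradient is specialized to $X^\top(Xw - y)$; the toy data below are illustrative:

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Proximal gradient for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2  # squared spectral norm = Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Sparse ground truth: with enough regularization, the irrelevant weights
# are estimated as exactly zero.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(50)
print(np.round(lasso_ista(X, y, lam=5.0), 2))
```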

A review of nonsmooth convex analysis and optimization
• Analysis: optimality conditions
• Optimization: algorithms
  – First-order methods
  – Second-order methods
• Books: Boyd & Vandenberghe [BV03], Bonnans et al. [BGLS03], Nocedal & Wright [NW06], Borwein & Lewis [BL00]
