Active Set Algorithm for Structured Sparsity Inducing Norms

profil-zyak-2012 - Rodolphe Jenatton1

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

6 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Niveau: Supérieur, Doctorat, Bac+8
Active Set Algorithm for Structured Sparsity-Inducing Norms Rodolphe Jenatton1 Jean-Yves Audibert1,2 Francis Bach1 Abstract We consider the empirical risk minimization problem for linear supervised learn- ing, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ?1-norm and the group ?1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resul- ting nonzero patterns. In particular, we show how geometrical information about the variables can be encoded by our regularization. We finally present an active set algorithm to efficiently solve the corresponding minimization problem. 1 Introduction Regularization by the ?1-norm is now a widespread tool in machine learning, statistics and signal processing: it allows linear variable selection in potentially high dimensions, with both efficient algorithms [1] and well-developed theory for generalization properties and variable selection con- sistency [2, 3, 4]. However, the ?1-norm cannot easily encode prior knowledge about the patterns of nonzero coeffi- cients (“nonzero patterns”) induced in the solution, since they are all theoretically possible.

group

variable selection

problem dual

green groups

reduced problem

nonzero variables

group ?1-norms

nonzero coeffi- cients

standard sparsity-inducing

Sujets

Feature selection

Informations

Publié par	profil-zyak-2012
Nombre de lectures	19
Langue	English

Extrait

Active Set Algorithm for Structured Sparsity-Inducing Norms

1 Rodolphe Jenattonrodolphe.jenatton@inria.fr 1,2 Jean-Yves Audibertaudibert@certis.enpc.fr 1 Francis Bachfrancis.bach@inria.fr Abstract

We consider the empirical risk minimization problem for linear supervised learn-ing, with regularization by structured sparsity-inducing norms. These are deﬁned as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the groupℓ1-norm by allowing the subsets to overlap.This leads to a speciﬁc set of allowed nonzero patterns for the solutions of such problems.We ﬁrst explore the relationship between the groups deﬁning the norm and the resul-ting nonzero patterns.In particular, we show how geometrical information about the variables can be encoded by our regularization.We ﬁnally present an active set algorithm to efﬁciently solve the corresponding minimization problem.

1 Introduction

Regularization by theℓ1-norm is now a widespread tool in machine learning, statistics and signal processing: itallows linear variable selection in potentially high dimensions, with both efﬁcient algorithms [1] and well-developed theory for generalization properties and variable selection con-sistency [2, 3, 4]. However, theℓ1-norm cannot easily encode prior knowledge about the patterns of nonzero coefﬁ-cients (“nonzero patterns”) induced in the solution, since they are all theoretically possible.Group ℓ1-norms [5, 6, 7] consider a partition of all variables into a certain number of subsets and penalize the sum of the Euclidean norms of each one, leading to selection of groups rather than individual variables. Moreover,recent works have considered overlapping but nested groups in constrained situations such as trees and directed acyclic graphs [8, 9]. In this paper, we consider all possible sets of groups and characterize exactly what type of prior knowledge can be encoded by considering sums of norms of overlapping groups of variables.In particular, when the variables are organized in a 2-dimensional grid, our regularization leads to the selection of convex nonzero patterns.We then present an efﬁcient active set algorithm that scales well to high dimensions, by exploiting the sparsity and the structure of the groups.Finally, the scalability of the algorithm is illustrated on synthetic data. P p p q1/q Notation.Forx∈Randq∈[1,∞), we denote bykxkqitsℓq-norm deﬁned as(|xj|) j=1 p = x|. Givenw∈Rand a subsetJof{1, . . . , p}with cardinality|J|, andkxk∞maj∈{1,...,p}|xj |J| wJdenotes the vector inRof elements ofwindexed byJ. Furthermore, for two vectorsxandy p⊤p inR, we denote byx◦y= (x1y1, . . . , xpyp)∈Rthe elementwise product ofxandy.

2 RegularizedRisk Minimization

We consider the problem of predicting a random variableY∈ Yfrom a (potentially non random) p vectorX∈R, whereYis the set of responses, typically a subset ofR. Weassume that we are

1 INRIA-WILLOWProject-team,Laboratoired'Informatiquedel'EcoleNormaleSup´erieure (INRIA/ENS/CNRS UMR 8548), 23, avenue d'Italie, 75214 Paris. Fran ce. 2 Imagine(ENPC/CSTB),Universite´Paris-Est,6avenueBlaisePascal,77455Marne-la-Vall´ee,France.