Efficient Optimization for Discriminative Latent Class Models
Description

Level: Higher education, doctorate (Bac+8)

Subjects

  • discriminative clustering

  • latent representation

  • convex optimization

  • EM algorithm

  • reduced-rank regression


Extract

Efficient Optimization for Discriminative Latent Class Models
Armand Joulin, INRIA, 23, avenue d'Italie, 75214 Paris, France. armand.joulin@inria.fr
Francis Bach, INRIA, 23, avenue d'Italie, 75214 Paris, France. francis.bach@inria.fr
Jean Ponce, École Normale Supérieure, 45, rue d'Ulm, 75005 Paris, France. jean.ponce@ens.fr
Abstract
Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While the expectation-maximization (EM) algorithm is commonly used to learn these probabilistic models, it usually leads to local maxima because it relies on a non-convex cost function. To avoid this problem, we introduce a local approximation of this cost function, which in turn leads to a quadratic non-convex optimization problem over a product of simplices. In order to maximize quadratic functions, we propose an efficient algorithm based on convex relaxations and low-rank representations of the data, capable of handling large-scale problems. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods.
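To make the optimization problem in the abstract concrete, here is a minimal sketch of maximizing a quadratic function over a product of simplices. It is only an illustration of the problem structure under assumed names (B, Y, maximize_quadratic_over_simplices): it uses plain projected gradient ascent, not the convex-relaxation and low-rank algorithm the paper proposes.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex,
    via the standard sorting-based algorithm (e.g., Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def maximize_quadratic_over_simplices(B, Y0, step=0.1, iters=200):
    """Projected gradient ascent for max_Y tr(Y^T B Y), with each row of
    Y (n x k) constrained to the simplex. B is assumed symmetric. The
    problem is non-convex, so this only finds a local maximum."""
    Y = Y0.copy()
    for _ in range(iters):
        Y = Y + step * 2.0 * (B @ Y)                     # gradient ascent step
        Y = np.apply_along_axis(project_simplex, 1, Y)   # row-wise projection
    return Y

# Toy run: random symmetric PSD objective, Dirichlet-initialized rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = A @ A.T / 50
Y = maximize_quadratic_over_simplices(B, rng.dirichlet(np.ones(4), size=50))
```

Because the objective is non-convex, different initializations of Y can reach different local maxima; the convex relaxation and low-rank representations mentioned in the abstract are aimed precisely at reducing this dependence on initialization.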
Introduction
Latent representations of data are widespread tools in supervised and unsupervised learning. They are used to reduce the dimensionality of the data for two main reasons: on the one hand, they provide numerically efficient representations of the data; on the other hand, they may lead to better predictive performance. In supervised learning, latent models are often used in a generative way, e.g., through mixture models on the input variables only, which may not lead to increased predictive performance. This has led to numerous works on supervised dimension reduction (e.g., [1, 2]), where the final discriminative goal of prediction is taken explicitly into account during the learning process. In this context, various probabilistic models have been proposed, such as mixtures of experts [3] or discriminative restricted Boltzmann machines [4], where a layer of hidden variables is used between the inputs and the outputs of the supervised learning model. Parameters are usually estimated by expectation-maximization (EM), a method that is computationally efficient but whose cost function may have many local maxima in high dimensions. In this paper, we consider a simple discriminative latent class (DLC) model where inputs and outputs are independent given the latent representation. We make the following contributions:
WILLOW project-team, Laboratoire d'Informatique de l'École Normale Supérieure (ENS/INRIA/CNRS UMR 8548).
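The conditional independence described in the introduction, where inputs and outputs interact only through the latent class, amounts to p(y|x) = Σ_z p(z|x) p(y|z). The sketch below makes this predictive distribution concrete; the softmax link and the parameter matrices W and V are hypothetical choices for illustration, not the paper's exact parameterization (a multinomial extension of reduced-rank regression).

```python
import numpy as np

def softmax(a, axis=-1):
    """Row-wise softmax with max-subtraction for numerical stability."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def dlc_predict(X, W, V):
    """Predictive distribution p(y | x) = sum_z p(z | x) p(y | z).

    X: (n, d) inputs; W: (d, k) hypothetical parameters of the
    multinomial p(z | x); V: (k, m) rows holding the output
    distributions p(y | z). Inputs and outputs are conditionally
    independent given the latent class z."""
    p_z_given_x = softmax(X @ W, axis=1)   # (n, k)
    return p_z_given_x @ V                 # (n, m), rows sum to 1

# In EM, the E-step for such a model forms responsibilities
# p(z | x, y) proportional to p(z | x) * p(y | z) for the observed y;
# the non-convexity of the resulting likelihood is what leads EM to
# the local maxima discussed above.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))
W = rng.standard_normal((10, 3))
V = rng.dirichlet(np.ones(4), size=3)     # 3 latent classes, 4 labels
print(dlc_predict(X, W, V).sum(axis=1))   # each row sums to 1
```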