Tutorial - Case Studies, R.R.



8 pages

English


Description

Level: Higher education, Doctorate, Bac+8

Tutorial - Case Studies, R.R. 03/09/2006. Page 1 of 8.

Subject

Gaussian mixture model-based clustering with TANAGRA: the EM algorithm.

In Gaussian mixture model-based clustering, each cluster is represented by a Gaussian distribution. The entire dataset is modeled by a mixture (a linear combination) of these distributions.

The EM (Expectation Maximization) algorithm is used in practice to find the “optimal” parameters of the distributions, i.e. those that maximize the likelihood function.

The number of clusters is a parameter of the algorithm. We can also detect the “optimal” number of clusters by evaluating several values, i.e. testing 1 cluster, 2 clusters, etc., and choosing the one that optimizes the likelihood or a penalized criterion such as AIC or BIC.

Dataset

We use a synthetic dataset in a two-dimensional space [1]. We aim to discover two clusters (Figure 1).

Figure 1: Two Gaussians with different parameters (means and shapes, i.e. covariance matrices)

[1] This dataset comes from the free distribution of « FAST EM Clustering » (AUTONLAB --


Topics

Mixture model

Expectation-maximization algorithm

Information

Published by	profil-zyak-2012
Number of reads	15
Language	English

Excerpt

Tutorial - Case Studies, R.R.

Subject

Gaussian mixture model-based clustering with TANAGRA: the EM algorithm.

In Gaussian mixture model-based clustering, each cluster is represented by a Gaussian distribution. The entire dataset is modeled by a mixture (a linear combination) of these distributions.

The EM (Expectation Maximization) algorithm is used in practice to find the “optimal” parameters of the distributions, i.e. those that maximize the likelihood function.

The number of clusters is a parameter of the algorithm. We can also detect the “optimal” number of clusters by evaluating several values, i.e. testing 1 cluster, 2 clusters, etc., and choosing the one that optimizes the likelihood or a penalized criterion such as AIC or BIC.
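To make the procedure described above concrete, here is a minimal NumPy sketch of EM for a Gaussian mixture, followed by a comparison of 1, 2 and 3 clusters using BIC. It is an illustration only and not TANAGRA's implementation: the function names (`log_gaussian`, `e_step`, `fit_gmm_em`, `bic`), the initialization, the small regularization term `reg`, the toy data and the BIC convention used here (lower is better) are all assumptions made for this example.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X, via a Cholesky factor."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)      # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def e_step(X, weights, means, covs):
    """Return responsibilities (n, k) and the total log-likelihood."""
    log_joint = np.stack(
        [np.log(w) + log_gaussian(X, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_norm = np.logaddexp.reduce(log_joint, axis=1)  # log p(x_i) for each point
    return np.exp(log_joint - log_norm[:, None]), log_norm.sum()

def fit_gmm_em(X, k, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    """Fit a k-component Gaussian mixture with EM (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: equal weights, random data points as means,
    # and the overall data covariance for every component.
    weights = np.full(k, 1.0 / k)
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.stack([np.cov(X.T) + reg * np.eye(d)] * k)
    resp, loglik = e_step(X, weights, means, covs)
    for _ in range(n_iter):
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        # E-step: update responsibilities and check the likelihood gain.
        resp, new_loglik = e_step(X, weights, means, covs)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return weights, means, covs, loglik

def bic(loglik, k, n, d):
    """BIC with full covariances; in this convention, lower is better."""
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * loglik + n_params * np.log(n)

# Toy data (assumed parameters): two Gaussians in two dimensions.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200),
    rng.multivariate_normal([6, 6], [[1.0, -0.4], [-0.4, 0.5]], size=200),
])

bic_by_k = {k: bic(fit_gmm_em(X, k)[3], k, *X.shape) for k in (1, 2, 3)}
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, "selected k:", best_k)
```

The raw likelihood tends to increase as components are added, which is why a penalized criterion is used for the comparison in this sketch. Other tools may report these criteria with a different sign convention.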

Dataset

We use a synthetic dataset in a two-dimensional space [1]. We aim to discover two clusters (Figure 1).

Figure 1: Two Gaussians with different parameters (means and shapes, i.e. covariance matrices)
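For readers who do not have the original file, a dataset of the same kind can be simulated. The sketch below draws points from two two-dimensional Gaussians with different means and covariance matrices. The function name `make_two_gaussians` and all parameter values are hypothetical and chosen only for illustration; they are not the parameters of the dataset used in the tutorial.

```python
import numpy as np

def make_two_gaussians(n_per_cluster=500, seed=0):
    """Simulate two 2-D Gaussian clusters (illustrative parameters only)."""
    rng = np.random.default_rng(seed)
    # Assumed parameters: different means and different covariance shapes.
    mean_a = np.array([0.0, 0.0])
    cov_a = np.array([[1.0, 0.8], [0.8, 1.0]])     # elongated along the diagonal
    mean_b = np.array([4.0, 4.0])
    cov_b = np.array([[1.5, -0.6], [-0.6, 0.5]])   # flatter, tilted the other way
    a = rng.multivariate_normal(mean_a, cov_a, size=n_per_cluster)
    b = rng.multivariate_normal(mean_b, cov_b, size=n_per_cluster)
    X = np.vstack([a, b])                  # observations, shape (2 * n, 2)
    y = np.repeat([0, 1], n_per_cluster)   # true cluster of each row
    return X, y

X_sim, y_sim = make_two_gaussians()
print(X_sim.shape, np.bincount(y_sim))
```

The true labels `y_sim` are returned only so that a clustering result can be checked afterwards; a clustering method would be given `X_sim` alone.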

[1] This dataset comes from the free distribution of « FAST EM Clustering » (AUTONLAB -- http://www.autonlab.org/autonweb/10466.html).