Didacticiel - Etudes de cas R.R.
8 pages
English


Description

Level: Higher education, Doctorate, Bac+8

Didacticiel - Etudes de cas R.R. 03/09/2006

Subject

Gaussian mixture model based clustering with TANAGRA: the EM algorithm.

In Gaussian mixture model-based clustering, each cluster is represented by a Gaussian distribution. The entire dataset is modeled by a mixture (a linear combination) of these distributions.

The EM (Expectation Maximization) algorithm is used in practice to find the “optimal” parameters of the distributions that maximize the likelihood function.

The number of clusters is a parameter of the algorithm. But we can also detect the “optimal” number of clusters by evaluating several values, i.e. testing 1 cluster, 2 clusters, etc. and choosing the best one according to the likelihood or a penalized criterion such as AIC or BIC.

Dataset

We use a synthetic dataset in a two-dimensional space (1). We aim to discover two clusters (Figure 1).

Figure 1: Two Gaussians with different parameters (means and shapes, i.e. covariance matrices)

(1) This dataset comes from the free distribution of « FAST EM Clustering » (AUTONLAB --

  • clustering tab

  • mixture model

  • model-based clustering

  • fast em

  • em algorithm

  • subject gaussian

  • gaussian mixture



Excerpt

Didacticiel - Etudes de cas R.R.

Subject

Gaussian mixture model based clustering with TANAGRA: the EM algorithm.

In Gaussian mixture model-based clustering, each cluster is represented by a Gaussian distribution. The entire dataset is modeled by a mixture (a linear combination) of these distributions.

The EM (Expectation Maximization) algorithm is used in practice to find the “optimal” parameters of the distributions that maximize the likelihood function.

The number of clusters is a parameter of the algorithm. But we can also detect the “optimal” number of clusters by evaluating several values, i.e. testing 1 cluster, 2 clusters, etc. and choosing the best one according to the likelihood or a penalized criterion such as AIC or BIC.
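The model and the EM updates described above can be written out explicitly. The notation below is an assumption introduced for illustration (the tutorial excerpt defines no symbols): K is the number of clusters, d the dimension, n the number of observations, and the BIC is given in the common "lower is better" convention with full covariance matrices.

```latex
% Mixture density: a linear combination of K Gaussian components
p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Log-likelihood maximized by EM
\ell(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)

% E-step: responsibility of component k for observation x_i
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: weighted re-estimation of the parameters
n_k = \sum_{i=1}^{n} \gamma_{ik}, \qquad
\pi_k = \frac{n_k}{n}, \qquad
\mu_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}

% Penalized criterion for choosing K (p = number of free parameters)
\mathrm{BIC}(K) = -2 \, \ell(\hat{\theta}_K) + p \, \log n,
\qquad p = (K - 1) + K d + K \, \frac{d(d+1)}{2}
```

Because the maximized likelihood can only improve when components are added, the raw likelihood tends to favor larger K; the penalty term in AIC or BIC is what makes the comparison across values of K meaningful.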
Dataset

We use a synthetic dataset in a two-dimensional space (1). We aim to discover two clusters (Figure 1).

Figure 1: Two Gaussians with different parameters (means and shapes, i.e. covariance matrices)

(1) This dataset comes from the free distribution of « FAST EM Clustering » (AUTONLAB -- http://www.autonlab.org/autonweb/10466.html).
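To make the procedure concrete outside TANAGRA, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the TANAGRA implementation: the data are a made-up stand-in for the AUTONLAB dataset (two simulated 2-D Gaussians with different means and shapes), the function names (`fit_em`, `bic`, and so on) are hypothetical, and the BIC uses the "lower is better" convention with 6K - 1 free parameters for a 2-D mixture with full covariances.

```python
import math
import random

def log_gauss2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -0.5 * (quad + math.log(det)) - math.log(2.0 * math.pi)

def e_step(data, weights, means, covs):
    """Return responsibilities and the log-likelihood of the given parameters."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss2d(x, m, c)
                for w, m, c in zip(weights, means, covs)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        loglik += norm
        resp.append([math.exp(v - norm) for v in logs])
    return resp, loglik

def m_step(data, resp, k, ridge=1e-6):
    """Re-estimate weights, means and full covariances from responsibilities."""
    n = len(data)
    weights, means, covs = [], [], []
    for j in range(k):
        nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
        mx = sum(r[j] * x[0] for r, x in zip(resp, data)) / nj
        my = sum(r[j] * x[1] for r, x in zip(resp, data)) / nj
        sxx = sum(r[j] * (x[0] - mx) ** 2 for r, x in zip(resp, data)) / nj
        syy = sum(r[j] * (x[1] - my) ** 2 for r, x in zip(resp, data)) / nj
        sxy = sum(r[j] * (x[0] - mx) * (x[1] - my) for r, x in zip(resp, data)) / nj
        weights.append(nj / n)
        means.append((mx, my))
        # A small ridge on the diagonal keeps the covariance invertible.
        covs.append(((sxx + ridge, sxy), (sxy, syy + ridge)))
    return weights, means, covs

def fit_em(data, k, n_iter=60, seed=0):
    """Run a fixed number of EM iterations for a k-component 2-D mixture."""
    rng = random.Random(seed)
    n = len(data)
    gx = sum(x[0] for x in data) / n
    gy = sum(x[1] for x in data) / n
    vx = sum((x[0] - gx) ** 2 for x in data) / n
    vy = sum((x[1] - gy) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = rng.sample(data, k)              # random data points as initial means
    covs = [((vx, 0.0), (0.0, vy))] * k      # overall variances as initial shapes
    for _ in range(n_iter):
        resp, _ = e_step(data, weights, means, covs)
        weights, means, covs = m_step(data, resp, k)
    _, loglik = e_step(data, weights, means, covs)  # likelihood of final parameters
    return weights, means, covs, loglik

def bic(loglik, k, n):
    """BIC, lower is better; 6k - 1 free parameters in 2-D with full covariances."""
    return -2.0 * loglik + (6 * k - 1) * math.log(n)

# Made-up stand-in data: two Gaussians with different means and shapes.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.5)) for _ in range(150)]
for _ in range(150):
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append((5.0 + 1.5 * u, 5.0 + 1.0 * u + 0.5 * v))  # correlated cluster

# Test 1, 2 and 3 clusters and keep the value with the lowest BIC.
bics = {k: bic(fit_em(data, k)[3], k, len(data)) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
print(bics, best_k)
```

The sketch runs a fixed number of iterations from a single random start for brevity; a more careful version would stop on a log-likelihood tolerance and compare several restarts, since EM only reaches a local maximum.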