Advanced Probabilistic Models for
Clustering and Projection
Dissertation in Computer Science
at the Faculty of Mathematics, Computer Science and Statistics
of Ludwig-Maximilians-Universität München
by
Shipeng Yu
Date of submission: 21 June 2006
Date of oral examination: 29 September 2006
Reviewers:
Prof. Dr. Hans-Peter Kriegel, Ludwig-Maximilians-Universität München
Prof. Dr. Thomas Hofmann, Technische Universität Darmstadt
Dr. Volker Tresp, Siemens AG, München

To my parents

Abstract
Probabilistic modeling is a fundamental research area for data mining and machine learning. The general approach is to assume a generative model underlying the observed data and to estimate the model parameters via likelihood maximization. It is grounded in probability theory and draws on a large body of methods from statistical learning, sampling theory, and Bayesian statistics. In this thesis we study several advanced probabilistic models for data clustering and feature projection, two important unsupervised learning problems.
The goal of clustering is to group similar data points together to uncover the underlying clusters. While numerous methods exist for various clustering tasks, one important question remains: how to determine the number of clusters automatically. The first part of the thesis answers this question from a mixture modeling perspective. A finite mixture model is first introduced for clustering, in which each mixture component is assumed, for generality, to be an exponential family distribution. The model is then extended to an infinite mixture model, and its strong connection to the Dirichlet process (DP), a non-parametric Bayesian framework, is uncovered. From this insight a variational Bayesian algorithm called VBDMA is derived to learn the number of clusters automatically, and empirical studies on several 2D data sets and an image data set verify the effectiveness of this algorithm.
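As a rough illustration of the infinite-mixture idea, consider the following minimal Python sketch (ours, not the thesis's VBDMA implementation; the truncation level T, the concentration alpha, and all variable names are assumptions made for illustration). It draws data from a truncated stick-breaking construction of a DP Gaussian mixture:

import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, T, rng):
    # Truncated stick-breaking: v_k ~ Beta(1, alpha),
    # pi_k = v_k * prod_{j<k} (1 - v_j).
    v = rng.beta(1.0, alpha, size=T)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

alpha, T, n = 1.0, 20, 500             # concentration, truncation, sample size
pi = stick_breaking_weights(alpha, T, rng)
pi /= pi.sum()                         # renormalize the truncated weights
mu = rng.normal(0.0, 5.0, size=T)      # one Gaussian mean per candidate component
z = rng.choice(T, size=n, p=pi)        # mixture (cluster) assignments
x = rng.normal(mu[z], 1.0)             # one-dimensional observations

# Only a handful of the T candidate components end up occupied: the data
# and alpha, not a fixed K, determine the effective number of clusters.
print("occupied clusters:", len(np.unique(z)))

In the limit of infinite truncation this construction recovers the DP itself; VBDMA works instead with a Dirichlet-multinomial allocation approximation, as developed in Chapter 3.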
In feature projection we are interested in dimensionality reduction, aiming to find a low-dimensional feature representation of the data. We first review the well-known principal component analysis (PCA) and its probabilistic interpretation (PPCA), and then generalize PPCA to a novel probabilistic model for non-linear projection, in the spirit of kernel PCA. An expectation-maximization (EM) algorithm is derived for kernel PCA, making it fast and applicable to large data sets. We then propose a novel supervised projection method called MORP, which takes the output information into account in a supervised learning context. Empirical studies on various data sets show much better results than unsupervised projection and other supervised projection methods. Finally, we generalize MORP probabilistically to obtain SPPCA for supervised projection, and naturally extend the model to S²PPCA, a semi-supervised projection method. This allows us to incorporate both the label information and the unlabeled data into the projection process.
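For reference, the classical EM updates for standard PPCA (the starting point the thesis generalizes) fit x = Wz + mu + noise with closed-form steps per iteration. This is a minimal sketch of those well-known updates; the symbols W, sigma2, and q are our notation, not the thesis's:

import numpy as np

def ppca_em(X, q, n_iter=100, seed=0):
    # Fit x ~ N(W z + mu, sigma2 * I), z ~ N(0, I_q), by EM.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables.
        M = W.T @ W + sigma2 * np.eye(q)       # q x q matrix
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                     # row i is E[z_i]
        EzzT = n * sigma2 * Minv + Ez.T @ Ez   # sum_i E[z_i z_i^T]
        # M-step: closed-form updates for W and sigma2.
        W = Xc.T @ Ez @ np.linalg.inv(EzzT)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Xc @ W) * Ez)
                  + np.trace(EzzT @ W.T @ W)) / (n * d)
    return W, mu, sigma2

Each iteration costs O(ndq) rather than the O(nd^2) needed to form and decompose an explicit covariance matrix, which is the same property that lets the EM treatment of kernel PCA above scale to large data sets.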
In the third part of the thesis, we introduce a unified probabilistic model which handles data clustering and feature projection jointly. The model can be viewed both as a clustering model with projected features and as a projection model with structured documents. A variational Bayesian learning algorithm is derived, which turns out to iterate clustering operations and projection operations until convergence. Superior performance is obtained for both clustering and projection.
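A schematic of this alternation, using PCA and k-means as stand-ins (our own illustration, not the thesis's variational algorithm; cluster_and_project and its parameters are hypothetical names):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_and_project(X, n_clusters=3, q=2, n_iter=10, seed=0):
    Z = PCA(n_components=q).fit_transform(X)   # initial projection
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # "Clustering step": cluster in the current projected space.
        new_labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=seed).fit_predict(Z)
        if np.array_equal(new_labels, labels):
            break                              # assignments stable: stop
        labels = new_labels
        # "Projection step": re-fit the projection on cluster-centered
        # data, standing in for the variational projection update.
        centers = np.array([X[labels == k].mean(axis=0)
                            for k in range(n_clusters)])
        Z = PCA(n_components=q).fit_transform(X - centers[labels])
    return labels, Z

In the unified model both steps fall out of a single variational objective, so each iteration is guaranteed not to decrease one common bound; the sketch above only mimics the control flow.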
Acknowledgements

This dissertation is based on research I carried out as a Ph.D. student
in a joint Ph.D. program between the KDD group at University of Munich (LMU) and
the Learning Systems Department (CT IC 4, previously named Neural Computation) of
Siemens AG. During the past three years, I have been extremely fortunate to have the
guidance, support, and friendship of a number of people who helped me grow academically
and personally.
First, I would like to thank Prof. Hans-Peter Kriegel, my supervisor, for his encouragement, constructive suggestions, and constant support during this research. His door was always open whenever I needed his help. I was impressed by the open-mindedness and academic guidance that make the KDD group at LMU so successful.
I am also greatly thankful to Prof. Thomas Hofmann, who kindly agreed to devote his time to supervising my thesis despite his extremely busy research and teaching schedule. I would also like to thank Prof. Martin Wirsing and Prof. Martin Hofmann for their very patient guidance on my oral examination.
My co-supervisor at Siemens AG, Dr. Volker Tresp, is the person who had the greatest role in my intellectual development. He introduced me to the field of statistical machine learning. His enthusiasm, sharp thoughts, open-mindedness, and humor made my research a truly memorable and joyful journey.
I am indebted to Dr. Kai Yu for his friendship and fellowship; he helped me a great deal in both academic research and personal life. I was amazed not only by his solid knowledge of machine learning, but also by his way of thinking: he can always find the critical point of a problem, and we had a very effective and memorable cooperation during my Ph.D. work.
I am grateful to Prof. Bernd Schürmann, the leader of the Learning Systems Department at Siemens AG, for his constant support of my research. I appreciate his emphasis on both scientific research and real-world applications, which has greatly influenced my career plans.
Finishing a thesis is not a one-person job. Here I wish to thank the following people: Prof. Christian Böhm, Prof. Zoubin Ghahramani, Zhao Xu, Yi Huang, Mrs. Susanne Grienberger, Franz Krojer, Dr. Matthias Schubert, Dr. Peer Kröger, Mrs. Christa Singer, Mrs. Christine Herzog, Dr. Kai Heesche, Dr. Christoph Tietz, Dr. Dengyong Zhou,
Dr. Mingrui Wu, Karsten Borgwardt, Arthur Zimek, Markus Bundschus, Stefan Reckow, Christof Störmann, Huaien Gao, Dr. Hugo Zaragoza, Dr. Ralf Herbrich, Dr. Thore Graepel, Dr. Ji-Rong Wen, Dr. Wei-Ying Ma, ...
Of course, I am grateful to my parents, for their patience and love. Without them this
work would never have come into existence.
Shipeng Yu
Munich, Germany
May 2006

Table of Contents
Abstract i
Acknowledgements iii
List of Algorithms ix
List of Figures xi
List of Tables xv
Preface xvii
I Probabilistic Clustering Models 1
1 Overview of Clustering Models 3
1.1 Flat Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 Model-based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Similarity-based Clustering . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Organization of the Following Chapters . . . . . . . . . . . . . . . . . . . . 7
2 Finite Mixture Models for Clustering 9
2.1 Mixture of Exponential Family Distributions . . . . . . . . . . . . . . . . . 10
2.1.1 Exponential Family Distributions and the Conjugate Family . . . . 10
2.1.2 Mixture of Exponential Family Distributions . . . . . . . . . . . . . 12
2.1.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Model Inference and Parameter Estimation . . . . . . . . . . . . . . . . . . 16
2.2.1 Gibbs Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Variational Bayesian Solution . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3 Model Fitting as Clustering . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.4 Variational Bayes for Example Models . . . . . . . . . . . . . . . . . 24
2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Infinite Mixture Models 29
3.1 Infinite Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Dirichlet Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Inference and Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . 35
3.3.1 Gibbs Sampling with Pólya Urn Process . . . . . . . . . . . . . . 35
3.3.2 Truncated Dirichlet Process . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Dirichlet-Multinomial Allocation . . . . . . . . . . . . . . . . . . . . 39
3.3.4 Comparison of Variational Bayesian Methods for DP . . . . . . . . . 41
3.4 Empirical Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.1 Old Faithful Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.2 Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
II Probabilistic Projection Models 53
4 Overview of Projection Models 55
4.1 Projection Models for Continuous Data . . . . . . . . . . . . . . . . . . . . 56
4.1.1 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . 56
4.1.2 Kernel Principal Component Analysis (Kernel PCA) . . . . . . . . . 57
4.1.3 Local Projection Methods . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Projection Methods for Discrete Data . . . . . . . . . . . . . . . .
