Statistical Relational Learning with
Nonparametric Bayesian Models
Dissertation in Computer Science
at the Faculty of Mathematics, Computer Science and Statistics
of the Ludwig-Maximilians-Universität München

by
Zhao Xu

Date of submission: 04.06.2007
Date of oral examination: 25.07.2007

Reviewers:
Prof. Dr. Hans-Peter Kriegel, Ludwig-Maximilians-Universität München
Prof. Dr. Tobias Scheffer, Max-Planck-Institut für Informatik
Dr. Volker Tresp, Siemens AG, München

Abstract
Statistical relational learning analyzes the probabilistic constraints between entities, their attributes, and their relationships. It is an area of growing interest in modern data mining, and many promising approaches have been proposed. However, there is no easily applicable recipe for turning a relational domain (e.g., a database) into a probabilistic model, for two main reasons. First, structural learning in relational models is even more complex than structural learning in (non-relational) Bayesian networks, because an attribute may depend on exponentially many other attributes. Second, it can be difficult and expensive to obtain reliable prior knowledge for the domains of interest. To remove these constraints, this thesis applies nonparametric Bayesian analysis to relational learning and proposes two models: Dirichlet enhanced relational learning and infinite hidden relational learning.

Dirichlet enhanced relational learning (DERL) extends nonparametric hierarchical Bayesian modeling to relational data. In existing relational models, the model parameters are global, which means that the conditional probability distributions are the same for every entity and that the relationships are independent of each other. To overcome these limitations, we introduce a hierarchical Bayesian (HB) framework into relational learning, such that model parameters can be personalized, i.e., owned by individual entities or relationships, while still being coupled via common prior distributions. Nonparametric HB modeling adds further flexibility, so that the learned knowledge can be represented faithfully. For inference, we develop an efficient variational method motivated by the Pólya urn representation of the Dirichlet process (DP). DERL is demonstrated in a medical domain, where we build a nonparametric HB model over entities including hospitals, patients, procedures, and diagnoses. The experiments show that the additional flexibility of nonparametric HB modeling yields a more accurate model of the dependencies between different types of relationships and significantly improves the prediction of unknown relationships.
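For reference, the Pólya urn representation mentioned above has the following standard form (generic DP notation, not quoted from the thesis): if parameters \theta_1, \dots, \theta_N have been drawn from a Dirichlet process with base distribution G_0 and concentration parameter \alpha_0, the next draw satisfies

    \theta_{N+1} \mid \theta_1, \dots, \theta_N \;\sim\; \frac{\alpha_0}{\alpha_0 + N}\, G_0 \;+\; \frac{1}{\alpha_0 + N} \sum_{i=1}^{N} \delta_{\theta_i},

i.e., a new parameter either repeats one of the values already drawn or is sampled afresh from the base distribution. This clustering effect is what allows personalized parameters to share statistical strength across entities and relationships.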
In the infinite hidden relational model (IHRM), we apply nonparametric mixture modeling to relational data. It extends the expressiveness of a relational model by introducing, for each entity, an infinite-dimensional hidden variable as part of a DP mixture model. This has three main advantages. First, it reduces the need for extensive structural learning, which is particularly difficult in relational models due to the huge number of potential probabilistic parents. Second, information can propagate globally through the ground network defined by the relational structure. Third, the number of mixture components for each entity class is optimized by the model itself based on the data. IHRM can be applied to entity clustering and relationship/attribute prediction, two important tasks in relational data mining. For inference in IHRM, we develop four algorithms: collapsed Gibbs sampling with the Chinese restaurant process, blocked Gibbs sampling with the truncated stick-breaking construction (SBC), mean-field inference with truncated SBC, and an empirical approximation. IHRM is evaluated in three different domains: a recommendation system based on the MovieLens data set, prediction of the functions of yeast genes/proteins on the KDD Cup 2001 data set, and the medical data analysis. The experimental results show that IHRM gives significantly improved estimates of attributes/relationships and highly interpretable entity clusters in complex relational data.
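For reference, the constructions named above have the following standard definitions (generic notation, not quoted from the thesis). In the stick-breaking construction, a draw G from a DP with concentration parameter \alpha_0 and base distribution G_0 is written as

    \beta_k \sim \mathrm{Beta}(1, \alpha_0), \qquad \pi_k = \beta_k \prod_{j<k} (1 - \beta_j), \qquad G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\theta_k^*}, \qquad \theta_k^* \sim G_0,

and the truncated SBC caps the sum at a finite level K (by setting \beta_K = 1), so that blocked Gibbs sampling and mean-field inference only need to represent K mixture components explicitly. In the equivalent Chinese restaurant process view, each entity joins an existing mixture component with probability proportional to that component's current size, or opens a new one with probability proportional to \alpha_0, which is why the number of components can grow with the data.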
Acknowledgments

This thesis concludes my research work under the joint Ph.D. program between the KDD group at the Institute of Computer Science, University of Munich, and the Department of Learning Systems, Corporate Technology, Siemens AG. Throughout my doctoral studies, many people guided and supported me on the journey that eventually led to this thesis.

First, I am deeply indebted to my supervisor, Prof. Dr. Hans-Peter Kriegel. Without his guidance, encouragement and tremendous support, this thesis would not have been possible.

I owe a lot to my co-supervisor at Siemens AG, Dr. Volker Tresp, who introduced me to the fascinating area of machine learning and has had a profound influence on me as a researcher. The research in this thesis was carried out under his vision of statistical machine learning. I truly appreciate his advice, wisdom, encouragement, cheerfulness and patience. I feel extremely fortunate to have had the opportunity to work with him.
I would like to thank Prof. Dr. Tobias Scheffer from the Max Planck Institute for Computer Science for his invaluable advice and insightful comments on my thesis work. I would also like to thank Prof. Dr. Alexander Knapp and Prof. Dr. Hans Jürgen Ohlbach for their patient guidance during my oral examinations.

I am very grateful to Prof. Dr. Bernd Schürmann, head of the Department of Learning Systems at Siemens AG, for his constant support of my research.
I also appreciate the friendship, encouragement and support of my colleagues, though the following list is undoubtedly incomplete: Dr. Clemens Otte, Christof Störmann, Stefan Hagen Weber, Dr. Kai Yu, Dr. Shipeng Yu, Mrs. Susanne Grienberger, Mrs. Christa Singer, Anton Maximilian Schäfer, Dr. Stefan Brecheisen, Franz Krojer, Karsten Borgwardt, Dr. Peer Kröger, Dr. Matthias Schubert, Dr. Ralph Grothmann, Dr. Christoph Tietz and Dr. Kai Heesche.
Finally, my research career would have been impossible without the love and support of my family. This thesis is dedicated to them.
Zhao Xu
Munich, Germany
October, 2007
Contents
Abstract
Acknowledgments
Contents

I  Preliminaries

1  Introduction
   1.1  Motivations and First Discussion of Our Models
   1.2  Thesis Overview

2  Statistical Relational Learning
   2.1  Introduction
   2.2  Motivation
   2.3  SRL Models
      2.3.1  Probabilistic Relational Model
      2.3.2  Directed Acyclic Probabilistic Entity Relationship Model
      2.3.3  Relational Models with Structure Uncertainty
      2.3.4  Bayesian Logic Programming
   2.4  SRL Tasks
      2.4.1  Object Identification
      2.4.2  Object Ranking
      2.4.3  Object Classification/Clustering
      2.4.4  Relationship Prediction
   2.5  Summary

II  Relational Learning with Nonparametric Hierarchical Models

3  Bayesian and Hierarchical Bayesian Models
   3.1  Bayesian Models
      3.1.1  Introduction
      3.1.2  Exchangeability
      3.1.3  Inference and Parameter Learning
      3.1.4  Exponential Family and Conjugate Prior
      3.1.5  Differences from Classical Statistical Approaches
      3.1.6  Example
   3.2  Hierarchical Bayesian Models
      3.2.1  Introduction
      3.2.2  Exchangeability
      3.2.3  Empirical Bayesian Models
      3.2.4  Example
      3.2.5  Hierarchical Models in Full Bayesian Framework
   3.3  Summary

4  Nonparametric Hierarchical Bayesian Models
   4.1  Introduction
   4.2  Model Description
   4.3  Dirichlet Process
      4.3.1  Dirichlet Distribution
      4.3.2  Basic Properties of DP
