From text to knowledge [Elektronische Ressource] : bridging the gap with probabilistic graphical models / vorgelegt von Markus Bundschus
170 pages
English



Information

Published on 1 January 2010
Language: English
File size: 3 MB

Excerpt

From Text to Knowledge:
Bridging the Gap with Probabilistic
Graphical Models
Markus Bundschus
Munich 2010
Dissertation
at the Faculty of Mathematics, Computer Science and Statistics
of the Ludwig-Maximilians-Universität München
submitted by
Markus Bundschus
from Munich
Munich, 1 June 2010
First reviewer: Prof. Dr. Hans-Peter Kriegel
Second reviewer: Prof. Dr. Philipp Cimiano
Third reviewer: Dr. Volker Tresp
Date of the oral examination: 21 July 2010
Contents
Acknowledgments
Abstract
Zusammenfassung
1 Introduction
1.1 Motivation
1.2 Probabilistic Graphical Models
1.3 Contributions of the Thesis
2 Automatic Construction of Knowledge Bases from Textual Data
2.1 Overview
2.1.1 Terminology
2.1.2 Motivation and Problem Statement
2.1.3 Proposed Approach at a Glance
2.1.4 Related Work
2.1.5 Contributions and Outline
2.2 A Novel Approach for Knowledge Base Construction with Conditional Random Fields
2.2.1 Named Entity Recognition and Semantic Relation Extraction as Sequence Labeling Task
2.2.2 Conditional Random Fields
2.2.3 CRF Model Design
2.2.4 Feature Classes
2.2.5 Fact Extraction
2.3 Visualization and Interactive Search
2.4 Experimental Evaluation
2.4.1 System Description
2.4.2 Disease-Treatment Relation Extraction from PubMed
2.4.3 Gene-Disease Relation Extraction from GeneRIF Phrases
2.5 Conclusions
3 Probabilistic Topic Models for Knowledge Discovery in Semantically Annotated Textual Databases
3.1 Overview
3.1.1 Motivation
3.1.2 Related Work
3.1.3 Contributions and Outline
3.2 Topic Models for Semantically Annotated Documents
3.2.1 Terminology
3.2.2 Classical Latent Dirichlet Allocation Model
3.2.3 The Topic-Concept Model
3.2.4 The User-Topic-Concept Model
3.3 Experimental Evaluation
3.3.1 Experimental Setup
3.3.2 Language Model Based Evaluation
3.3.3 Multi-label Text Classification
3.3.4 Similarity Ranking
3.3.5 Qualitative Evaluation
3.4 Conclusion
4 Digging for Knowledge with Information Extraction: A Case Study on Human Gene-Disease Associations
4.1 Overview
4.1.1 Motivation
4.1.2 Related Work
4.1.3 Contributions and Outline
4.2 State of the Art Curated Human Gene-Disease Association Databases
4.3 Literature-derived Human Gene-Disease Network (LHGDN)
4.4 Analysis of the LHGDN
4.4.1 Quantitative Analysis
4.4.2 Disease Gene Property Analysis
4.4.3 Large-scale Properties of Discovered Facts
4.5 Conclusions
5 Summary and Outlook
A Supporting Information Chapter 4
A.1 Additional Figures
List of Figures
List of Tables
List of Algorithms
List of Abbreviations
Bibliography

Acknowledgments
During the last three years, many individuals advised, guided and supported me in many
different ways and thus greatly contributed to this thesis. I am deeply grateful for this.

First of all, I would like to thank Prof. Hans-Peter Kriegel. Without him, this thesis
would not have been possible, and he consistently supported me during my research. It was
a great honor for me to be a member of such a successful research group.

I am also very thankful to Prof. Philipp Cimiano, who kindly agreed to examine this
thesis and devoted his extremely scarce time to it. Many thanks also to Prof. François
Bry and Prof. Martin Wirsing for being part of my dissertation committee.

I owe my deepest thanks to Dr. Volker Tresp, who guided and advised me during the last
three years. Volker always encouraged me with his positive attitude and inspiring words.
I was very fortunate to have the opportunity to work with him. His infectious enthusiasm
for research, open-mindedness and perceptive thoughts have been a major factor in my
research.

I am very grateful to my friends and former colleagues Dr. Mathäus Dejori and Dr.
Shipeng Yu. Both contributed significantly to my research. Special thanks to Shipeng,
who gave me the opportunity to work in the CAD & Knowledge Solutions Group at
Siemens Healthcare in the US.

There are, of course, many other people whose advice, discussions, comments and
feedback have greatly contributed to my Ph.D. work, both directly and indirectly. I would
like to thank Dr. Florian Steinke, Dr. Thomas Runkler, Prof. Martin Greiner, Maximilian
Nickel, Thorsten Führing, Stefan Weber, Christa Singer, Davide Magatti, Yi Huang, Peer
Kröger, Matthias Schubert, Susanne Grienberger and Franz Krojer.

Finally, I am deeply grateful to my family and friends for their never-ending support
and their love. My deepest thanks to Nicoline for all her support and care during the last
months.
