Semi-supervised structured prediction models [Elektronische Ressource] / von Ulf Brefeld

humboldt-universitat_zu_berlin - Brefeld Ulf

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

182 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sujets

Informatik

Informations

Publié par	humboldt-universitat_zu_berlin
Publié le	01 janvier 2008
Nombre de lectures	10
Langue	English
Poids de l'ouvrage	3 Mo

Extrait

Semi-supervised Structured Prediction Models
DISSERTATION
zur Erlangung des akademischen Grades
doctor rerum naturalium
(Dr. rer. nat.)
im Fach Informatik
eingereicht an der
Mathematisch-Naturwissenschaftlichen Fakultät II
Humboldt-Universität zu Berlin
von
Herr Dipl.-Inf. Ulf Brefeld
geboren am 29.10.1973 in Gronau/Westf.
Präsident der Humboldt-Universität zu Berlin:
Prof. Dr. Dr. h.c. Christoph Markschies
Dekan der Mathematisch-Naturwissenschaftlichen Fakultät II:
Prof. Dr. Wolfgang Coy
Gutachter:
1. Prof. Dr. Tobias Scheﬀer
2. Prof. Dr. Hans-Dieter Burkhard
3. Prof. Dr. Thorsten Joachims
eingereicht am: 27. Juli 2007
Tag der mündlichen Prüfung: 15. Januar 2008Abstract
Learning mappings between arbitrary structured input and output variables
is a fundamental problem in machine learning. It covers many natural learn-
ing tasks and challenges the standard model of learning a mapping from
independently drawn instances to a small set of labels. Potential applica-
tions include classiﬁcation with a class taxonomy, named entity recognition,
and natural language parsing. In these structured domains, labeled training
instancesaregenerallyexpensivetoobtainwhileunlabeledinputsarereadily
available and inexpensive.
This thesis deals with semi-supervised learning of discriminative models
for structured output variables. The analytical techniques and algorithms of
classicalsemi-supervised learningare liftedtothestructuredsetting. Several
approaches based on diﬀerent assumptions of the data are presented. Co-
learning, for instance, maximizes the agreement among multiple hypotheses
while transductive approaches rely on an implicit cluster assumption. Fur-
thermore, in the framework of this dissertation, a case study on email batch
detection in message streams is presented. The involved tasks exhibit an
inherent cluster structure and theted solution exploits the streaming
nature of the data.
The diﬀerent approaches are developed into semi-supervised structured
prediction models and eﬃcient optimization strategies thereof are presented.
Thenovelalgorithmsgeneralizestate-of-the-artapproachesinstructurallearn-
ing such as structural support vector machines. Empirical results show
that the semi-supervised algorithms lead to signiﬁcantly lower error rates
than their fully supervised counterparts in many application areas, includ-
ing multi-class classiﬁcation, named entity recognition, and natural language
parsing.
Keywords:
Learning with structured data, Semi-supervised learning, Kernel machines,
Natural language processingZusammenfassung
DasLernenausstrukturiertenEingabe-undAusgabebeispielenistdieGrund-
lagefürdieautomatisierteVerarbeitungnatürlichauftretenderProblemstell-
ungenundeineHerausforderungfürdasMaschinelleLernen.DieEinordnung
von Objekten in eine Klassentaxonomie, die Eigennamenerkennung und das
Parsen natürlicher Sprache sind mögliche Anwendungen. Klassische Verfah-
ren scheitern an der komplexen Natur der Daten, da sie die multiplen Ab-
hängigkeiten und Strukturen nicht erfassen können. Zudem ist die Erhebung
von klassiﬁzierten Beispielen in strukturierten Anwendungsgebieten aufwän-
dig und ressourcenintensiv, während unklassiﬁzierte Beispiele günstig und
frei verfügbar sind.
Diese Arbeit thematisiert halbüberwachte, diskriminative Vorhersagemo-
delle für strukturierte Daten. Ausgehend von klassischen halbüberwachten
Verfahren werden die zugrundeliegenden analytischen Techniken und Algo-
rithmen auf das Lernen mit strukturierten Variablen übertragen. Die unter-
suchten Verfahren basieren auf unterschiedlichen Prinzipien und Annahmen,
wie zum Beispiel der Konsensmaximierung mehrerer Hypothesen im Lernen
aus mehreren Sichten, oder der räumlichen Struktur der Daten im transduk-
tivenLernen.DesweiterenwirdineinerFallstudiezurEmail-Batcherkennung
die räumliche Struktur der Daten ausgenutzt und eine Lösung präsentiert,
die der sequenziellen Natur der Daten gerecht wird.
AusdentheoretischenÜberlegungenwerdenhalbüberwachte,strukturier-
te Vorhersagemodelle und eﬃziente Optimierungsstrategien abgeleitet. Die
empirische Evaluierung umfasst Klassiﬁkationsprobleme, Eigennamenerken-
nung und das Parsen natürlicher Sprache. Es zeigt sich, dass die halbüber-
wachten Methoden in vielen Anwendungen zu signiﬁkant kleineren Fehlerra-
ten führen als vollständig überwachte Baselineverfahren.
Schlagwörter:
Lernen mit strukturierten Daten, Halbüberwachtes Lernen, Kernverfahren,
Natürliche SprachverarbeitungivContents
1 Introduction 1
1.1 Structured Learning . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Semi-supervised Prediction Models for Structured Data . . . . 4
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Outline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Previously Published Work . . . . . . . . . . . . . . . . . . . . 7
2 Problem Setting 9
2.1 Supervised Machine Learning . . . . . . . . . . . . . . . . . . 10
2.2 Semi-supervised Multi-view Learning . . . . . . . . . . . . . . 14
3 Learning in Joint Input-Output Spaces 19
3.1 Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1 Markov Random Fields . . . . . . . . . . . . . . . . . . 20
3.1.2 Conditional Random Fields . . . . . . . . . . . . . . . 23
3.1.3 Generalized Linear Models in Multiple Views . . . . . 26
3.2 Joint Feature Representation . . . . . . . . . . . . . . . . . . . 29
3.2.1 Multi-class Classiﬁcation . . . . . . . . . . . . . . . . . 29
3.2.2 Label Sequence Learning . . . . . . . . . . . . . . . . . 32
3.2.3 Natural Language Parsing . . . . . . . . . . . . . . . . 36
3.2.4 Supervised Clustering . . . . . . . . . . . . . . . . . . 39
4 Co-regularized Least Squares Regression 43
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Eﬃcient Co-Regression . . . . . . . . . . . . . . . . . . . . . . 45
4.2.1 Non-Parametric Least Squares Regression . . . . . . . 46
4.2.2 Semi-parametric Approximation . . . . . . . . . . . . . 48
4.2.3 Relation to RLSR . . . . . . . . . . . . . . . . . . . . . 50
4.3 Distributed CoRLSR . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.1 Block Coordinate Descent CoRLSR . . . . . . . . . . . 51
4.3.2 Analysis of Distributed CoRLSR . . . . . . . . . . . . 52
v4.4 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 UCI Experiments . . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Results for KDD Cup 2004 data set . . . . . . . . . . . 55
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5 Co-perceptrons 59
5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Generalized Perceptrons . . . . . . . . . . . . . . . . . . . . . 61
5.3 Co-perceptrons . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4.1 Biocreative Data Set . . . . . . . . . . . . . . . . . . . 65
5.4.2 Spanish News Wire . . . . . . . . . . . . . . . . . . . . 67
5.4.3 Execution Time . . . . . . . . . . . . . . . . . . . . . . 68
5.4.4 Feature splits . . . . . . . . . . . . . . . . . . . . . . . 68
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 Co-support Vector Learning 71
6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2 SVMs for Structured Output Variables . . . . . . . . . . . . . 72
6.3 Co-support Vector Machines . . . . . . . . . . . . . . . . . . . 76
6.4 Optimization Strategy . . . . . . . . . . . . . . . . . . . . . . 81
6.5 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5.1 Multi-Class Classiﬁcation . . . . . . . . . . . . . . . . 83
6.5.2 Named Entity Recognition . . . . . . . . . . . . . . . . 84
6.5.3 Natural Language Parsing . . . . . . . . . . . . . . . . 85
6.5.4 Execution Time . . . . . . . . . . . . . . . . . . . . . . 87
6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7 Transductive Support Vector Machines 89
7.1 Unconstraint Optimization . . . . . . . . . . . . . . . . . . . . 90
7.2ined Transductive SVMs . . . . . . . . . . . . . . . 93
7.3 Unconstraint CoSVM Optimization . . . . . . . . . . . . . . . 98
7.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.4.1 Execution Time . . . . . . . . . . . . . . . . . . . . . . 100
7.4.2 Multi-class Classiﬁcation . . . . . . . . . . . . . . . . . 101
7.4.3 Artiﬁcial Sequential Data . . . . . . . . . . . . . . . . 102
7.4.4 Named Entity Recognition . . . . . . . . . . . . . . . . 104
7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.6 Comparison with CoSVMs . . . . . . . . . . . . . . . . . . . . 105
7.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
vi8 Supervised Clustering of Streaming Data 109
8.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.2 Learning to Cluster . . . . . . . . . . . . . . . . . . . . . . . . 112
8.3 Clustering of Streaming Data . . . . . . . . . . . . . . . . . . 115
8.4 Experimental Results . . . . . . . . . .