Does It Have To Be Trees?
Data-Driven Dependency Parsing with
Incomplete and Noisy Training Data
Dissertation by Kathrin Spreyer
Submitted to the Faculty of Human Sciences
of the Universität Potsdam
for the degree of Doctor of Philosophy
Heidelberg, 11 October 2011

This work is licensed under a Creative Commons License:
Attribution - Noncommercial - Share Alike 3.0 Unported
To view a copy of this license visit
http://creativecommons.org/licenses/by-nc-sa/3.0/

Reviewers:
Prof. Dr. Jonas Kuhn
Prof. Dr. Manfred Stede

Date of the oral examination: 15 December 2011

The research that led to this dissertation was funded by the DFG as part of
the collaborative research center SFB 632.

Published online at the
Institutional Repository of the University of Potsdam:
URL http://opus.kobv.de/ubp/volltexte/2012/5749/
URN urn:nbn:de:kobv:517-opus-57498
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-57498
Declaration
I declare that I have written this dissertation independently and have not
used any sources or aids other than those indicated.
Heidelberg, 11 October 2011.
Kathrin Spreyer
Abstract
We present a novel approach to training data-driven dependency parsers on
incomplete annotations. Our parsers are simple modifications of two well-known
dependency parsers, the transition-based Malt parser and the graph-based MST
parser. While previous work on parsing with incomplete data has typically
couched the task in frameworks of unsupervised or semi-supervised machine
learning, we essentially treat it as a supervised problem. In particular, we
propose what we call agnostic parsers which hide all fragmentation in the
training data from their supervised components.
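
To make the idea of an agnostic parser concrete, the following Python sketch is
purely illustrative; the actual systems are modifications of the Java-based Malt
and MST parsers, described in Chapter 4. It shows one way a training procedure
could hide fragmentation: tokens whose head is unknown simply contribute no
attachment decision to the supervised component.

    # Illustrative sketch only: a partially annotated sentence is a list of
    # (form, head) pairs, where head=None marks a token whose attachment is
    # missing, e.g. because it could not be projected.
    from typing import List, Optional, Tuple

    Token = Tuple[str, Optional[int]]

    def supervised_instances(sentence: List[Token]):
        """Yield (dependent, head) pairs only where a head is annotated.

        Tokens with head=None contribute no attachment decision, so the
        supervised learner never sees the fragmentation.
        """
        for dep_idx, (form, head) in enumerate(sentence, start=1):
            if head is not None:
                yield dep_idx, head

    # "the dog barks" with the attachment of "the" missing; 0 is the root.
    fragmented = [("the", None), ("dog", 3), ("barks", 0)]
    print(list(supervised_instances(fragmented)))  # [(2, 3), (3, 0)]
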
We present experimental results with training data that was obtained by
means of annotation projection. Annotation projection is a resource-lean
technique which allows us to transfer annotations from one language to another
within a parallel corpus. However, the output tends to be noisy and incomplete
due to cross-lingual non-parallelism and error-prone word alignments. This
makes the projected annotations a suitable test bed for our fragment parsers.
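
The projection step itself can be pictured with a minimal sketch like the one
below, assuming one-to-one word alignments; the projection algorithms actually
used in this thesis (strict, constrained fallback, and partial correspondence
projection, Chapter 3) handle considerably more cases. Edges whose endpoints
are not covered by the alignment are simply lost, which is what produces
fragmented target-side annotations.

    # Hypothetical sketch of edge-wise projection through a word alignment.
    from typing import Dict, List, Tuple

    def project_edges(
        source_edges: List[Tuple[int, int]],  # (head, dependent), 0 = root
        alignment: Dict[int, int],            # source index -> target index
    ) -> List[Tuple[int, int]]:
        """Project an edge only if its endpoints are covered by the alignment."""
        projected = []
        for head, dep in source_edges:
            if dep not in alignment:
                continue                      # dependent unaligned: edge is lost
            if head == 0:                     # the artificial root maps to itself
                projected.append((0, alignment[dep]))
            elif head in alignment:
                projected.append((alignment[head], alignment[dep]))
        return projected

    # Toy example: a four-token source tree in which token 3 is unaligned,
    # so the edge (4, 3) cannot be projected and the target tree is fragmented.
    source_edges = [(0, 2), (2, 1), (2, 4), (4, 3)]
    alignment = {1: 1, 2: 2, 4: 4}
    print(project_edges(source_edges, alignment))  # [(0, 2), (2, 1), (2, 4)]
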
Our results show that (i) dependency parsers trained on large amounts of
projected annotations achieve higher accuracy than the direct projections, and
that (ii) our agnostic fragment parsers perform roughly on a par with the
original parsers which are trained only on strictly filtered, complete trees.
Finally, (iii) when our fragment parsers are trained on artificially fragmented
but otherwise gold standard dependencies, the performance loss is moderate even
with up to 50% of all edges removed.
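
Point (iii) refers to training data that was fragmented on purpose. A minimal
sketch of such artificial fragmentation, assuming the same (form, head)
representation as above and a simple uniform sampling of edges to remove (the
exact sampling scheme is not specified here), might look as follows.

    # Hypothetical sketch: drop a fraction of gold-standard heads at random
    # to simulate incomplete annotations.
    import random
    from typing import List, Optional, Tuple

    Token = Tuple[str, Optional[int]]

    def fragment(sentence: List[Token], drop_rate: float,
                 rng: random.Random) -> List[Token]:
        """Return a copy of the sentence with roughly drop_rate of heads removed."""
        return [(form, None if rng.random() < drop_rate else head)
                for form, head in sentence]

    gold = [("the", 2), ("dog", 3), ("barks", 0)]
    print(fragment(gold, drop_rate=0.5, rng=random.Random(0)))
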
Acknowledgments
First of all, I would like to thank my Doktorvater, Jonas Kuhn, for his
support, encouragement and patience – and for putting up with my stubbornness.
He provided the guidance I needed, but at the same time let me pursue things
in my own manner, which I greatly appreciate. Needless to say, by sharing his
ideas, he greatly contributed to this thesis.
Further, I would like to thank my former colleagues Gerlof Bouma, Lilja
Øvrelid, Eleftherios Avramidis, Sina Zarrieß, and Wolfgang Seeker for interesting
and helpful discussions. Thanks also to Florian Marienfeld and Georg Jähnig
for the effort they put into cleaning up the Europarl corpus.
I am also very grateful to Joakim Nivre for an interesting discussion back
in 2008, which encouraged me to further pursue the idea of using fragmented
parse trees; to Sebastian Padó for making his Europarl gold standard publicly
available; and to Yi Zhang for sharing his dependency conversion software for
the German treebank. Moreover, I am indebted to people at the Computational
Linguistics department at Heidelberg for letting me use their computing
resources: Anette Frank, Markus Kirschner and Patrick Simianer.
I would also like to thank the anonymous reviewers at CoNLL 2009, LREC
2010, and COLING 2010 for helpful comments.

Contents
1 Introduction
  1.1 Annotation Projection
  1.2 Parsing with Tree Fragments
  1.3 Evaluation of Projection-based Systems
  1.4 Overview of the Thesis

2 Related Work
  2.1 Annotation Projection
    2.1.1 Word-based annotation projection
    2.1.2 Projection of structured annotations
  2.2 Dependency Parsing
    2.2.1 Data-driven dependency parsing
    2.2.2 Weakly supervised approaches
    2.2.3 Synchronous and multilingual parsing
  2.3 Learning From Fragmented Annotations

3 Projection of Syntactic Dependencies
  3.1 Parallel Data
    3.1.1 Parallel corpora
    3.1.2 Bilingual alignment
  3.2 Violations of Direct Correspondence
  3.3 Projection of Dependency Trees
    3.3.1 Strict projection
    3.3.2 Constrained fallback projection
    3.3.3 Partial correspondence projection
  3.4 Quality of Direct Projections
    3.4.1 Gold standard evaluation (German)
    3.4.2 Pseudo-evaluation against treebank parsers
  3.5 Summary and Discussion

4 Training Parsers on Fragmented Trees
  4.1 Background: Data-driven Dependency Parsing
    4.1.1 Basic notions of dependency parsing
    4.1.2 Textual representation of dependency graphs
  4.2 Background: Transition-Based Parsing with Malt
    4.2.1 Transition system
    4.2.2 Parsing algorithm
    4.2.3 Feature model
  4.3 fMalt
  4.4 Background: Graph-Based Parsing with MST
    4.4.1 Parsing algorithm
    4.4.2 Scoring function
  4.5 fMST
  4.6 Summary and Discussion

5 Evaluation Methodology
  5.1 Evaluation of Treebank Parsers
  5.2 Treebanks
  5.3 Annotation Schemes
    5.3.1 Comparison
    5.3.2 Conversions
    5.3.3 Learnability experiments
  5.4 Variance Assessment
  5.5 Summary and Discussion
    5.5.1 Labeling schemes

6 Experiments
  6.1 Experimental Setup
  6.2 Parameter Tuning
    6.2.1 Parser-specific training parameters
    6.2.2 Parameter optimization with manually annotated development data
    6.2.3 Parameter optimization with projected development data
    6.2.4 Fragment size
  6.3 Baselines and Upper Bounds
  6.4 Malt and fMalt
    6.4.1 Malt: parsers with completeness assumptions
    6.4.2 fMalt: parsers with fragment awareness
  6.5 MST and fMST
    6.5.1 MST: parsers with completeness assumptions
    6.5.2 fMST: parsers with fragment awareness
  6.6 Summary and Discussion

7 Error Analysis
  7.1 Sentence Length
  7.2 Dependency Length
  7.3 Dependency Type
    7.3.1 Subjects
    7.3.2 Objects
