Treebank refinement [Elektronische Ressource] : optimising representations of syntactic analyses for probabilistic context-free parsing / von Tylman Ule
292 pages
English

Information

Published on 1 January 2008
Language: English
File size: 3 MB

Excerpt

Treebank Refinement
Optimising Representations of Syntactic Analyses
for Probabilistic Context-Free Parsing
by
Tylman Ule
Doctoral dissertation (Philosophische Dissertation)
accepted by the Faculty of Modern Languages (Neuphilologische Fakultät)
of the University of Tübingen
on 12 June 2006
Munich
2007

Printed with the permission of the Faculty of Modern Languages
of the University of Tübingen
Principal reviewer: Prof. Dr. Erhard Hinrichs
Second reviewer: Prof. Dr. Uwe Mönnich
Dean: Prof. Dr. Joachim Knape

Acknowledgements
I am thankful to all the people that have supported me during the years that
I spent working on this thesis.
The one person who was most directly involved in stopping me from
diverging any longer from the main goal of finishing this thesis was Jorn
Veenstra. He was critical about my ideas and ever ready to comment on
them and to discuss the questions that I often had. But he also helped me to
understand that at some point, you simply have to stop producing new ideas,
and that you have to restrict yourself to writing down the most interesting
consequences of the best ideas that you have come up with so far. A lengthy
talk in his own walls abroad laid the basis for this thesis. I did not change it
much later on.
Frank Müller helped me to start writing papers at all. He convinced me
that our work was interesting enough to be published, and the reviewers
agreed. Frank and I and our Espresso maker spent many years together, and
I will miss his Cappuccino.
All the time I have been working in Tübingen, my supervisor Erhard
Hinrichs took the time to listen to my ideas. He supported my plans for the
outline of my thesis and gave me enough freedom to implement them. He was
always friendly and supportive to me and expanded my knowledge of central
topics in numerous courses.
A thesis should have a central idea that connects all of its parts. Kiril
Simov helped me to find it when he invited me to his laboratory in Sofia.
Working and being with him has always been very pleasant, but this time
I was especially lucky, because when reading in his library, I discovered a
paper that would give the central theme to my thesis: Grammar Refinement.
Going on, I was encouraged by most interesting discussions with Detlef
Prescher and Helmut Schmid to stay on my track. Helmut offered to let me
modify the source code of his parsers, which made some of the experiments
reported here possible in the first place.
I have been part of the SFB in Tübingen, and felt encouraged by my
colleagues there. Holger Wunsch commented on my plans quietly but precisely
and helped me keep up my spirit. I was amazed that Heike Telljohann was
always kind and willing to explain the intricacies of TüBa-D/Z to me. A big
thank you also to the members of my thesis committee, Fritz Hamm, Uwe
Mönnich, and Frank Richter.
Before I joined the SFB, I worked in a project with Wolfgang Lezius and
Esther König, who helped me believe that we were doing a good job despite
some problems. Without Lothar Lemnitzer I probably would have done
something completely different, because he drew me to Tübingen and has
been pleasant to work with for as long as I have known him. Jochen Saile never
let me down when I had another technical problem.
Thanks also go to the student staff who helped me build my data and
programs, including Maureen Dunne, Aisling Fleming, Steffen Frömel,
Katrina Keogh, Wolfgang Maier, Nicole Maruschka and Sigrid Schmitt.
I found refuge for writing up my book at Wolf Paprotté’s Arbeitsbereich
Linguistik in Münster – Thank you! The support I had from his staff was
tremendous, including Hendrik Cyrus, Lea Cyrus, Robert Memering and
Johannes Schwall. They did not mind delving into whatever topic I had to
write down. It was a nice year, thanks!
Last, but not least comes my family. My mother seems to believe in me
even if I am not sure myself that I am doing the right thing. She helps me to
question my ways, and to calmly solve all problems ahead. My grandmother
has always shown me that it’s fun to learn and ask, and that you can
understand just about everything. And what is more, she let me explain so many
things to her that I became confident I had understood them. There is much more
family that helped me complete this thesis, because I could always rely on
them. Thanks to you, Astrid, Christof, Dominique and Joëlle. Britta and
Jan, I am not sure whether you realise how much you supported me with
your visits and invitations. And your sense for life, even outside University.
Nadja, I am so happy that we have gone through this together!
You took me through these years – Thanks!

To my grandmother

Contents
1 Introduction
I Parsing and Parser Evaluation
2 Mostly Context-Free Probabilistic Parsing
2.1 Context-Free Grammars
2.2 Probabilistic Context-Free Grammars
2.3 Parsing with Context-Free Grammars
2.4 Parsing with Probabilistic Context-Free Grammars
2.5 Efficient Viterbi Parsing
2.6 Alternative Approaches to Probabilistic Parsing
3 Parser Evaluation
3.1 Parseval
3.2 Relational Evaluation
3.3 Relative Differences in Information
II Representations of Syntactic Analyses
4 A Treebank’s Choice
4.1 Observations: Challenges for Proper Trees
4.1.1 The Objects of Syntactic Annotation
4.1.2 Unattached Elements
4.1.3 Free Constituent Order
4.1.4 Coordination
4.1.5 Named Entities
4.2 Representations of Relations
4.2.1 Class, Sequence, Domination
4.2.2 Arbitrary Relations and Parallel Structures
4.3 Encoding the Analyses
4.3.1 Proper Trees
4.3.2 Crossing Edges, Co-Indexation, and Secondary Edges
4.3.3 The Semantics of a Small Label Set
4.4 Conclusion
5 PCFG Treebank Grammars
5.1 Syntactic Bias of a PCFG Treebank Grammar
5.1.1 Bias in TüBa-D/Z and NEGRA
5.1.2 Extending the Label Set
5.1.3 Structural and Lexical Preferences
5.2 PCFG-Parsing of TüBa-D/Z
5.2.1 Parentheses and Punctuation
5.2.2 Secondary Edges and Co-Indexation
5.2.3 Train and Test Regimes
5.2.4 POS-Tagging and Parsing
5.3 Conclusion
6 Related Research
6.1 Algorithmic Transformations
6.2 Tuning a Treebank to a Parser
6.3 Parsing German
6.4 Conclusion
III Treebank Refinement
7 Including Local Context into Nonterminals
7.1 More Context in Context-Free Grammars
7.1.1 More Context Required
7.1.2 Formal Equivalence of Grammars
7.1.3 Experimental Adequacy
7.2 Context – Focus – Production
7.3 Determining Deviant Distributions
7.3.1 χ² Goodness-of-Fit
7.3.2 χ² Merging Infrequent Classes
7.3.3 Kolmogorov-Smirnov Goodness-of-Fit
7.3.4 Kullback-Leibler Divergence
7.3.5 Skew Divergence
7.4 Iteratively Adding Local Context
7.5 Unconditionally Spreading Parent Information
