TAG Parsing as Model Enumeration

pefav - Ralph Debusmann

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

7 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

TAG Parsing as Model Enumeration Ralph Debusmann Programming Systems Lab Saarland University Postfach 15 11 50 66041 Saarbrücken Germany Denys Duchier Équipe Calligramme LORIA – UMR 7503 Campus Scientifique, B. P. 239 54506 Vandœuvre lès Nancy CEDEX France Marco Kuhlmann Programming Systems Lab Saarland University Postfach 15 11 50 66041 Saarbrücken Germany Stefan Thater Computational Linguistics Saarland University Postfach 15 11 50 66041 Saarbrücken Germany Abstract This paper introduces well-ordered derivation trees and makes use of this concept in a novel axiomatization of the TAG parsing problem as a constraint satisfaction problem. Contrary to prior approaches, our axiomatization focuses on the derivation trees rather than the derived trees. Well-ordered derivation trees are our pri- mary models, whereas the derived trees serve solely to determine word order. 1 Introduction Tree Adjoining Grammar (TAG) relates strings with two kinds of structures: derivation trees and corresponding derived trees. Derivation trees are more informative than their corresponding derived trees in the sense that the derived trees can be reconstructed from them. However, derivation trees are usually interpreted as unordered trees; they then cannot be used to formulate the TAG parsing problem directly, as they do not encode word order infor- mation.

tween tag

tag parsing

exact relation

anchors

trees

pi1 ?

Sujets

Kuhlmann

Anchor

Informations

Publié par	pefav
Nombre de lectures	21
Langue	English

Extrait

TAG Parsing as Model Enumeration
Ralph Debusmann Denys Duchier Marco Kuhlmann Stefan Thater
Programming Systems Lab Équipe Calligramme Programming Systems Lab Computational Linguistics
Saarland University LORIA – UMR 7503 Saarland University Saarland University
Postfach 15 11 50 Campus Scientiﬁque, B. P. 239 Postfach 15 11 50 Postfach 15 11 50
66041 Saarbrücken 54506 Vandœuvre lès Nancy CEDEX 66041 Saarbrücken 66041 Saarbrücken
Germany France Germany Germany
rade@ps.uni-sb.de duchier@loria.fr kuhlmann@ps.uni-sb.de stth@coli.uni-sb.de
Abstract lem: given a description (a logical formula) of the input
string, enumerate all and only those well-ordered deriva-
This paper introduces well-ordered derivation tion trees that are licensed by . Based on earlier work by
trees and makes use of this concept in a novel Koller and Striegnitz (2002), we show that the solutions
axiomatization of the TAG parsing problem as to this problem can in turn be characterised as the solu-
a constraint satisfaction problem. Contrary to tions of a constraint-satisfaction problem (CSP) on ﬁnite
prior approaches, our axiomatization focuses set integer variables, which can be solved by state-of-the-
on the derivation trees rather than the derived art constraint technology.
trees. Well-ordered derivation trees are our pri-
Our approach offers at least two interesting perspec-
mary models, whereas the derived trees serve
tives. First, it enables the encoding of LTAG grammars as
solely to determine word order.
certain dependency grammars, thereby illuminating the
exact relation between the two formalisms. Second, the
1 Introduction formulation of the LTAG parsing problem as a CSP opens
up a large quantity of existing data to evaluate the con-
Tree Adjoining Grammar (TAG) relates strings with two
straint-based approach to parsing more thoroughly than
kinds of structures: derivation trees and corresponding
what could be done before.
derived trees. Derivation trees are more informative than
their corresponding derived trees in the sense that the
Plan of the paper. In §2, we show how the relation be-
derived trees can be reconstructed from them. However,
tween TAG derivation trees and elementary trees can be
derivation trees are usually interpreted as unordered trees;
formulated as the relation between models and logical
they then cannot be used to formulate the TAG parsing descriptions, and introduce the notion of a TAG satisﬁa-
problem directly, as they do not encode word order infor- bility problem. In §3, we extend satisﬁability problems
mation. to parsing problems; we formalize the notion of well-or-
This paper suggests to interpret derivation trees as dered derivation trees as the structures under investiga-
ordered trees. It introduces the notion of well-ordered tion in these problems, and show how their solutions can
derivation trees: A derivation tree is called well-ordered be obtained by solving a CSP. We illustrate this approach
if its nodes stand in the same precedence relation as
by means of an example in §4. §5 discusses the two per-
the anchors in the corresponding derived tree. Because
spectives of our approach mentioned above. Finally, §6
TAG can generate non-context-free languages, well-or-
concludes the paper and presents ideas for future work.
dered derivation trees can be non-projective, i.e., they can
contain “crossing” edges. The main contribution of this
2 TAG Satisﬁability Problem
paper is an axiomatization of the exact form of non-pro-
jectivity licensed by TAG operations. It thereby provides a There are two major and constrasting formal perspectives
novel model-theoretic interpretation of the LTAG parsing on parsing: proof-theoretic and model-theoretic. The for-
problem. mer emphasizes the construction of logical derivations,
The axiomatization of well-ordered derivation trees is while the latter more directly states how models sat-
put into practice in a description-based approach to TAG isfy descriptions. The model-theoretic perspective (Cor-
parsing, in which the parsing problem of strongly lexical- nell and Rogers, 1998) applied to a description-based
1ized TAGs is interpreted as a model enumeration prob- speciﬁcation of parsing problems is often more readily
1 amenable to constraint-based processing. This is the viewA TAG is called strongly lexicalized, if each of its elementary
trees contains exactly one anchor. which we adopt for the remainder of this paper.
ffAs a ﬁrst step in our description-based approach to TAG The notion of TAG satisﬁability problem as outlined
parsing, we formulate the TAG satisﬁability problem: above is implicit in (Koller and Striegnitz, 2002), who
formulate the surface realization problem of natural lan-Given a multiset of elementary trees, can they be com-
guage generation as the conﬁguration problem of (un-bined using operations of adjunction and substitution to
ordered) dependency trees. A natural question is whetherform a complete derived tree?
this treatment can be extended to parsing problems.
To formally express TAG satisﬁability problems, we in-
Given our formalization of TAG satisﬁability problems,
troduce the following description language:
parsing problems cannot be expressed directly, as the
models under consideration—derivation trees—are un-′::= w : | ∧ , (1)
ordered trees. In order to express word order, a more natu-
ral class of models are derived trees, as these encode wordwhere w is taken from a set of tree variables, and from
order information in a direct way. However, the problema set of TAG elementary trees of some grammar. We call
in using derived trees is that the formalization of the sat-w : a tree literal. We say that is normal if every tree
isfaction relation becomes non-trivial, as the adjunctionvariable in appears precisely in one tree literal.
operation now requires a more complicated interpretationIt is well-known that the satisﬁability problem is equiv-
of elementary trees—not as atomic entities, but as groupsalent to the existence of a derivation tree, hence the idea
of nodes that may get separated by material being “gluedto use derivation trees as models of normal descrip-
in between” by adjunction. If not conditioned carefully,tions. In order to make this more precise, we need some
this might lead to a formalism that is more expressivedeﬁnitions:
than TAG (Muskens, 2001; Rogers, 2003).Let be the set of ﬁnite paths, i.e. the ﬁnite sequences
We suggest to solve the problem by considering deriva-of positive integers. For simplicity in the following, we
tion trees as being ordered. In the next section, we willshall identify a node of a tree with the path ∈ that
introduce the notion of well-ordered derivation trees,leads to it starting from the root of said tree.
which are possibly non-projective, ordered derivationA derivation tree is a tree(V,E) formed from vertices
trees whose precedence relation agrees with the prece-V and labeled edges E⊆ V×V× . A model of a normal
dence relation in the corresponding derived tree. This al-description w : ∧...∧w : is a derivation tree where1 1 n n
2 lows for an extension of the description language from (1)V ={w ,...,w } and such that the following conditions1 n
with precedence literals, which can be interpreted onare satisﬁed, where we write w − → w for an edge1 2
(well-ordered) derivation trees in a straightforward way.labeled with path and representing either an adjunction
or a substitution of at node of :2 1
3 Well-ordered Derivation Trees
• If w is the root of the tree, then must be an initialk k
Our ambition is to tackle the parsing problem wheretree.
word-order is part of the input speciﬁcation. To this end,
• For each w , its outgoing edges must all be labeledi we formulate the TAG parsing problem analogously to our
with distinct paths, and for each substitution (resp. earlier deﬁnition of the TAG satisﬁability problem:
adjunction) node in there must be exactly (resp.i
Given an ordered multiset of elementary trees, can they3at most) one -labeled edge.
be combined using operations of adjunction and substitu-
tion to form a complete derived tree where the respective• For each edge w − → w , if is a substitution1 2
anchors are realized in the same order as stipulated by(resp. adjunction) node in , then must be an ini-1 2
the input speciﬁcation?tial (resp. auxiliary) tree.
To formally express the parsing problem, we extendIn order to model lexical ambiguity, the description
′our description language with precedence literals w≺ w :language can be extended with a limited form of disjunc-
tion, using for example the following extended language: ′ ′::= w : | w≺ w | ∧ (2)
k k ′::= w :{ ,..., } | ∧ ,k 1 nk ′ ′where w≺ w means that w’s anchor must precede w ’s.
For the same reasons as before, the approach that we willwhere the set is to be interpreted disjunctively.
develop for this language will trivially extend to one with
2Thus setting the interpretation of tree variables to be the iden- lexical ambiguity.
tity substantially simpliﬁes the presentation.