22
pages

- variable names
- dominance constraints
- annotate variable
- let ?
- problem
- labeled variables
- normal dominance
- identify normal
- algorithm back
- graph algorithm

Voir plus
Voir moins

Vous aimerez aussi

FOR DOMINANCE CONSTRAINTS

y z x yERNST ALTHAUS , DENYS DUCHIER , ALEXANDER KOLLER , KURT MEHLHORN ,

z yJOACHIM NIEHREN , AND SVEN THIEL

Abstract. Dominance constraints are logical descriptions of trees that are widely used in

computational linguistics. Their general satis abilit y problem is known to be NP-complete. Here

we identify normal dominance constraints and present an e cien t graph algorithm for testing their

satis ablit y in deterministic polynomial time. Previously, no polynomial time algorithm was known.

1. Introduction. The dominance relation of a tree is the ancestor relation be-

tween its nodes. Dominance constraints are logical descriptions of trees talking about

the dominance relation.

Dominance based tree descriptions were rst used in automata theory in the six-

ties [TW67], rediscovered in computational linguistics in the early eighties [MHF83],

and investigated from a logical point of view in the early nineties [BRVS95]. Since

then, they have found numerous applications in computational linguistics: they have

been used for grammar formalisms [VS92, RVSW95, DT99, Per00], in natural lan-

guage semantics [Mus95, ENRX98], and for discourse analysis [GW98].

The two most important computational tasks for dominance constraints are sat-

is ability testing { does the constraint describe a tree? { and enumerating solu-

tions, i.e. the described trees. But as shown recently [KNT01], testing satis abilit y

is an NP-complete problem. Earlier attempts at processing dominance constraints

[Cor94, VSWR95, DN00] all su er from this fact. This has shed doubt on their

practical usefulness.

In this article, we identify normal dominance constraints, a natural subclass of

dominance constraints whose restrictions should be unproblematic for many applica-

tions. We present an e cien t graph algorithm that decides satis abilit y of normal

dominance constraints in deterministic polynomial time. Previously, no polynomial

time algorithm was known.

We derive the graph algorithm for testing satis abilit y as follows. First, we in-

troduce dominance graphs and de ne their con guration problem (investigated in

+[ADK 01]). Second, we show that the con gurabilit y of dominance graphs is lin-

ear time equivalent to the satis abilit y of normal dominance constraints ( rst shown

in [KMN00]). Third, we characterize the y of dominance graphs as the

absence of certain cycles, which we nally test for by reduction to a matching problem.

We also discuss how to use the e cien t satis abilit y test to enumerate solutions.

We apply a choice rule exhaustively while checking for satis abilit y after each step.

Both procedures have been implemented in C++ using the LEDA library [MN99] and

applied to scope ambiguities in natural language semantics in the CLLS framework

[ENRX98, EKN01, EKN02].

+This article joins the results from two conference publications [ADK 01] and [KMN00]. The

authors are partially supported by the IST Programme of the EU under contract number IST-

1999-14186 (ALCOM-FT), and by the Collaborative Research Centre (SFB) 378 of the Deutsche

Forschungsgemeinschaft.

yMax-Planck-Institute for Computer Science, Saarbruc ken, Germany

zProgramming Systems Lab, Fachbereich Informatik, Universit at des Saarlandes, Saarbruc ken,

Germany

xDepartment of Computational Linguistics, Universit at des Saarlandes, Saarbruc ken, Germany

18x 9 y 2

! ^

linguist lang

x y

speak

x y

Fig. 2.1. A simple dominance constraint.

8x 9 y 2

! ^

linguist 9 y lang 8x 2

x ^ y !

lang speak linguist speak

x y x y

y x

Fig. 2.2. Readings represented by the constraint in Fig. 2.1.

To complement our results, we nally investigate a close variant of the con gura-

tion problem of dominance graphs where closed leaves are permitted in addition. This

variant is more general but also relevant for applications in computational linguistics

[Bos96, CFS97]. We show that con gurabilit y is already NP-complete for the more

general dominance graphs. Nevertheless, the presented algorithms can still help to

solve this alternative problem more e cien tly.

Plan of the paper. The rst part of the paper introduces dominance constraints:

We motivate using them in computational linguistics in Section 2; then we de ne

them in Section 3, discuss their satis abilit y problem, and introduce the concepts of

normal dominance constraints and of solved forms. In the second part of the pa-

per, we turn to a discussion of dominance graphs. We de ne them and relate them

to normal dominance constraints in Section 4. Section 5 presents a basic algorithm

for enumerating the solved forms of a dominance graph. Then we derive the above-

mentioned characterization of con gurabilit y in Section 6, show how to test for this

property e cien tly in Section 7, and plug this e cien t algorithm into the enumera-

tion algorithm in Section 8. In the nal part of the article, we apply this e cien t

enumeration algorithm back to normal dominance constraints and discuss an imple-

mentation (Section 9), and prove that the more general con gurabilit y with closed

leaves is NP-complete (Section 10). Section 11 concludes and discusses further work.

2. Motivation. As one of the many applications of dominance constraints in

computational linguistics, we will give a brief introduction to scope underspeci c ation

[EKN01, AC92, Rey93, Bos96].

This application is concerned with coping with ambiguous sentences such as the

following:

(2.1) Every linguist speaks two languages.

Sentence (2.1) is ambiguous because it has two di eren t possible meanings, indi-

cated by the continuations

(2.2) . . .namely, English and Chinese.

2say

john

9u 8w 9x 9y 8z

! ! ^ ^ !

comp ^ ^ ^ prod

dept repr spl

u z

w x y

in of see of

w u x w x y y z

Fig. 2.3. A dominance constraint describing the meaning of (2.6).

(2.3) . . .but not necessarily the same ones.

In the rst reading, each linguist must speak the same two languages. In the second, no

two linguists necessarily speak a common language, but each speaks at least two. We

can represent the two possible meanings logically as the following rst-order formulas,

which can be represented as trees as in Fig. 2.2.

(2.4) 8x:(linguist(x)!9 y:lang(y)^ speak(x; y))2

(2.5) 9 y:lang(y)^8x:(linguist(x)! speak(x; y))2

Ambiguity is a real problem to language processing because the number of read-

ings of a sentence grows quickly with the number of \quanti ers" such as \every

linguist" and \two languages", and interacts with other sources of ambiguity besides.

5The sentence (2.6) has already 56 readings, and larger examples are easy to construct.

(2.6) John says that some representative of every department in a company saw a

sample of each product.

The key observation to scope underspeci cation is that the di erences between

the readings are very systematic; all contain the same \semantic material" (e.g. repre-

sentations of the constituents \every linguist", \two languages", and \speak"), which

is only combined in di eren t ways. The constraints on these combinations can be

speci ed using dominance constraints.

An example is Fig. 2.1. This constraint graph is a description of the two readings

of (2.1), shown in Fig. 2.2; it can be seen as a graphical representation of a dominance

constraint. Similarly, the 56 readings of (2.6) can beted by the graph in

Fig. 2.3. In the paper, we will use constraint graphs to link the (logic) work on

dominance constraints to graph algorithms.

Pictures as in Fig. 2.3 are being drawn in most modern approaches to scope

underspeci cation. However, they are not always interpreted as dominance constraints

[Rey93, Bos96]. The subtle di erence in meaning has the surprising e ect of making

these other approaches NP-complete even when the graphs fall into the class where

dominance constraints have polynomial satis abilit y. We will show this in Section 10.

3. Satis abilit y of Dominance Constraints. In this section, we de ne the

the language of dominance constraints and recall known results on satis abilit y. The

variant of dominance constraints we employ describes constructor trees { ground

5The following sentence from [Hob83], which is interesting both in form and in content, has around

200 readings: \Many people feel that most sentences exhibit too few quanti er scope ambiguities for

much e ort to be devoted to this problem, but a casual inspection of several sentences from any text

should convince almost everyone otherwise."

3terms over a signature of function symbols { rather than feature trees as considered

in [Smo92, BRVS95, ST94, MNP00].

3.1. Trees and Constructor Trees. We assume a nite or in nite signature

with function symbols f; g; : : :, each of which is equipped with an arity ar(f) 0.

Constants are function symbols of arity 0 denoted by a; b. We assume that contains

at least one constant and one symbol of arity at least 2.

A constructor tree can be de ned either as a term or equivalently on the basis

of directed graphs. The ground term f(g(a; a)), for instance, corresponds to the

directed graph in Figure 3.1. Throughout this article, we will employ the graph based

de nition.

An (unlabeled) tree is a forest with exactly one root. f

A forest is a nite directed graph (V; E) where V is a

g nite set of nodes denoted by u; v; w, and E V V

a set of edges such that the indegree of each node is at a a

most 1 and there is no cycle. Each forest has at least one

Fig. 3.1. f(g(a; a))

root, i.e. a node with indegree 0. We call the nodes with

outdegree 0 the leaves of the forest.

A ( nite) constructor tree is a triple (V; E; L) consisting of a tree (V; E) and a

labeling function L : E[V ! [N s.t. L(E)N (edge labels) and L(V ) (node

labels). The edge labels in a constructor tree determine the order of the children of

a node: for each node u 2 V and each natural number 1 k ar(L(u)), there is

exactly one edge (u; v)2 E with L((u; v)) = k.

We draw constructor trees as in Figure 3.1, by annotating nodes with their labels

and ordering the edges such that their labels increase from left to right.

3.2. Constraint Language. The language of dominance constraints is a logical

language that is interpreted over the class of tree structures. Tree structures are

rst-order model structures which specify certain relations between the nodes of a

construtor tree.

Let = (V; E; L) be a constructor tree with nodes u; v; v ; : : : v 2 V . The1 n

dominance relation uC v holds in i there is a path from u to v; the labeling relation

u:f(v ; : : :; v ) holds in i u is labeled by the n-ary symbol f and has the children1 n

v ; : : :; v in this order; that is, L(u) = f, ar(f) = n, f(u; v ); : : :; (u; v )g E, and1 n 1 n

L((u; v )) = i for all 1 i n.i

Definition 3.1 (Tree Structure). The tree structure of a constructor tree with

node set V is a rst-or der structure with domain V which provides the dominance

relationC of and the labeling relation of for each function symbol f 2 .

Let Vars be an in nite set of (node) variables X; Y; Z; : : : A dominance constraint

’ is a conjunction of dominance, inequality, and labeling literals of the following form

where ar(f) = n:

0 ’ ::= ’^ ’ j XC Y j X=Y j X:f(X ; : : :; X )1 n

We freely identify a constraint with the set of its literals. Let Var(’) be the set of

variables of ’. A pair of a tree structure with node set V and a variable assignment

: Var(’)! V satis es ’ i it satis es all its literal in the obvious way. We say that

(; ) is a solution of ’ in this case; ’ is satis able if it has a solution.

For instance, the following constraint that happens to be unsatis able:

X:f(X ; X )^ X C Y ^ X C Y:1 2 1 2

4

6It requires that node values of X and X are sisters that are both ancestors of1 2

the node value of Y . This is clearly impossible in a tree, since trees cannot branch

upwards.

3.3. Constraint Graphs. We usually draw a dominance constraint as a con-

straint graph. For instance, the unsatis able constraint from above is drawn in Figure

3.2. It illustrates clearly that the constraint requires upward branching and thus

cannot be satis ed by any tree.

The nodes of a constraint graph are the

f X

variables of the corresponding constraint. La-

X Xbeling constraints relate to solid edges called 1 2

tree edges. Dominance constraints are drawn Y

as dashed lines called dominance edges. As for

trees, we annotate labels to nodes of the graph Fig. 3.2. The unsatis able constraint

X:f(X ; X )^ X C Y ^X C Y1 2 1 2and order tree edges from left to right. Note

that we ignore inequalities in constraint graphs.

We sometimes annotate variable names to the graph nodes. This is not always

necessary since all occurences of the same variable are always represented by a single

node in a constraint graph. We may thus freely omit variable names. In the motivating

example (Figure 2.3), for instance, we have omitted all variable names.

Constraint graphs motivate the following notions to talk about constraints. We

call a variable X labeled in a constraint ’ if there exists a literal X:f(: : :) in ’. A

(solid) fragment of constraint ’ is a maximal set of variables in ’ that are pairwise

connected by labeling literals. A variable X is called a root of a fragment in ’ if it

does not occur in child position of a labeling literal in ’, i.e. if there is no Z such that

Z:f(: : :X : : :) belongs to ’. A hole of a fragment is a variable in ’ that is unlabeled

in ’. A leaf of a fragment is either a hole or a variable labeled by a constant, i.e. a

variable X with X:a in ’.

3.4. Satis abilit y. We are interested in two natural problems concerning dom-

inance constraints that are both motivated by our application: rst of all we would

like to test satis ablit y, and second, we would like to enumerate all solutions of a

satis able dominance constraint.

The complexity of the satis abilit y problem of dominance constraints was inves-

tigated in [KNT01] and shed doubts on their usefulness.

Theorem 3.2. Satis ablitiy of dominance constraints is NP-complete.

Deciding satis abilit y in non-deterministic polynomial time is quite simple: In a rst

step one guesses whether XC Y or :XC Y for each two variables X; Y in a given

constraint. In a second step, one tests the consistency of these relationships.

The NP-hardness proof relies on the fact that solid fragments of a constraint

graph may overlap in a solution. This means that distinct labeled variables may be

assigned to the same node of a tree.

For illustration, consider the constraint X:f(X ; X ) ^1 2

Y fY :f(Y ; Y ) ^ YC X ^ XC Y whose graph is shown in1 2 1 X

fFigure 3.3. Every solution must map X to the same node

Y Y1 2as either Y or Y . We say that X overlaps with Y or Y in1 1

X X

1 2a solution of this constraint.

We call an overlap proper if it involves two labeled vari-

Fig. 3.3. Overlap

ables. In the applications in computational linguistics, we

typically do not want proper overlap (but may accept overlaps of roots with holes).

5The subclass of dominance constraints that excludes proper overlap (and xes some

minor inconveniences) is the class of normal dominance constraints.

3.5. Normal Dominance Constraints. We next distinguish a fragment of

dominance constraints which we will show to have a polynomial time satis abilit y

problem.

Definition 3.3 (Normal dominance constraint). A dominance constraint ’ is

called normal i for all variables X; Y 2 Var(’):

1. There is no proper overlap in solutions of ’: X=Y in ’ if X and Y are

distinct variables that are labeled in ’.

2. Solid fragments are tree shaped or cyclic: every variable in ’ appears at most

once as a parent and at most once as a child in a labeling literal of ’.

3. Dominance edges go from holes to roots: if XC Y in ’ then X is unlabeled

in ’ whereas Y is labeled but does not occur in child position in ’.

4. There are no empty fragments: every hole of ’ occurs in some child position.

Conditions 1 and 4 say that only roots and holes have the permission to overlap

in a solution of a normal constraint. Distinct holes cannot overlap since they must

have distinct parents, which are labeled variables that cannot overlap. For a similar

reason, it is impossible that a hole overlaps with a labeled node that has a parent.

Condition 2 requires acyclic fragments to be tree shaped. This excludes many

constraints, as for instance X:f(Y; Y ), X:f(Y ; Y )^X:f(Z ; Z ) or Y :f(X)^Z:f(X).1 2 1 2

The last two examples are particularly di cult to treat when subsumed by a larger

constraint: they entail equations (Y =Z , Y =Z , respectively Y =Z ) whose global1 1 2 2

consequences are di cult to predict. W.l.o.g we can always restrict ourselves to

normal constraint with acyclic fragments. Other constraints are unsatis able anyway.

Condition 3 forbids to express equality through two side dominance: XC Y ^

YC X is not normal since a variable cannot be at the same time a root and a hole.

Condition 3 is also violated by the dominance edge XC Y in the constraint from1

overlap example in Fig. 3.3. It goes from a root to a hole, instead vice versa.

In the following theorem we state the main result of this article, which will follow

from the results presented in the succeeding sections.

Theorem 3.4. Satis ability of normal dominance constraints can be decided in

deterministic polynomial time.

3.6. Solved Forms. We stated above that we would like to have an algorithm

that enumerates all solutions of a given normal dominance constraint. Interpreting

this proposition literally makes not much sense as the reader already might have

noticed. For instance, we can solve the constraint X:a by all trees that have a node

labeled by a. Indeed, every satis ablet has an in nite number of solutions,

so that we probably do not want to enumerate all of them.

What we want to do is to enumerate all solved forms of a normal dominance

constraint instead of all solutions. The idea behind a solved form is that it should

be similar to a solution but not describe its irrelevant parts. For instance, X:a is a

perfect solved form since all its solutions can be easily read o from this constraint.

We will now de ne an appropriate notion of solved forms. In particular, it should

hold that a normal dominance constraint has a solution if and only if it has a solved

form. Given a constraint ’ we de ne a relation R on the variables of ’ that we call’

the reachability relation of ’. This relation is the transitive closure of the following

relation:

f(X; Y )j X:f(: : : ; Y; : : :)2 ’ or XC Y 2 ’g

6

6We say that Y can be reached from X if (X; Y ) 2 R . In this case, it clearly holds’

that ’ entails the dominance XC Y .

Definition 3.5 (Solved Form). A normal dominance constraint ’ is in solved

form if it satis es the following two properties for all variables X; Y; Z in Var(’):

1. Dominance edges do not branch upwards: if X and Y are distinct then not

both XC Z in ’ and YC Z in ’.

2. The graph of ’ is acyclic: (X; X)62 R .’

In other words, a normal dominance constraint ’ is in solved form if and only if

its constraint graph is a forest.

0A solved form of a normal constraint ’ is a normal constraint ’ that is in solved

form, contains the same labeling literals as ’, and has a stronger reachability relation,

which means R R 0.’ ’

Lemma 3.6. Every normal dominance constraint in solved form has a solution.

Proof. We have to construct a tree solution for a solved form ’. The idea is

that the constraint graph of a solved form is already a forest. It is su cien t to

transform this forest into a tree without dominance edges. This is quite simple given

the transformation illustrated in Figure 3.4. (Note that we assumed to contain a

function symbol f of arity at least two and a constant a.)

In the rst step, we turn the forest into a tree by adding a top most fragment.

Let Y ; : : :; Y be the minimal elements in the reachability order R which exist since1 m ’

R is acyclic. We can then de ne a new solved form ’ with a top most fragment,’ 1

by adding a new fragment to ’ with a single hole with dominance edges towards all

roots Y ; : : :; Y .1 m

In the second step we repeatedly transform dominance edges into tree edges. We

stop, once no dominance edge is left. The idea is illustrated in Figure 3.4. Recall

that we assumed that our signature contains a constant a and a function symbol

f of arity n 2. We will also use a function dist on constraints, which is de nedV

0 0 0 0for all constraints ’ by dist(’ ) = ’ ^ fX = Y j X; Y 2 Var(’ ) distinctg.

Suppose that there still exists a hole X in ’ from where dominance edges start.1

Let Y ; : : :; Y be all the roots of ’ such that the dominance literal XC Y belongs1 m 1 i

to ’ for i = 1; : : :; m. We construct ’ by removing all these literals from ’ and1 2 1

distinguish two cases:

If m > n, we x a fresh variable Z and de ne:

^m

’ = dist(’ ^ X:f(Y ; : : : ; Y ; Z)^ ZC Y )3 2 1 n 1 i

i=n

’ is a solved form that entails ’, and it contains n 1 dominance literals3

less than ’ .1

If m n, we x fresh variables Z ; : : :; Z and de ne:m+1 n

^n

’ = dist(’ ^ X:f(Y ; : : :; Y ; Z ; : : :; Z )^ Z :a)3 1 1 m m+1 n j

j=m+1

’ is a solved form that entails ’, and it contains m dominance literals less3

than ’ .1

By applying the above transformation repeatedly, we obtain a solved form ’ which

entails ’ and contains no dominance literals.

In the third step we can easily satisfy ’ by the constructor tree that corresponds

to ’ itself.

In the construction of a solution of ’ we had to "invent" variables that are not

present ’. Thus the constructor tree in the solution contains nodes that do not

7

6f X

X

f

Y Y1 2

a

Y Y Y Y1 2 3 4 Y Y3 4

Fig. 3.4. Transforming dominance edges into tree edges. Here, 4 dominance edges are trans-

formed while using a function symbol of arity 3.

correspond to variables of ’. In the following lemma we will show that every satis -

able normal constraint has a solved form and that it can essentially be obtained by

removing the invented material.

Lemma 3.7. Every solution of a normal dominance constraint ’ also satis es

some solved form of ’.

Proof. Let (; ) be a solution of ’. In order to construct a solved form, we de ne

a partial function hole on the root variables of ’. Consider a root Y , the function

is de ned if there is a hole X with (X)C (Y ). Since is a tree, there is hole Z

6such that (Z)C (Y ) and (X)C (Z) for all holes X with (X)C (Y ) . We set

hole(Y ) = Z. Let ’ denote the conjunction of the labeling and inequality literals ofl

’, then the following is a solved form of ’:

^

0 ’ = ’ ^ fhole(Y )C Y j Y is a root for which hole is de ned gl

0

0Clearly, ’ is a normal constraint in solved form. We have to show R R .’ ’

Since both constraints have the same labeling literals, it su ces to prove for every

0dominance literal XC Y in ’ that (X; Y )2 R . We will show a stronger statement:’

if X is a hole and Y is a root with (X)C (Y ), then (X; Y )2 R 0.’

We proceed by induction on the length of the path from (X) to (Y ). For

Z = hole(Y ), we have (X)C (Z). If X = Z, the claim holds. Otherwise, let R

denote the root of the fragment of Z. The hole X can only overlap with a root, so

we get (X)C (R). As (R) = (Z), we can apply the induction hypothesis and

obtain (X; R) 2 R 0. Since (R; Z) and (Z; Y ) belong to R 0, the claim follows from’ ’

the transitivity of the reachability relation.

The validity of Lemma 3.7 heavily depends on the f X

absence of proper overlaps (Condition 1 of Def. 3.3).

X X 1 2This is illustrated by the following example:

f Y2^

X:f(X ; X )^Y :f(Y ; Y )^ (XC Z ^YC Z ^Z :a)1 2 1 2 i i i i i Y Y 1 2

i=1

a Z a Z1 2

This constraint satis es all normality conditions except

for the overlap restriction. It also has a solution but no

solved form. The reason for this problem is that X and Y overlap properly in all

solutions of this constraint.

The combination of Lemmas 3.6 and 3.7 yields the following proposition, which

justi es computing with solved forms instead of solutions:

Proposition 3.8. A normal dominance constraint has a solved form if and only

if it is satis able.

6So either (Z) = (Y ) (i.e. Y is plugged into Z) or (Z) is the lowest proper ancestor of (Y )

1for which is de ned (i.e. which is not "invented").

8

6