Compiling a Partition-Based Two-Level Formalism
6 pages
English


Description

Edmund Grimley-Evans*, University of Cambridge (St John's College), Computer Laboratory, Cambridge CB2 3QG, UK, Edmund.Grimley-Evans@cl.cam.ac.
George Anton Kiraz, University of Cambridge (St John's College), Computer Laboratory, Cambridge CB2 3QG, UK, George.Kiraz@cl.cam.ac.uk
Stephen G. Pulman, University of Cambridge Computer Laboratory, Cambridge CB2 3QG, UK, and SRI International, Cambridge, sgp@cam.sri.com

[...] no. 92313384.

Abstract

This paper describes an algorithm for the compilation of a two (or more) level orthographic or phonological rule notation into finite state transducers. The notation is an alternative to the standard one deriving from Koskenniemi's work: it is believed to have some practical descriptive advantages, and is quite widely used, but has a different interpretation.

[...] disadvantages to Koskenniemi's notation. These are detailed more fully in (Black et al., 1987, pp. 13-15), and in (Ritchie et al., 1992, pp. 181-9). In brief: (1) Koskenniemi rules are not easily interpretable (by the grammarian) locally, for the interpretation of 'feasible pairs' depends on other rules in the set. (2) There are frequently interactions between rules: whenever the lexical/surface pair affected by a rule A appears in the context of another rule B, the grammarian must check that its appearance in rule B will not conflict with the requirements of ...

Excerpt

[...] is applied in parallel at each point in the input.

* Supported by Benefactors' Studentship from St John's College.

The partition formalism consists of two types of rules (defined in more detail below) which enforce optional or obligatory changes. A notion of well-formedness is defined via the notion of a 'partition' of a sequence of lexical/surface correspondences. Informally, a partition is a valid analysis if (i) every element of the partition is licensed by an optional rule, and (ii) no element of the partition violates an obligatory rule.

We have found that this formalism has some practical advantages: (1) The rules are relatively independent of each other. (2) Their interpretation is more familiar for linguists: each rule copes with a single correspondence, and in general you don't have to worry about all other rules having to be compatible with it. (3) Multiple character changes are permitted (with some restrictions discussed below). (4) A category or term associated with each rule is required to unify with the affected morpheme, allowing for morpho-syntactic effects to be cleanly described. (5) There is a simple and efficient direct interpreter for the rule formalism.

The partition formalism has been implemented in the European Commission's ALEP system for natural language engineering, distributed to over 30 sites. Descriptions of EU languages are being developed. A version has also been implemented within SRI's Core Language Engine (Carter, 1995) and has been used to develop descriptions of English, French, Spanish, Polish, Swedish, and Korean morphology. An N-level extension of the formalism has also been developed by (Kiraz, 1994; Kiraz, 1996b) and used to describe the morphology of Syriac and other Semitic languages, and by (Bowden and Kiraz, 1995) for error detection in nonconcatenative strings. This partition-based two-level formalism is thus a serious rival to the standard Koskenniemi notation.

However, until now, the Koskenniemi notation has had one clear advantage in that it was clear how to compile it into transducers, with all the consequent gains in efficiency and portability and with the ability to construct lexical transducers [...]

2 Definition of the Formalism

2.1 Formal Definition

We use $n$ tapes, where the first $N$ tapes are lexical and the remaining $M$ are surface, $n = N + M$. In practice, $M = 1$. We write $\Sigma_i$ for the alphabet of symbols used on tape $i$, and $\Sigma = (\Sigma_1 \cup \{\epsilon\}) \times \dots \times (\Sigma_n \cup \{\epsilon\})$, so that $\Sigma^*$ is the set of string-tuples representing possible contents of the $n$ tapes. A proper subset of regular $n$-relations have the property that they are expressible as the Cartesian product of $n$ regular languages, $R = R_1 \times \dots \times R_n$; we call such relations 'orthogonal'. (We present our definitions along the lines of (Kaplan and Kay, 1994).)

We use two regular operators: Intro and Sub. $\mathrm{Intro}_S L$ denotes the set of strings in $L$ into which elements of $S$ may be arbitrarily inserted, and $\mathrm{Sub}_{A,B} L$ denotes the set of strings in $L$ in which substrings that are in $B$ may be replaced by strings from $A$. Both operators map regular languages into regular languages, because they can be characterised by regular relations: over the alphabet $\Sigma$, $\mathrm{Intro}_S = (\mathrm{Id}_\Sigma \cup (\{\epsilon\} \times S))^*$ and $\mathrm{Sub}_{A,B} = (\mathrm{Id}_\Sigma \cup (B \times A))^*$, where $\mathrm{Id}_L = \{(s, s) \mid s \in L\}$ is the identity relation over $L$.

There are two kinds of two-level rules. The context restriction, or optional, rules consist of a left context $l$, a centre $c$, and a right context $r$. Surface coercion, or obligatory, rules require the centre to be split into lexical $c_l$ and surface $c_s$ components.

Definition 2.1 An N:M context restriction (CR) rule is a triple $(l, c, r)$ where $l$, $c$, $r$ are 'orthogonal' regular relations of the form $l = l_1 \times \dots \times l_n$, $c = c_1 \times \dots \times c_n$, $r = r_1 \times \dots \times r_n$. []

Definition 2.2 An N:M surface coercion (SC) rule is a quadruple $(l, c_l, c_s, r)$ where $l$ and $r$ are 'orthogonal' regular relations of the form $l = l_1 \times \dots \times l_n$, $r = r_1 \times \dots \times r_n$, and $c_l$ and $c_s$ are 'orthogonal' regular relations restricting only the lexical and surface tapes, respectively, of the form $c_l = c_1 \times \dots \times c_N \times \Sigma_{N+1}^* \times \dots \times \Sigma_{N+M}^*$ and $c_s = \Sigma_1^* \times \dots \times \Sigma_N^* \times c_{N+1} \times \dots \times c_{N+M}$. []

We usually use the following notation for rules:

LLC - LEX - RLC   {=>, <=, <=>}
LSC - SURF - RSC

where
LLC (left lexical context) = $(l_1, \dots, l_N)$
LEX (lexical form) = $(c_1, \dots, c_N)$
RLC (right lexical context) = $(r_1, \dots, r_N)$
LSC (left surface context) = $(l_{N+1}, \dots, l_{N+M})$
SURF (surface form) = $(c_{N+1}, \dots, c_{N+M})$
RSC (right surface context) = $(r_{N+1}, \dots, r_{N+M})$

Since in practice all the left contexts $l_i$ start with $\Sigma_i^*$ and all the right contexts $r_i$ end with $\Sigma_i^*$, we omit writing it and assume it by default. The operators are: => for CR rules, <= for SC rules and <=> for composite rules.

A proposed morphological analysis $P$ is an $n$-tuple of strings, and the rules are interpreted as applying to a section of this analysis in context: $P = P_l P_c P_r$ ($n$-way concatenation of left context, centre, and right context). Formally:

Definition 2.3 A CR rule $(l, c, r)$ contextually allows $(P_l, P_c, P_r)$ iff $P_l \in l$, $P_c \in c$ and $P_r \in r$. []

Definition 2.4 An SC rule $(l, c_l, c_s, r)$ coercively disallows $(P_l, P_c, P_r)$ iff $P_l \in l$, $P_r \in r$, $P_c \in c_l$ and $P_c \notin c_s$. []

Definition 2.5 An N:M two-level grammar is [...]

R1: [...]
R2: [...]
R3: [...]

R1 and R2 illustrate the iterative application of rules on strings: they sanction the lexical-surface strings (VBBB, Vbbb), where the second (B, b) pair serves as the centre of the first application of R2 and as the left context of the second application of the same rule. R3 is an epenthetic rule which also demonstrates centres of unequal length. (We assume that [...]

[...] rule $(l, c, r) \in R_{\Rightarrow}$, we can make this representation unique by defining a canonical way of converting each such possible centre $C$ into a same-length string-tuple $C'$. A simple way of doing this is to pad with 0s at the right, making each string as long as the longest string in $C$: if $C = (p_1, \dots, p_n)$,

$C' = (p_1 0^*, \dots, p_n 0^*) \cap (\Sigma^* - \Sigma^* (0, \dots, 0))$   (1)

However, since we know the set of possible partitions [...] it is $\bigcup \{c \mid \exists l, r \; (l, c, r) \in R_{\Rightarrow}\}$ [...] we can [...]

[...] intuitive for the user when insertion rules are used. For example, the rule $(\Sigma^* (g, g), [...], [...], \Sigma^*)$ ('change [...] to g') would not disallow a string-tuple partitioned as [...] g), (e, c), (u, u) ... assuming some CR rule allows (e, e).

Earlier versions of the partition formalism could not (in practice) cope with multiple lexical characters in SC rules, see (Carter, 1995, §4.1). This is not the case here.
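
The informal validity criterion from the introduction (every partition element is licensed by a CR rule, and no element violates an SC rule), together with Definitions 2.3 and 2.4, can be sketched as a direct check in Python. This is only an illustrative sketch for the 1:1 case (one lexical and one surface tape), not the compilation algorithm of the paper. Several things are assumptions: regular languages are approximated by Python regular expressions, the names CR, SC, member, concat, accepts, CRS and SCS are hypothetical, the toy rules are not the paper's R1 to R3, and SC rules are tested against single partition elements only.

```python
import re
from typing import List, NamedTuple, Sequence, Tuple

# A string pair (lexical, surface); the empty string plays the role of epsilon.
Pair = Tuple[str, str]

class CR(NamedTuple):
    """Context restriction rule (l, c, r); each part is one regex per tape."""
    l: Pair
    c: Pair
    r: Pair

class SC(NamedTuple):
    """Surface coercion rule (l, c_l, c_s, r)."""
    l: Pair
    c_lex: str   # regex restricting the lexical tape only
    c_surf: str  # regex restricting the surface tape only
    r: Pair

def member(p: Pair, rel: Pair) -> bool:
    """P in R for an orthogonal R = R_1 x R_2: test each tape separately."""
    return all(re.fullmatch(rx, s) is not None for rx, s in zip(rel, p))

def concat(parts: Sequence[Pair]) -> Pair:
    """Tape-wise (n-way) concatenation of partition elements."""
    return ("".join(x for x, _ in parts), "".join(y for _, y in parts))

def cr_allows(rule: CR, pl: Pair, pc: Pair, pr: Pair) -> bool:
    # Definition 2.3: P_l in l, P_c in c and P_r in r.
    return member(pl, rule.l) and member(pc, rule.c) and member(pr, rule.r)

def sc_disallows(rule: SC, pl: Pair, pc: Pair, pr: Pair) -> bool:
    # Definition 2.4: contexts and lexical centre match, surface centre does not.
    return (member(pl, rule.l) and member(pr, rule.r)
            and re.fullmatch(rule.c_lex, pc[0]) is not None
            and re.fullmatch(rule.c_surf, pc[1]) is None)

def accepts(crs: List[CR], scs: List[SC], partition: List[Pair]) -> bool:
    """Simplified check: each element needs a licensing CR rule and must not
    be coercively disallowed by any SC rule (single elements only)."""
    for i, pc in enumerate(partition):
        pl, pr = concat(partition[:i]), concat(partition[i + 1:])
        if not any(cr_allows(rule, pl, pc, pr) for rule in crs):
            return False
        if any(sc_disallows(rule, pl, pc, pr) for rule in scs):
            return False
    return True

ANY = (".*", ".*")  # default context: anything on both tapes

# Hypothetical toy grammar (not the rules R1 to R3 of the paper).
CRS = [
    CR(ANY, ("V", "V"), ANY),               # V:V anywhere
    CR(ANY, ("B", "B"), ANY),               # B:B anywhere
    CR((".*", ".*[Vb]"), ("B", "b"), ANY),  # B:b after surface V or b
]
SCS = [
    SC((".*", ".*V"), "B", "b", ANY),       # after surface V, lexical B must be b
]

print(accepts(CRS, SCS, [("V", "V"), ("B", "b"), ("B", "b"), ("B", "b")]))
print(accepts(CRS, SCS, [("V", "V"), ("B", "B")]))
```

The first call checks a partition of (VBBB, Vbbb), where each (B, b) element is licensed with the previous element as its left context. The second partition is licensed by the CR rules but violates the toy SC rule. Testing each tape with its own regular expression is only possible because the rule parts are 'orthogonal' relations.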