Lexical issues in machine translation
148 pages
English


STUDIES IN MACHINE TRANSLATION
AND NATURAL LANGUAGE PROCESSING
Volume 8
LEXICAL ISSUES
IN MACHINE TRANSLATION
Edited by
Paulo ALBERTO and Paul BENNETT
European Commission

Studies in machine translation
and natural language processing
Published by:
Office for Official Publications
of the European Communities

Managing editor
Erwin Valentini (CEC), Luxembourg
Editorial board
Doug Arnold
(Department of Language and Linguistics, United Kingdom)
Nicoletta Calzolari
(Istituto di Linguistica Computazionale, Italia)
Frank Van Eynde
(Nationaal Fonds voor Wetenschappelijk Onderzoek, België)
Steven Krauwer
(Rijksuniversiteit Utrecht, Nederland)
Bente Maegaard
(Center for Sprogteknologi, Danmark)
Paul Schmidt
(Institut für Angewandte Informationsforschung, Deutschland)
Luxembourg: Office for Official Publications of the European Communities, 1995
ISSN 1017-6568
© ECSC-EC-EAEC, Brussels · Luxembourg, 1994
Printed in Germany
CONTENTS
PAULO ALBERTO AND PAUL BENNETT
Introduction 7
DONNCHA Ó CRÓINÍN AND GEORGE TALBOT
EIRETERM, EUROTRA's Terminological Database: History and Development 11
LIEVE DE WACHTER
The Computational Interpretation of Germanic Noun-Noun Compounds: an
Overview of Possibilities and Hypotheses 19
KERRY MAXWELL
Automatic Translation of English Compounds: Problems and Prospects 37
MARTA CARULLA
Relational Adjectives: their Characteristics and Correspondences 57
OLGA ALEJANDRO AND FLORA RAMÍREZ
Lexical Semantics and the Problem of Multiple Reference in some Polysemic and
Non-Polysemic Nouns 67
ARCHIBALD MICHIELS
Introducing HORATIO 7
ARCHIBALD MICHIELS
Feeding LDOCE Entries into HORATIO 93
ANNA BRAASCH
How Far Do Printed Dictionaries and MT-Lexicons Share Information? 117
BIBLIOGRAPHY 13
CONTRIBUTORS 9

PAULO ALBERTO AND PAUL BENNETT
Introduction
This volume contains papers presented to the Working Group on Lexical Issues at the
EUROTRA¹ Annual Workshop held in St. Maximin, Provence, in September 1991.²
Together they form a substantial and varied body of work on a number of questions
related to the lexicon and its role in a machine translation (MT) system. It is the
purpose of this introduction to set the papers in context by surveying these various
'lexical issues'.
1 The Background
It seems fair to say that in both theoretical and computational linguistics the time
of the lexicon has arrived. From being a dustbin which contained only idiosyncratic
facts about lexical items (while the grammar alone was the domain of generalizations),
the lexicon has become central to research, as its role has expanded and
it has become plain that much information about linguistic objects is predictable
on the basis of the properties of the words they contain. Theoretical frameworks
such as Lexical-Functional Grammar, Head-Driven Phrase Structure Grammar and
Categorial Grammar have emphasized the contribution of lexical specifications to
grammatical constraints and representations.
All work within unification-based frameworks (cf. [Shieber 86]) indeed stresses the
lexicon, and the popularity of unification in computational linguistics means that
NLP research has increasingly focussed on lexical problems. In one sense, it is not
surprising that problems of lexica should come to the forefront in NLP. After all, any
attempt to build anything more than a toy NLP system soon comes up against the
need to have a substantial dictionary and the enormous resources which have to be
devoted to this. This is a problem which is not faced in theoretical linguistics as
long as the lexicon is seen as a sideline and only ever studied by giving a handful of
examples of skeletal lexical entries.
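The unification operation these frameworks share can be sketched as follows; this is an illustrative toy (not drawn from the volume), with dict-based feature structures and invented entries standing in for the DAGs with structure sharing that real systems use:

```python
def unify(a, b):
    """Unify two feature structures; return None on a feature clash."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None  # atomic values must match exactly
    result = dict(a)
    for feat, val in b.items():
        if feat in result:
            sub = unify(result[feat], val)
            if sub is None:
                return None  # clash propagates upward
            result[feat] = sub
        else:
            result[feat] = val
    return result

# The lexical entry for the verb supplies most of the constraints;
# the syntactic context contributes a compatible, partial structure.
verb = {"cat": "V", "subj": {"agr": {"num": "sg", "per": "3"}}}
context = {"cat": "V", "subj": {"agr": {"num": "sg"}}}
merged = unify(verb, context)
```

A clash (e.g. `num: sg` against `num: pl`) yields no result, which is how lexically specified constraints filter out ill-formed analyses.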
This problem of dictionary creation is of special relevance to MT, where the earliest
work (direct systems) saw translation as an essentially lexical matter, with
dictionaries performing the bulk of the work (cf. [Hutchins 86], pp. 40-41). As MT systems
became computationally and linguistically more sophisticated, grammatical and semantic
processing played a larger and larger role, but of course the need for large
dictionaries (monolingual and bilingual) remained. One important priority, though
not one reflected in the current volume, has been that of providing user-friendly tools
for dictionary update and expansion (cf. [White 87], pp. 241ff.).

1 EUROTRA is a research and development programme for machine translation sponsored and
supervised by the European Commission. For information on its underlying formal and linguistic
specifications, see [CDKM 91a] and [CDKM 91b].
2 We stress that, apart from minor changes and the removal of various anachronistic remarks, the
papers have not been revised, and so may not represent the current views of their authors.
Let us now sketch a number of themes in lexically-based research which have emerged
from the above concerns and which figure in the papers in this volume. The first is
reusability, the idea that lexical information prepared for one NLP system should
be usable in other systems and for other purposes. The advantages of this hardly
need stating: given the resources put into lexicon creation, it makes little sense to
start virtually from scratch each time. The information contained in lexical entries
is likely to be very similar in different systems (with the exception of domain- or
sublanguage-specific information), although it will probably be formalized very
differently. The challenge, then, is to design a 'polytheoretic' lexicon which can be
exploited or adapted by different theories or formalisms. A number of collaborative
research projects have pursued this theme in a very active manner, including the
EUROTRA-7 research [HM 91]. Reusability of grammatical, as opposed to lexical,
resources is also important (see [MS 94]).
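The idea of a polytheoretic lexicon can be made concrete with a small sketch; everything below (the entry format, the target shapes, the function names) is invented for illustration, not taken from EUROTRA-7 or any actual system:

```python
# One theory-neutral entry, exported into two system-specific shapes.
ENTRY = {
    "lemma": "translate",
    "cat": "verb",
    "args": ["agent", "theme"],  # predicate-argument structure
}

def to_feature_structure(entry):
    """Export for a hypothetical unification-based grammar."""
    return {"cat": entry["cat"],
            "pred": entry["lemma"],
            "subcat": [{"role": r} for r in entry["args"]]}

def to_transfer_pair(entry, target_lemma):
    """Export for a hypothetical bilingual transfer lexicon."""
    return (entry["lemma"], entry["cat"], target_lemma)
```

The point is that the linguistic content is stated once, while each consumer derives the formalization it needs; domain- or sublanguage-specific information would live in a separate layer.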
A particular aspect of reusability concerns the use of already-existing dictionaries not
prepared with NLP in mind, viz. machine-readable versions of printed dictionaries.
It goes without saying that such dictionaries contain a wealth of information, though
not always in a sufficiently explicit form. Again, making use of such resources can
potentially save an enormous amount of time in NLP development. There is now
a great deal of work done on enabling such resources to be incorporated into NLP
systems (e.g. [BCC 87]; [BB 89]; [BPC 93]).
One special kind of lexicon with great relevance to MT is the terminological data
bank or term bank. A standard and up-to-date source of technical terminology,
whether monolingual or bilingual, has applications for many kinds of users. Indeed,
[Sager 90], pp. 134-5, emphasizes the multi-purpose nature of a term bank, and notes
that providing a dictionary for NLP use (including MT) is one important purpose.
A single terminological database for each language community is highly desirable,
for reasons of consistency, not just on grounds of cost. Issues of reusability, and of
resources being tapped by both human and machine users, are clearly crucial in this
context.
Along with linguistic interest in the lexicon there has been a resurgence of interest in
morphology, with a variety of views having been articulated on the morphology-syntax
relation. Compounding, for example, is now far better understood than previously,
though many problems remain. While some compounds can be entered in
dictionaries, it is plain that this is not possible for all compounds, as there are in principle an
infinite number of them. Yet even newly-coined compounds are not truly
compositional: the meanings of the parts play a role in the meaning of the whole but do not
determine it entirely (see [Lakoff 87], pp. 147-8). This raises enormous problems for
computational analysis and translation of compounds.
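The segmentation step alone can be sketched with a toy lexicon (the word list here is invented); even when all the parts are found, the semantic relation between them remains undetermined, which is precisely the hard part for translation:

```python
# Toy lexicon of compound parts, including the linking element "s".
LEXICON = {"arbeit", "arbeits", "s", "markt"}

def splits(word, lexicon):
    """Return every segmentation of word into known lexicon parts."""
    if word == "":
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in splits(word[i:], lexicon):
                results.append([head] + rest)
    return results

# "arbeitsmarkt" already admits more than one segmentation,
# before the meaning relation between the parts is even considered.
analyses = splits("arbeitsmarkt", LEXICON)
```

Choosing among segmentations, and then among candidate relations between the parts, is where dictionary lookup ends and genuine interpretation begins.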
