Combining natural language processing systems to improve machine translation of speech [electronic resource] / submitted by Evgeny Matusov

Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen


197 pages

English


Subjects

Computer science

Information

Published by	Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen
Published on	1 January 2009
Number of reads	7
Language	English
File size	1 MB

Excerpt

Combining Natural Language Processing Systems
to Improve Machine Translation
of Speech
Dissertation approved by the Faculty of Mathematics, Computer Science
and Natural Sciences of the
Rheinisch-Westfälische Technische Hochschule Aachen (RWTH Aachen University)
for the award of the academic degree of
Doctor of Natural Sciences
submitted by
Diplom-Informatiker
Evgeny Matusov
from Moscow, Russia
Reviewers: Professor Dr.-Ing. Hermann Ney
Professor Dr. José B. Mariño Acebal
Date of the oral examination: 10 December 2009
This dissertation is available online on the website of the university library.

To my father.

Acknowledgments
First, I would like to express my gratitude to my advisor Professor Dr.-Ing. Hermann
Ney, head of the Chair of Computer Science 6 at the RWTH Aachen University. This
thesis would not have been possible without his support, advice, and valuable ideas and
suggestions.
I would also like to thank Professor Dr. José B. Mariño from the Polytechnic University
of Catalonia for agreeing to review this thesis and for his precious comments and interest
in this work.
I would very much like to thank my colleagues at the Chair of Computer Science 6 for
motivating discussions, helpful feedback, and for assisting me in proofreading this thesis. I
greatly appreciate the very good working atmosphere they helped to create. In particular,
I would like to express my deep gratitude to my former and current colleagues Stephan
Kanthak, Richard Zens, Gregor Leusch, Björn Hoffmeister, Arne Mauser, Nicola Ueffing,
and David Vilar. Many of the results obtained for this thesis would not have been possible
without their prior work and the research that we discussed and/or conducted together.
I would also like to thank Dustin Hillard, who has been working at the University of
Washington, for the many fruitful discussions and joint experiments related to automatic
sentence segmentation and punctuation prediction. I also thank my colleague Jonas Lööf
and Dr. Marcello Federico from Fondazione Bruno Kessler, Italy, who kindly provided
speech recognition lattices for some of the experiments reported in this thesis.
I am very grateful to have the wonderful love of my wife Irina and my dear son Ilja.
This thesis would not have been possible without Irina’s continuous encouragement and
patience. I also want to thank Ilja for reminding me that there are things more important
than this work. I would also like to thank my dear parents, my brother Vitaly, and my aunt
Rita for being there for me when I need it. Finally, I would like to thank Edward Hunter,
a computer science teacher at a college in Arizona, for inspiring my unceasing interest in
computers when I was twelve.

This thesis is based on work carried out during my time as a research scientist at
the Department for Computer Science at the RWTH Aachen University, Germany. The
work was partially funded by the European Union under the integrated project TC-STAR –
Technology and Corpora for Speech to Speech Translation (IST-2002-FP6-506738), and
is partially based upon work in the GALE project supported by the Defense Advanced
Research Projects Agency (DARPA) under Contract No. HR0011-06-C-0023.

Abstract
Machine translation of spoken language is a challenging task that involves several natural
language processing (NLP) software modules. Human speech in one natural language
must first be transcribed automatically by a speech recognition system. Next, the
transcription of the spoken utterance can be translated into another natural language
by a machine translation system. In addition, it may be necessary to automatically insert
sentence boundaries and punctuation marks.
In recent years, tremendous progress has been made in improving the quality of automatic
speech translation. In particular, statistical approaches to both speech
recognition and machine translation have proved to be effective on a large number of
translation tasks with both small and large vocabularies. Nevertheless, many unsolved
problems remain. One of them is that the systems involved in speech translation are often
developed and optimized independently of each other.
The goal of this thesis is to improve speech translation quality by enhancing the interface
between the various statistical NLP systems involved in the task of speech translation. The
whole pipeline is considered: automatic speech recognition (ASR); automatic sentence
segmentation and prediction of punctuation marks; machine translation (MT) using
several systems which take either single best or multiple ASR hypotheses as input and
employ different translation models; combination of the output of different MT systems.
The coupling between the various components is achieved through combination of model
scores and/or hypotheses, development of new algorithms and modification of existing ones
to handle ambiguous input or to meet the constraints of the downstream components,
as well as through optimization of model parameters with the aim of improving the final
translation quality.
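
As a rough illustration of the last stage of the pipeline listed above, combining the output of different MT systems, the sketch below takes a weighted word-level vote over several hypotheses. It rests on strong assumptions that are not taken from the thesis: the hypotheses are assumed to be already aligned position by position (an empty string marks a gap), and the system weights, the example sentences, and the names `EPS` and `consensus` are hypothetical.

```python
from collections import defaultdict

EPS = ""  # empty word: marks a slot where a system produced nothing


def consensus(aligned_hyps, weights):
    """Weighted majority vote per slot over pre-aligned hypotheses.

    aligned_hyps: list of equal-length word lists, one per MT system
    weights:      one non-negative weight per system (assumed values)
    """
    if len({len(h) for h in aligned_hyps}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    result = []
    for slot in zip(*aligned_hyps):
        # Sum the weights of all systems that propose the same word.
        votes = defaultdict(float)
        for word, weight in zip(slot, weights):
            votes[word] += weight
        winner = max(votes, key=votes.get)
        if winner != EPS:  # a winning gap contributes no output word
            result.append(winner)
    return result


# Three made-up system outputs, already aligned slot by slot.
hyps = [
    ["the", "meeting", "starts", "at", "noon"],
    ["the", "meeting", "begins", "at", "noon"],
    ["a", "meeting", "starts", EPS, "noon"],
]
weights = [0.4, 0.35, 0.25]
output = consensus(hyps, weights)
```

In this toy case the vote keeps the words on which most of the weighted systems agree. Producing the alignment in the first place is the hard part, and the sketch deliberately leaves it out.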
The main focus of the thesis is on a tighter coupling between speech recognition and
machine translation. To this end, two phrase-based MT systems based on two different
statistical models are extended to process ambiguous ASR output in the form of word
lattices. A novel algorithm for lattice-based translation is proposed that allows for
exhaustive, but efficient phrase-level reordering in the search. Experimental results show
that significant improvements in translation quality can be obtained by avoiding hard
decisions in the ASR system and choosing the path in the lattice with the most likely
translation according to the combination of recognition and translation model scores. The
conditions under which these improvements are to be expected are identified in numerous
experiments on several small and large vocabulary MT tasks.
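
The idea of choosing the lattice path by combined recognition and translation scores can be pictured with a toy example. The sketch below is only an illustration under simplifying assumptions: the lattice is a tiny hand-made acyclic graph, translation is word for word without any reordering (unlike the phrase-level reordering described above), all costs are invented negative log-probabilities, and the names `LATTICE`, `LEXICON`, `best_path`, `asr_weight`, and `mt_weight` are hypothetical.

```python
# Toy word lattice: node -> list of (next_node, source_word, asr_cost).
# Costs are negative log-probabilities (lower is better); values are made up.
LATTICE = {
    0: [(1, "ich", 0.1)],
    1: [(2, "habe", 0.4), (2, "hebe", 0.3)],  # ASR slightly prefers "hebe"
    2: [(3, "zeit", 0.2)],
    3: [],
}

# Hypothetical word-for-word translation table: source -> (target, mt_cost).
LEXICON = {
    "ich": ("i", 0.1),
    "habe": ("have", 0.2),
    "hebe": ("lift", 1.5),
    "zeit": ("time", 0.3),
}


def best_path(lattice, lexicon, start, goal, asr_weight=1.0, mt_weight=1.0):
    """Return (cost, source_words, target_words) of the cheapest path.

    Nodes are assumed to be numbered in topological order, so one pass
    over the sorted nodes suffices (dynamic programming on a DAG).
    """
    best = {start: (0.0, [], [])}
    for node in sorted(lattice):
        if node not in best:
            continue  # node is not reachable from the start
        cost, src, tgt = best[node]
        for nxt, word, asr_cost in lattice[node]:
            target, mt_cost = lexicon[word]
            new_cost = cost + asr_weight * asr_cost + mt_weight * mt_cost
            if nxt not in best or new_cost < best[nxt][0]:
                best[nxt] = (new_cost, src + [word], tgt + [target])
    return best[goal]


# Hard decision: pick the ASR single best first, ignoring translation scores.
single_best = best_path(LATTICE, LEXICON, 0, 3, mt_weight=0.0)
# Coupled decision: score each path with recognition and translation costs.
coupled = best_path(LATTICE, LEXICON, 0, 3)
```

With these invented numbers, the recognizer alone prefers the path through "hebe", while the combined score prefers "habe" because its translation is far cheaper. The toy lattice is constructed to show exactly this effect of avoiding an early hard decision.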
Another important part of this work is the combination of multiple MT systems. Different
MT systems tend to make different errors. To take advantage of this fact, a method for
computing a consensus translation from the outputs of several MT systems is proposed.
The consensus translation is computed on the word level; the method includes
a novel statistical approach for aligning and reordering the translation hypotheses so
that a confusion network for weighted majority voting can be created. A consensus
translation is expected to contain words and phrases on which several systems agree and
which therefore have a high probability of being correct. In the application to speech
translation, the goal can be to combine MT systems which translate only the single best
ASR output and those systems which can translate word lattices. The proposed system
combination method resulted in highly significant improvements in translation quality over
the best single system on a multitude of text and speech translation tasks. Many of these
improvements were obtained in official and highly competitive evaluation campaigns, in
which the quality of the translations was evaluated using both automatic error measures
and human judgment.

Summary
Machine translation of spoken language is a demanding task that involves several
software modules from the field of language processing. Speech in one natural language
must first be transcribed automatically with the help of a speech recognition system.
The transcription of the spoken utterance can then be translated into another natural
language by a machine translation system. It may also be necessary to insert sentence
boundaries and punctuation marks automatically.
In recent years, an enormous improvement in the quality of automatic speech
translation has been observed. Statistical approaches to speech recognition and machine
translation in particular have proved effective on a large number of translation tasks
with small and large vocabularies. Nevertheless, many problems remain unsolved. In
particular, the systems involved in translating spoken language are often developed and
optimized independently of one another.
The goal of this dissertation is to improve the quality of spoken language translation
by improving the interface between the various language processing systems involved in
this task. The complete speech translation chain is addressed: automatic speech
recognition; automatic sentence segmentation and prediction of punctuation marks; machine
translation using several systems that take either the best automatically recognized
word sequence or multiple speech recognition hypotheses as input and employ different
translation models; combination of the output of the different translation systems. The
coupling between the various components is achieved by combining model scores and/or
hypotheses, and by developing new algorithms and extending existing ones in order to
process ambiguous input or to meet the requirements of the downstream modules. In
addition, the model parameters of the components are optimized with a view to improving
translation quality.
DerHa