Bayes risk decoding and its application to system combination [electronic resource] / Björn Hoffmeister

Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen - Björn Hoffmeister


188 pages

English



Subject

Computer science

Information

Published by	RWTH Aachen University
Published on	1 January 2011
Language	English
File size	1 MB

Excerpt

Bayes Risk Decoding
and its Application to System Combination
Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences
of RWTH Aachen University for the award of the academic degree
of Doctor of Natural Sciences
submitted by
Diplom-Informatiker Björn Hoffmeister
from Aachen
Reviewers:
Professor Dr.-Ing. Hermann Ney
Privatdozent Dr. Jean-Luc Gauvain
Date of the oral examination: 18 July 2011
This dissertation is available online on the website of the university library.

Abstract
Speech recognition is the task of converting an acoustic signal, which contains speech, to written text.
The error of a speech recognition system is measured as the number of words in which the recognized and
the spoken text differ. This work investigates and develops decoding and system combination approaches
within the Bayes risk decoding framework with the objective of reducing the number of word errors.
The investigated approaches are computationally too expensive to be applied in the speech decoder.
Instead, the output of a first recognition pass is used, which narrows down the number of hypotheses and
provides them in a compact form, the word lattice. In the single-system decoding task a single word lattice
is given, and in the lattice-based system combination task a word lattice is provided by each system.
In both cases the goal is to minimize the number of word errors in the ultimate hypothesis. In large
vocabulary continuous speech recognition (LVCSR) tasks the number of word errors is computed as the
Levenshtein distance between recognized and spoken text. The Bayes risk decoding framework yields the
hypothesis with the least expected number of errors with respect to a specified loss function, given the
true sentence posterior probabilities. However, neither are the true probabilities known, nor is the computation
of the Bayes risk hypothesis with the Levenshtein distance as loss function computationally feasible for
a word lattice. Consequently, in lattice-based Bayes risk decoding and system combination two problems
have to be addressed: first, how to compute an estimate for the sentence posterior probabilities given one
or several word lattices; second, how to approximate the Levenshtein distance such that the computation
of the Bayes risk hypothesis becomes computationally feasible.
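The decision rule described above can be made concrete on a toy scale. The following Python sketch assumes that the hypothesis space has already been reduced to a small N-best list whose sentence posteriors are normalized; it is not the lattice-based decoder developed in this work, and the names levenshtein and bayes_risk_decode are hypothetical. It evaluates w_hat = argmin over w' of sum_w p(w|x) * L(w, w') by brute force, with L the word-level Levenshtein distance. This needs O(N^2) distance computations, which is exactly what becomes infeasible over the exponentially many paths of a word lattice.

```python
# Requires Python 3.9+ (built-in generic annotations such as dict[...]).
from typing import Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level edit distance: substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = cur
    return prev[-1]


def bayes_risk_decode(nbest: dict[tuple[str, ...], float]) -> tuple[str, ...]:
    """Return the N-best entry with minimum expected Levenshtein loss.

    `nbest` maps each hypothesis (a tuple of words) to its normalized sentence posterior.
    """
    def expected_loss(candidate: tuple[str, ...]) -> float:
        return sum(p * levenshtein(w, candidate) for w, p in nbest.items())

    return min(nbest, key=expected_loss)
```

For posteriors 0.4 ("x y z"), 0.31 ("a b c") and 0.29 ("a b d"), the maximum a posteriori choice is "x y z", but the expected Levenshtein losses are 1.8, 1.49 and 1.51, so the Bayes risk decoder returns "a b c": the two similar hypotheses pool their probability mass.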
Based on the separation of the posterior probability computation and the loss function in the Bayes
risk decoding rule, a framework will be developed which covers the common approaches to lattice-based
system combination, such as ROVER, CNC, and DMC. Furthermore, it will be shown that the common
approximations of the Levenshtein distance used in LVCSR tasks can be classified into two categories for
which efficient Bayes risk decoders exist. The existing approximations will be investigated and compared.
New loss functions will be developed which overcome drawbacks of the existing approximations to the
Levenshtein distance, such as the frequently observed deletion bias.
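The separation of posterior estimation and loss function can be illustrated with two interchangeable components. In this hypothetical Python sketch, each system is assumed to supply an N-best list with normalized sentence posteriors, the systems are combined by linear interpolation with hand-chosen weights gamma_s (one simple choice for illustration, not the ROVER, CNC or DMC procedures themselves, which operate on word alignments), and the loss is passed in as a parameter. A sentence-level 0/1 loss turns the rule into maximum a posteriori decoding, whereas the Levenshtein loss targets word errors.

```python
# Requires Python 3.9+. All names are hypothetical illustrations.
from collections import defaultdict
from typing import Callable, Sequence

Hyp = tuple[str, ...]


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level edit distance with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h))
        prev = cur
    return prev[-1]


def zero_one(ref: Hyp, hyp: Hyp) -> int:
    """Sentence-level 0/1 loss: the Bayes risk rule then reduces to MAP decoding."""
    return int(ref != hyp)


def combine_posteriors(systems: Sequence[dict[Hyp, float]],
                       weights: Sequence[float]) -> dict[Hyp, float]:
    """Step 1: one posterior estimate by linear interpolation of the systems.

    A hypothesis missing from a system's list gets probability 0 from that system.
    """
    combined: dict[Hyp, float] = defaultdict(float)
    for posteriors, gamma in zip(systems, weights):
        for hyp, p in posteriors.items():
            combined[hyp] += gamma * p
    return dict(combined)


def bayes_risk_decode(posteriors: dict[Hyp, float],
                      loss: Callable[[Hyp, Hyp], float]) -> Hyp:
    """Step 2: minimum expected loss over the candidates, for any plug-in loss."""
    return min(posteriors,
               key=lambda c: sum(p * loss(w, c) for w, p in posteriors.items()))
```

For instance, interpolating {"x y z": 0.8, "a b c": 0.2} and {"a b c": 0.42, "a b d": 0.58} with weights 0.5 each gives posteriors 0.4, 0.31 and 0.29; the 0/1 loss then selects "x y z", while the Levenshtein loss selects "a b c". Only the loss argument changes between the two decisions.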
A data structure of particular interest is the confusion network (CN). In previous work it was shown
that a CN has a simple decoding rule in the Bayes risk framework. In this work new algorithms for
deriving a CN from a word lattice will be developed and compared to existing methods. Furthermore, the
CN will be the basis for several investigations aiming at improving the posterior probability estimates and
the approximation of the Levenshtein distance. The methods looked into include classifier-based system
combination and the use of a windowed Levenshtein distance as loss function for the Bayes risk decoder.
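The simple decoding rule of a CN can be sketched as follows, assuming the CN is given as a list of slots, each mapping words (including an empty-word label for deletions) to posteriors that sum to one within the slot. Under a slot-wise 0/1 loss the expected loss decomposes over the slots by linearity, so choosing the most probable entry in every slot is the Bayes risk decision, and its expected number of slot errors is the sum over slots of (1 - maximum slot posterior). The label "<eps>" and both function names are hypothetical conventions for this example.

```python
# Requires Python 3.9+. EPS and the function names are hypothetical.
EPS = "<eps>"  # label for the empty word (a deletion) within a slot


def decode_confusion_network(cn: list[dict[str, float]]) -> list[str]:
    """Bayes risk decision for a CN under slot-wise 0/1 loss: argmax per slot."""
    words = []
    for slot in cn:
        best = max(slot, key=slot.get)  # most probable entry of this slot
        if best != EPS:                 # an epsilon winner contributes no word
            words.append(best)
    return words


def expected_errors(cn: list[dict[str, float]]) -> float:
    """Expected slot-wise errors of that decision: sum of (1 - max posterior)."""
    return sum(1.0 - max(slot.values()) for slot in cn)
```

With the slots {"the": 0.9, <eps>: 0.1}, {"cat": 0.6, "hat": 0.4} and {<eps>: 0.55, "sat": 0.45}, the decoder outputs "the cat" with an expected 0.95 slot errors; the third slot contributes no word because its epsilon entry wins.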
A further topic of research is log-linear model combination, for which an extension with
model- and word-dependent scaling factors will be investigated.
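Log-linear model combination with model-dependent scaling factors can be sketched as p(w|x) proportional to the product over models i of p_i(w|x)^lambda_i, renormalized over the hypotheses at hand. This Python sketch assumes every model scores the same N-best list in the log domain; the function name and data layout are hypothetical, and word-dependent scaling factors, which would additionally let lambda depend on the words of w, are omitted.

```python
# Requires Python 3.9+. Names and data layout are hypothetical.
import math

Hyp = tuple[str, ...]


def log_linear_posteriors(log_scores: dict[Hyp, list[float]],
                          lambdas: list[float]) -> dict[Hyp, float]:
    """Combine per-model log-probabilities with model-dependent scaling factors.

    log_scores[w][i] holds log p_i(w | x) from model i; the result is
    renormalized over the given hypotheses only.
    """
    # Weighted sum of log-probabilities = log of the product of scaled probabilities.
    combined = {w: sum(lam * s for lam, s in zip(lambdas, scores))
                for w, scores in log_scores.items()}
    shift = max(combined.values())  # subtract the maximum for numerical stability
    unnormalized = {w: math.exp(v - shift) for w, v in combined.items()}
    z = sum(unnormalized.values())
    return {w: v / z for w, v in unnormalized.items()}
```

With lambda = (1, 0) the second model is ignored and the first model's posteriors are recovered; with lambda = (1, 1) the two models' probabilities are multiplied before renormalization, e.g. 0.6 * 0.2 and 0.4 * 0.8 yield 3/11 and 8/11.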
The methods are tested on the Chinese speech recognition systems used by RWTH Aachen in the GALE
project and on the lattices provided within the English track of the 2007 TC-Star EPPS evaluation. The
best performing system combination methods investigated in this work improve the error rates by up to
10% relative for intra-site combination experiments and by more than 20% relative for cross-site combi-
nations compared to the best single system. The newly developed methods show a slight improvement
over the existing approaches to lattice decoding and lattice-based system combination.
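The relative improvements quoted above are measured against the error rate of the best single system. With hypothetical word error rates (not results from this work), the computation reads:

```latex
\Delta_{\mathrm{rel}}
  = \frac{\mathrm{WER}_{\mathrm{best\ single}} - \mathrm{WER}_{\mathrm{comb}}}{\mathrm{WER}_{\mathrm{best\ single}}},
\qquad \text{e.g.}\quad
\frac{15.0\,\% - 13.5\,\%}{15.0\,\%} = 0.10
```

so in this example a 10% relative improvement corresponds to only 1.5% absolute.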
Summary
Automatic speech recognition deals with the task of converting spoken language into written text. The
error of a speech recognition system is measured as the number of words in which the spoken text deviates
from the recognized text. The subject of this work is the use of the Bayes risk framework with the goal of
minimizing the error of a single system or of a combination of several systems.
Owing to the complexity of the methods, all experiments and investigations in this work are carried out
on word lattices. A word lattice is the compact representation of a restricted hypothesis space generated
by a preceding recognition pass. In the case of system combination, one word lattice is provided per
system. The goal is to generate a final hypothesis from the word lattices that has a lower word error than
each of the individual systems. In large vocabulary continuous speech recognition, the word error is
defined as the Levenshtein distance between the spoken and the recognized word sequence. If the true
sentence probabilities are known, the Bayes risk framework yields the word sequence with the lowest
expected error. In practice, however, neither are the true probabilities known, nor is the computation of
the Bayes risk hypothesis on a word lattice tractable when the Levenshtein distance is used as the loss
function. This gives rise to the following two problems: first, how can probabilities be estimated from
the system-dependent word lattices; and second, how can the Levenshtein distance be approximated such
that the computation of the Bayes risk hypothesis becomes tractable.
Based on the separation of the probability estimation and the loss function in the Bayes risk
computation, this work develops a general framework for lattice-based system combination. The
framework covers the methods common in practice, among others ROVER, CNC, and DMC. Furthermore, it
is shown that the approximations of the Levenshtein distance common in speech recognition can be divided
into two classes for which the Bayes risk hypothesis can be computed efficiently. The known
approximations are investigated and compared. New methods are developed that compensate for the
drawbacks of the existing approximations, in particular the frequently observed high proportion of
deletions.
A data structure of particular interest is the confusion network (CN). Previous work showed that the
Bayes risk hypothesis of a CN can be computed in a trivial way. In this work, new methods for converting
a word lattice into a CN are presented and compared with existing methods. Furthermore, the CN forms
the basis for several approaches to improved probability estimation and to a more accurate approximation
of the Levenshtein distance. The approaches investigated include classifier-based system combination and
the use of a windowed Levenshtein distance as the loss function in the computation of the Bayes risk
hypothesis.
A further topic investigated in this work is log-linear model combination, for which model- and
word-dependent scaling factors are introduced.
Experiments are carried out with the Chinese speech recognizers developed at RWTH Aachen in the course
of the GALE project, as well as with the word lattices provided in the 2007 TC-Star EPPS evaluation. The
best system combination methods investigated in this work show a relative improvement in word error
rate of up to 10% for in-house lattice combination and of more than 20% for the combination of word
lattices from several project partners. The relative improvement refers to the error rate of the best
single system. Compared with the existing methods for lattice-based system combination, the newly
developed methods achieve slight improvements.
Acknowledgement
First of all I would like to thank my doctoral adviser, Prof. Dr.-Ing. Hermann Ney, head of the Chair of
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, at RWTH Aachen
University, for his support and his interest. He introduced me to speech recognition in 2004, when I started
my studies as a PhD student, and has since then given me the opportunity and the freedom to pursue
my ideas.
I would also like to thank Dr. Jean-Luc Gauvain for agreeing to review this thesis and for his interest
in this work.
I am very grateful to Dr. Ralf Schlüter for his support in the field of Bayes risk decision theory and its
application to speech recognition. His supportive coaching helped me to make my decisions and to define
my long-term research goals. Special thanks go to Stephan Kanthak, who mentored me in my first year
and introduced me to the concepts of transducers and their appli