Robust appearance based sign language recognition [electronic resource] / submitted by Morteza Zahedi



131 pages

English


Subjects

Computer science

Information

Published by	Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen
Published on	1 January 2007
Language	English
File size	5 MB

Excerpt

Robust Appearance-based Sign Language Recognition
Dissertation approved by the Faculty of Mathematics, Computer Science
and Natural Sciences
of the Rheinisch-Westfälische Technische Hochschule Aachen
for the award of the academic degree of
Doctor of Natural Sciences
submitted by
Morteza Zahedi, M.Sc.
from
Gorgan, Iran
Reviewers: Universitätsprofessor Dr.-Ing. Hermann Ney
Universitätsprofessor Dr.-Ing. habil. Gerhard Rigoll
Date of the oral examination: 21.09.2007
This dissertation is available online on the website of the university library.

To my family

Abstract
In this work, we introduce a robust appearance-based sign language recognition system which
is derived from a large vocabulary speech recognition system. The system employs a large
variety of methods known from automatic speech recognition research for the modeling of
temporal and language-specific issues. The feature extraction part of the system is based
on recent developments in image processing which model different aspects of the signs and
account for visual variability in appearance. Different issues of appearance-based sign
language recognition, such as datasets, appearance-based features, geometric features, training,
and recognition, are investigated and analyzed.
We discuss the state of the art in sign language and gesture recognition. In contrast
to the proposed system, most of the existing approaches use special data acquisition tools
to collect the signing data. Systems which depend on this kind of data capturing tool
are not useful in practical environments. Furthermore, the datasets are usually created within
the authors' own group and are not publicly available, which makes it difficult to compare
results. To overcome these shortcomings of the existing approaches, our system is built to use
video data only and is evaluated on publicly available data. First, to overcome the scarceness of
publicly available data and to remove the dependency on impractical data capturing devices,
we use publicly available video files and create appropriate transcriptions of these
files. Then, appearance-based features are extracted directly from the videos. To cope with
the visual variability of the signs occurring in the image frames, pronunciation clustering,
invariant distances, and different reduction methods are investigated.
Furthermore, geometric features capturing the configuration of the signer's hand are
investigated, improving the accuracy of the recognition system. The geometric features
represent the position, the orientation and the configuration of the signer's dominant hand,
which plays a major role in conveying the meaning of the signs.
Finally, it is described how to employ the introduced methods and how to combine the
features to construct a robust sign language recognition system.
Summary (Zusammenfassung)
This work presents a robust appearance-based sign language recognition system
built on a large-vocabulary speech recognition system. The system uses many
methods from automatic speech recognition to model temporal and
language-specific characteristics. Feature extraction in this system is based
on recent developments in image processing, in order to model different aspects
of the signs and the visual differences in their appearance. Various aspects
of appearance-based sign language recognition, such as datasets,
appearance-based features, geometric features, training, and recognition, are
investigated and analyzed.
In addition, the state of research in sign language and gesture recognition
is presented. In contrast to the system presented here, most existing
approaches rely on special data acquisition techniques to store the gesture data in
the computer. However, systems that depend on special data acquisition devices
often cannot be used in practical applications. The datasets used in these
systems were often created by the publishing groups themselves and are not
publicly available, which makes it difficult or impossible to compare results.
To overcome these shortcomings, our system uses video data only, and the
evaluation is carried out exclusively on publicly available datasets.
To reduce the shortage of freely available data and to dispense with impractical
data acquisition devices, we first use videos from publicly available sources.
We then annotate them in order to train and test our system on them.
To model the large visual variability of the signs in the video frames, we use
pronunciation variants, invariant distance functions, and different feature
extraction and reduction methods.
Furthermore, geometric features representing the signer's hand configuration
are used to improve the accuracy of the recognition system. They model the
position, orientation, and configuration of the signer's dominant hand, which
play a decisive role in the meaning of a sign.
Finally, we describe how the presented methods are used and combined into a
robust sign language recognition system with a high recognition rate.

Acknowledgments
Laudation to the God of majesty and glory! Obedience to him is
a cause of approach and gratitude in increase of benefits. Every
inhalation of the breath prolongs life and every expiration of it
gladdens our nature; wherefore every breath confers two benefits
and for every benefit gratitude is due.
Whose hand and tongue is capable
To fulfil the obligations of thanks to him?
– Sa’di (1184–1283) in Golestan
I would like to thank the people who have supported and accompanied me during my
four-year stay in Germany to prepare this dissertation. I appreciate all the suggestions,
guidance, and contributions, and even a “Hallo” or “Tschüss”, from the people who joined
and encouraged me to reach this goal.
First of all, I would like to thank Prof. Dr.-Ing. Hermann Ney, head of the Lehrstuhl
für Informatik 6 of RWTH Aachen University, who supervised this thesis. His guidance and
ideas not only opened a window onto new aspects of statistical pattern recognition for
me; he also added some constraints and restrictions to my way of thinking, which
have been very beneficial and have accelerated my research. Learning how to manage a
research group and carry out projects was one of the most important lessons I learned. His
insistence on introducing the state of the art at the beginning of papers and presentations
taught me how to make a scientific report or paper more understandable. He also supported
my attendance at various conferences and workshops, which has been very important and
helpful to me.
I am very thankful to Prof. Dr.-Ing. Gerhard Rigoll, who agreed to be co-referee of this
thesis. His publications in the field of gesture recognition were very useful and beneficial.
I am also very thankful to my colleagues of the image processing group at the Lehrstuhl,
especially Daniel Keysers and Thomas Deselaers as heads of the group and Philippe Dreuw
as a colleague working in the same field of research. They have helped me a lot with their
suggestions, ideas and contributions. Attending workshops and conferences
with them has also been one of the most enjoyable parts of my work.
My special thanks go to Jan Bungeroth and Shahram Khadivi, who have helped me
to understand the German environment and to handle life in Germany, and who have spent
their time discussing a variety of topics and problems such as politics, shopping,
travelling, etc.
Those who shared my office during this time, Jan Bungeroth, Daniel Keysers and
Daniel Stein, have made working time enjoyable. I thank my office-mates very much for
good advice when I was worried about something, for saying a few words, and for inviting
me to drink a cup of tea or coffee.
I am very grateful to all my colleagues at the Lehrstuhl I6 working in the translation
and speech recognition groups. Their talks, comments and discussions during my research,
especially at PhD seminars, were very helpful and beneficial. Although I could not participate
in all I6 activities such as “DVD Abend”, day tours, coffee breaks, parties and so on, taking part
in a few of them was very enjoyable.
Many thanks go to my Persian friends and their families in Aachen for their kindness
in spending time with me and my family and helping each other to handle life, which goes
on very slowly. I appreciate their contributions and efforts to make enjoyable plans like trips
and parties at the weekends.
Finally, I am very thankful to my family, Farnaz and Tabassom, for their patience
in bearing the difficulties of living abroad. Without their support and encouragement, the
efforts that led to this dissertation would not have been possible.
This dissertation was written during my time as a researcher with the Lehrstuhl für
Informatik 6 of RWTH Aachen University in Aachen, Germany. This work was partially
funded by Iran Scholarship Office at MSRT (Ministry of Science, Research an