
Modelling the effects of speech rate variation for automatic speech recognition [Electronic resource] / submitted by Britta Wrede

132 pages

Modelling the Effects
of Speech Rate Variation
for Automatic Speech Recognition
Submitted to the Faculty of Technology of
Bielefeld University
for the degree of
Doktor-Ingenieurin
by
Britta Wrede
Bielefeld - June 2002

Britta Wrede, M.A.
AG Angewandte Informatik
Technische Fakultät
Universität Bielefeld
email: bwrede@techfak.uni-bielefeld.de
Print of the approved dissertation for the
academic degree of Doktor-Ingenieurin (Dr.-Ing.).
Submitted to the Faculty of Technology on 5 June 2002 by Britta
Wrede.
Reviewers:
Dr. Gernot A. Fink
Prof. William Barry
Examination committee:
Prof. Ipke Wachsmuth
Dr. habil. Gernot A. Fink
Prof. William Barry
Prof. Gerhard Sagerer
Dr. Katharina Rohlfing
Printed on ageing-resistant paper ISO 9706

Acknowledgments
This work would not have been possible without the help and support of many people.
While many PhD students are left alone with their thesis, I feel extremely lucky to have had
in Dr. habil. Gernot Fink a supervisor who was not only very competent but also responsible,
constantly available, and more than willing to discuss problems, developments or new
ideas when they arose. Without his fundamental knowledge of the intricacies of ESMERALDA,
many solutions realised within this thesis would not have been possible. Also, I am
very grateful to Prof. William Barry, who agreed to review this thesis on a very tight schedule
and whose comments were much appreciated.
Many thanks are due to Dr. Jacques Koreman for reading and re-reading parts of this thesis.
His sharp eye spotted phonetic inconsistencies and hazardous syntactical constructions
of English. I am more than grateful for his valuable comments and detailed questions, which
were a great help in giving this work its internal structure.
I would also like to thank the Applied Computer Science Group at Bielefeld, which has
always been a fun place to work. Special thanks go to ("Script God") Christoph Schillo, who
not only proved to be an infinite source of awk- and shell-script knowledge and helpful little
tools but was also a great office mate and friend. Also, many thanks go to Christian
Bauckhage for making the office such a pleasant and inspiring place to work, not only
during the usual office hours.
I deeply appreciated the interdisciplinary atmosphere in the graduate school "Task-oriented
communication". Not only did I receive much help with the planning of my thesis
but I also learned how to discuss my research in an interdisciplinary framework.
Last but not least, I would like to thank Stephan for being such a great coach and for his
affection and emotional support whenever I needed it.

Contents
1. Introduction
1.1. Aim of this Work
1.2. Outline
2. Introduction to Phonetics and Speech Recognition
2.1. Phonetic Aspects
2.1.1. Notation
2.1.2. Articulation
2.1.3. Acoustics
2.1.4. Sources of Variation
2.2. Automatic Speech Recognition
2.2.1. Feature Extraction
2.2.2. Acoustic Modelling
2.2.3. Decoding
2.2.4. Training
2.2.5. Evaluation of the System
2.3. Comparison of the Acoustic Features
3. Influence of Speech Rate on Acoustic-Phonetic Properties of Speech
3.1. Durational Effects
3.2. Reduction
3.2.1. Causes of Reduction
3.2.2. Centralisation
3.2.3. Effects on Dynamic Features
3.2.4. Consonant Reduction
3.3. Perceptual Effects of Speaking Rate
3.3.1. Durational Normalisation
3.3.2. Spectral
3.4. Summary
4. Speech Rate Modelling in Automatic Speech Recognition
4.1. Speech-rate measures
4.2. Compensation Techniques
4.2.1. Model Adaptation
4.2.2. Feature
4.3. Summary
5. Implications for Effective Speech Rate Modelling
5.1. Acoustic Analysis
5.2. Speech Recognition Experiments
6. Acoustic Analysis
6.1. Corpus
6.2. Experiments
6.3. Results
6.3.1. Formants
6.3.2. Spectrum
6.4. Summary
7. Rate Dependent Models
7.1. Rate and Reduction Measures
7.2. Experiments on the SLACC Corpus
7.2.1. Corpus
7.2.2. Baseline System
7.2.3. Basis of the Models
7.2.4. Rate- and Reduction Measures
7.2.5. Modelling Accuracy
7.2.6. Model Selection
7.2.7. Data-driven Training Selection
7.2.8. Comparison to Speaker-Adaptation
7.3. Experiments on the Verbmobil Corpus
7.3.1. Corpus
7.3.2. Baseline System
7.3.3. Adaptation to Duration
7.3.4. Data-driven Training Selection
7.4. Summary
8. Summary
Bibliography
A. Subset of German SAMPA inventory
Index
