Improving End-to-end Neural Network Models for Low-resource Automatic Speech Recognition
Author: Jennifer Fox Drexler
Publisher:
Total Pages: 140
Release: 2020
ISBN-10: OCLC:1227518442
ISBN-13:
Rating: 4/5 (42 Downloads)
Download or read Improving End-to-end Neural Network Models for Low-resource Automatic Speech Recognition, written by Jennifer Fox Drexler and released in 2020 (140 pages). Available in PDF, EPUB and Kindle.

Book excerpt: In this thesis, we explore the problem of training end-to-end neural network models for automatic speech recognition (ASR) when limited training data are available. End-to-end models are theoretically well-suited to low-resource languages because they do not rely on expert linguistic resources, but they are difficult to train without large amounts of transcribed speech, which is prohibitively expensive to acquire in most of the world's languages. We present several methods for improving end-to-end neural network-based ASR in low-resource scenarios.

First, we explore two methods for creating a shared embedding space for speech and text. In doing so, we learn representations of speech that contain only linguistic content and not, for example, the speaker or noise characteristics in the speech signal. These linguistic-only representations allow the ASR model to generalize better to unseen speech by discouraging the model from learning spurious correlations between the text transcripts and extra-linguistic factors in speech. This shared embedding space also enables semi-supervised training of some parameters of the ASR model with additional text.

Next, we experiment with two techniques for probabilistically segmenting text into subword units during training. We introduce the n-gram maximum likelihood loss, which allows the ASR model to learn an inventory of acoustically-inspired subword units as part of the training process. We show that this technique combines well with the embedding space alignment techniques above, leading to a 44% relative improvement in word error rate in the lowest-resource condition tested.
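The shared embedding space idea can be sketched in miniature as follows. This is a toy illustration, not the thesis's architecture: the dimensions, the linear "encoders", and the squared-distance alignment objective are all assumptions made here for clarity. The point is only that paired speech and text are projected into one space and pulled together, so the speech side is pressured to keep the linguistic content it shares with the transcript.

```python
import math
import random

random.seed(0)

# Toy dimensions (illustrative assumptions, not from the thesis).
SPEECH_DIM, TEXT_DIM, SHARED_DIM = 6, 4, 3

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Linear "encoders" standing in for the speech and text encoder networks.
W_speech = rand_matrix(SPEECH_DIM, SHARED_DIM)
W_text = rand_matrix(TEXT_DIM, SHARED_DIM)

def embed(x, W):
    """Project a feature vector into the shared space and L2-normalize it."""
    z = [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def alignment_loss(speech_batch, text_batch):
    """Mean squared distance between paired speech and text embeddings.

    Minimizing this pulls each utterance toward its transcript in the
    shared space, discouraging the speech embedding from encoding
    speaker or noise characteristics the text does not share.
    """
    total = 0.0
    for s, t in zip(speech_batch, text_batch):
        zs, zt = embed(s, W_speech), embed(t, W_text)
        total += sum((a - b) ** 2 for a, b in zip(zs, zt))
    return total / len(speech_batch)

# A batch of 3 paired (speech features, transcript features) examples.
speech = [[random.gauss(0, 1) for _ in range(SPEECH_DIM)] for _ in range(3)]
text = [[random.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in range(3)]
print(alignment_loss(speech, text))
```

Because both embeddings are unit-normalized, the per-pair squared distance is bounded by 4, which makes the loss scale easy to interpret. Text-only data can update the text-side parameters through the same space, which is one way the semi-supervised use of additional text described above becomes possible.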
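The probabilistic subword segmentation step can be illustrated with a small dynamic program that sums a word's probability over all ways of splitting it into subword units. The subword inventory and its probabilities below are invented for illustration; in the thesis the inventory is learned jointly with the ASR model via the n-gram maximum likelihood loss, whereas this sketch uses a fixed unigram model.

```python
import math

# Hypothetical unigram subword inventory (illustrative, not learned).
subword_logp = {
    "a": math.log(0.3),
    "b": math.log(0.2),
    "ab": math.log(0.3),
    "ba": math.log(0.1),
    "abb": math.log(0.1),
}

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def marginal_logp(word, max_len=3):
    """Log-probability of `word`, marginalized over ALL segmentations.

    Forward DP: alpha[i] = log P(word[:i]); each step extends a prefix
    by one subword unit ending at position i.
    """
    alpha = [float("-inf")] * (len(word) + 1)
    alpha[0] = 0.0
    for i in range(1, len(word) + 1):
        for j in range(max(0, i - max_len), i):
            piece = word[j:i]
            if piece in subword_logp:
                alpha[i] = logaddexp(alpha[i], alpha[j] + subword_logp[piece])
    return alpha[len(word)]

# "ab" can be segmented as ["ab"] (p=0.3) or ["a","b"] (p=0.06),
# so its marginal probability is 0.36.
print(math.exp(marginal_logp("ab")))
```

Training against this marginal likelihood, rather than one fixed segmentation, lets the model shift probability mass toward whichever subword units fit the acoustics best, which is the intuition behind learning an acoustically-inspired subword inventory.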