Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition

Author : Jinxi Guo
Total Pages : 127
Release : 2019
OCLC : 1106539212

Book Synopsis Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition by : Jinxi Guo

This dissertation by Jinxi Guo was released in 2019 and runs 127 pages.

Deep learning and neural network research has grown significantly in the fields of automatic speech recognition (ASR) and speaker recognition. Compared to traditional methods, deep learning-based approaches are more powerful in learning representations from data and building complex models. In this dissertation, we focus on representation learning and modeling using neural network-based approaches for speech and speaker recognition.

In the first part of the dissertation, we present two novel neural network-based methods to learn speaker-specific and phoneme-invariant features for short-utterance speaker verification. We first propose to learn a spectral feature mapping from each speech signal to the corresponding subglottal acoustic signal, which has less phoneme variation, using deep neural networks (DNNs). The estimated subglottal features show better speaker-separation ability and provide complementary information when combined with traditional speech features on speaker verification tasks. Additionally, we propose another DNN-based mapping model, which maps the speaker representation extracted from short utterances to the speaker representation extracted from long utterances of the same speaker. Two non-linear regression models using an autoencoder are proposed to learn this mapping, and they both improve speaker verification performance significantly.

In the second part of the dissertation, we design several new neural network models which take raw speech features (either complex Discrete Fourier Transform (DFT) features or raw waveforms) as input, and perform the feature extraction and phone classification jointly. We first propose a unified deep Highway (HW) network with a time-delayed bottleneck (TDB) layer in the middle for feature extraction. The TDB-HW networks with complex DFT features as input provide significantly lower error rates compared with hand-designed spectrum features on large-scale keyword spotting tasks. Next, we present a 1-D Convolutional Neural Network (CNN) model, which takes raw waveforms as input and uses convolutional layers to perform hierarchical feature extraction. The proposed 1-D CNN model outperforms standard systems with hand-designed features. In order to further reduce the redundancy of the 1-D CNN model, we propose a filter sampling and combination (FSC) technique, which can reduce the model size by 70% and still improve the performance on ASR tasks.

In the third part of the dissertation, we propose two novel neural network models for sequence modeling. We first propose an attention mechanism for acoustic sequence modeling. The attention mechanism can automatically predict the importance of each time step and select the most important information from sequences. Secondly, we present a sequence-to-sequence based spelling correction model for end-to-end ASR. The proposed correction model can effectively correct errors made by the ASR systems.
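The idea of an attention mechanism that predicts the importance of each time step can be illustrated with a minimal sketch of generic soft attention pooling. This is not the dissertation's model: the function name `attention_pool`, the scoring vector `w`, and the toy frames are all hypothetical, and a simple dot-product score is assumed in place of whatever scoring network the author uses.

```python
import math

def attention_pool(frames, w):
    """Collapse T frame vectors into one vector with soft attention.

    frames: list of T feature vectors, each a list of D floats
    w: hypothetical learned scoring vector of length D (an assumption)
    """
    # One scalar score per time step (dot product with the scoring vector).
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    # Softmax, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum over time: frames with higher weight contribute more.
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames))
              for d in range(dim)]
    return weights, pooled

# Toy input: three 2-dimensional frames; the middle one scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
w = [1.0, 1.0]
weights, pooled = attention_pool(frames, w)
```

Here `weights` sums to one and plays the role of the per-time-step importance, while `pooled` is the resulting fixed-length summary of the sequence.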


Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition Related Books

Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition
Language: en
Pages: 127
Authors: Jinxi Guo
Categories:
Type: BOOK - Published: 2019 - Publisher:

Deep learning and neural network research has grown significantly in the fields of automatic speech recognition (ASR) and speaker recognition. Compared to tradi
Automatic Speech Recognition
Language: en
Pages: 329
Authors: Dong Yu
Categories: Technology & Engineering
Type: BOOK - Published: 2014-11-11 - Publisher: Springer

This book provides a comprehensive overview of the recent advancement in the field of automatic speech recognition with a focus on deep learning models includin
Machine Learning for Speaker Recognition
Language: en
Pages: 329
Authors: Man-Wai Mak
Categories: Technology & Engineering
Type: BOOK - Published: 2020-11-19 - Publisher: Cambridge University Press

This book will help readers understand fundamental and advanced statistical models and deep learning models for robust speaker recognition and domain adaptation
Speech Processing, Recognition and Artificial Neural Networks
Language: en
Pages: 352
Authors: Gerard Chollet
Categories: Technology & Engineering
Type: BOOK - Published: 2012-12-06 - Publisher: Springer Science & Business Media

Speech Processing, Recognition and Artificial Neural Networks contains papers from leading researchers and selected students, discussing the experiments, theori
Speech Recognition
Language: en
Pages: 149
Authors: Fouad Sabry
Categories: Computers
Type: BOOK - Published: 2023-07-05 - Publisher: One Billion Knowledgeable

What Is Speech Recognition Computer science and computational linguistics include a subfield called speech recognition that focuses on the development of approa