Raw Source and Filter Modelling for Dysarthric Speech Recognition
Yue, Zhengjun; Loweimi, Erfan; Cvetkovic, Zoran
Authors
Zhengjun Yue
Erfan Loweimi
Zoran Cvetkovic
Abstract
Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we build acoustic models using the raw magnitude spectra of the source and filter components. The proposed multi-stream model consists of convolutional and recurrent layers. It allows the vocal tract and excitation components to be fused at different levels of abstraction, after per-stream pre-processing. We show that such multi-stream processing leverages these two information streams and helps the model towards normalising speaker attributes and speaking style. This potentially leads to better handling of dysarthric speech, which has large inter-speaker and intra-speaker variability. We compare the proposed system with various features, study the training dynamics, explore the usefulness of data augmentation and provide an interpretation of the learned convolutional filters. On the widely used TORGO dysarthric speech corpus, the proposed approach results in up to 1.7% absolute WER reduction for dysarthric speech compared with the MFCC baseline. Our best model reaches up to 40.6% and 11.8% WER for dysarthric and typical speech, respectively.
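The abstract does not state how the source and filter components are obtained. As an illustration only, the sketch below assumes a common textbook approach, cepstral liftering, to split one frame's magnitude spectrum into a smooth envelope (a stand-in for the vocal tract filter) and a fine-structure residual (a stand-in for the excitation source). The function name `source_filter_split`, the FFT size, and the lifter cutoff are hypothetical choices and are not taken from the paper.

```python
import numpy as np


def source_filter_split(frame, n_fft=512, lifter_cutoff=30):
    """Split a frame's magnitude spectrum into envelope and residual.

    Assumption: cepstral liftering is used for the separation; the
    paper's actual method may differ.
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    log_mag = np.log(magnitude + 1e-10)  # small floor avoids log(0)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Symmetric low-quefrency lifter keeps the slowly varying envelope.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    lifter[-(lifter_cutoff - 1):] = 1.0

    log_filter = np.fft.rfft(cepstrum * lifter, n=n_fft).real
    log_source = log_mag - log_filter  # what the envelope leaves behind
    return np.exp(log_filter), np.exp(log_source)


# Usage on a synthetic 25 ms frame at 16 kHz (400 samples of noise).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
filt, src = source_filter_split(frame)
```

By construction, the element-wise product of the two returned spectra reproduces the original magnitude spectrum, so each could feed its own stream of a multi-stream model without losing information at the input.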
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Start Date | May 23, 2022 |
End Date | May 27, 2022 |
Online Publication Date | Apr 27, 2022 |
Publication Date | 2022 |
Deposit Date | Apr 3, 2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Series ISSN | 2379-190X |
Book Title | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
DOI | https://doi.org/10.1109/icassp43922.2022.9746553 |
Keywords | Dysarthric speech recognition, source-filter separation and fusion, multi-stream acoustic modelling |
Public URL | http://researchrepository.napier.ac.uk/Output/3585825 |