Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs

Authors

Zhengjun Yue

Erfan Loweimi

Heidi Christensen

Jon Barker

Zoran Cvetkovic



Abstract

Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and potentially retain all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling remains under-explored owing to data scarcity. Parametric convolutional neural networks (CNNs) can mitigate this problem, as they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling with various parametric CNNs for ADSR. We investigate the properties of the learned filters and monitor the training dynamics of the various models. Furthermore, we study the effectiveness of data augmentation and of multi-stream acoustic modelling that combines non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs, reaching 36.2% and 12.6% WERs (up to 3.4% and 1.1% absolute WER reduction) for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves performance, resulting in 33.2% and 10.3% WERs for dysarthric and typical speech, respectively.
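
This record carries no code; purely as an illustration of the kind of parametric CNN the abstract refers to, the following is a minimal PyTorch sketch of a SincNet-style front-end layer whose kernels are sinc band-pass filters and whose only learned parameters are per-filter cut-off frequencies. The class name, filter count, kernel size and initialisation are assumptions made for the example, not the authors' implementation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv1d(nn.Module):
    """Illustrative parametric 1-D convolution over raw waveform.

    Each kernel is a windowed sinc band-pass filter, so the layer learns only
    2 * out_channels scalars (low cut-off and bandwidth per filter) instead of
    out_channels * kernel_size free weights as in a standard Conv1d.
    """

    def __init__(self, out_channels=80, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Assumed initialisation: cut-offs spread between 30 Hz and near Nyquist.
        low = torch.linspace(30.0, sample_rate / 2 - 100.0, out_channels)
        band = torch.full((out_channels,), 100.0)
        self.low_hz = nn.Parameter(low.unsqueeze(1))
        self.band_hz = nn.Parameter(band.unsqueeze(1))
        # Fixed (non-learned) pieces of the kernel: time axis and Hamming window.
        n = (kernel_size - 1) / 2
        self.register_buffer("t", torch.arange(-n, n + 1).unsqueeze(0) / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size).unsqueeze(0))

    def forward(self, x):  # x: (batch, 1, time), raw waveform
        low = torch.abs(self.low_hz)
        high = torch.clamp(low + torch.abs(self.band_hz), max=self.sample_rate / 2)

        def lowpass(fc):
            # Ideal low-pass impulse response with cut-off fc, sampled on self.t.
            return 2 * fc * torch.sinc(2 * fc * self.t)

        # Band-pass = difference of two low-pass filters, tapered by the window.
        kernels = (lowpass(high) - lowpass(low)) * self.window
        kernels = kernels / kernels.abs().sum(dim=1, keepdim=True)
        return F.conv1d(x, kernels.unsqueeze(1), padding=self.kernel_size // 2)

# Usage sketch: a batch of 1-second utterances at 16 kHz.
x = torch.randn(4, 1, 16000)
feats = SincConv1d()(x)   # (4, 80, 16000) filter-bank-like representation

Because only two scalars per filter are trained, such a layer requires far less data than a free-form convolution of the same width, which is the property the abstract exploits for the low-resource ADSR setting.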

Citation

Yue, Z., Loweimi, E., Christensen, H., Barker, J., & Cvetkovic, Z. (2022, September). Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. Paper presented at Interspeech 2022, Incheon, Korea.

Presentation Conference Type: Conference Paper (unpublished)
Conference Name: Interspeech 2022
Start Date: Sep 18, 2022
End Date: Sep 22, 2022
Deposit Date: Apr 3, 2024
DOI: https://doi.org/10.21437/interspeech.2022-163
Public URL: http://researchrepository.napier.ac.uk/Output/3585813