Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs

Yue, Zhengjun; Loweimi, Erfan; Christensen, Heidi; Barker, Jon; Cvetkovic, Zoran

doi:10.21437/interspeech.2022-163

Abstract

Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and can potentially retain all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling is under-explored owing to data scarcity. Parametric convolutional neural networks (CNNs) can mitigate this problem because they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling using various parametric CNNs for ADSR. We investigate the properties of the learned filters and monitor the training dynamics of various models. Furthermore, we study the effectiveness of data augmentation and of multi-stream acoustic modelling, which combines non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs, achieving WERs as low as 36.2% and 12.6% (up to 3.4% and 1.1% absolute error reduction) for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves performance, yielding WERs as low as 33.2% and 10.3% for dysarthric and typical speech, respectively.
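To illustrate why a parametric first layer needs far fewer parameters, here is a minimal sketch. The abstract does not say which parametric filter shapes the paper uses, so the sketch assumes one common choice: a windowed sinc band-pass filter defined by only a lower cutoff and a bandwidth. The function names, the 40 filters, the 129-tap kernel length, and the 16 kHz sampling rate are all illustrative assumptions, not values taken from the paper.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass_kernel(low_hz, band_hz, kernel_size=129, sample_rate=16000):
    """Build one band-pass FIR kernel from two parameters (assumed sinc form).

    The ideal band-pass response is the difference of two low-pass sinc
    filters with cutoffs low_hz + band_hz and low_hz; a Hamming window
    truncates it to kernel_size taps.
    """
    high_hz = low_hz + band_hz
    centre = (kernel_size - 1) / 2
    kernel = []
    for i in range(kernel_size):
        t = (i - centre) / sample_rate  # tap position in seconds
        ideal = (2 * high_hz * sinc(2 * high_hz * t)
                 - 2 * low_hz * sinc(2 * low_hz * t))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window / sample_rate)
    return kernel


# Hypothetical first-layer configuration (illustrative values only).
num_filters, kernel_size = 40, 129
lows = [50.0 + 170.0 * k for k in range(num_filters)]  # lower cutoffs in Hz
bands = [200.0] * num_filters                          # bandwidths in Hz

kernels = [sinc_bandpass_kernel(lo, bw, kernel_size)
           for lo, bw in zip(lows, bands)]

# Learnable parameters in the first layer under each scheme.
parametric_params = 2 * num_filters               # (low, band) per filter
nonparametric_params = num_filters * kernel_size  # one weight per tap

print(parametric_params, nonparametric_params)
```

In a trainable model the cutoffs and bandwidths would be learned by backpropagation while the kernel shape stays fixed by the formula. A non-parametric layer learns every tap independently.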

Citation

Yue, Z., Loweimi, E., Christensen, H., Barker, J., & Cvetkovic, Z. (2022, September). Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. Paper presented at Interspeech 2022, Incheon, Korea.

Presentation Conference Type	Conference Paper (unpublished)
Conference Name	Interspeech 2022
Start Date	Sep 18, 2022
End Date	Sep 22, 2022
Deposit Date	Apr 3, 2024
DOI	https://doi.org/10.21437/interspeech.2022-163
Public URL	http://researchrepository.napier.ac.uk/Output/3585813
