Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs
Authors
Zhengjun Yue
Erfan Loweimi
Heidi Christensen
Jon Barker
Zoran Cvetkovic
Abstract
Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and can potentially retain all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling remains under-explored owing to data scarcity. Parametric convolutional neural networks (CNNs) can mitigate this problem because they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling with various parametric CNNs for ADSR. We investigate the properties of the learned filters and monitor the training dynamics of the various models. Furthermore, we study the effectiveness of data augmentation and of multi-stream acoustic modelling, which combines non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs, reaching word error rates (WERs) as low as 36.2% and 12.6% (up to 3.4% and 1.1% absolute error reduction) for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves performance, yielding WERs as low as 33.2% and 10.3% for dysarthric and typical speech, respectively.
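The parameter saving that the abstract attributes to parametric CNNs can be illustrated with a minimal sketch. It assumes a sinc-based band-pass parametrisation, in which each first-layer filter is defined by two values (a low cut-off and a bandwidth) instead of one free weight per tap. The abstract does not say which filter families the paper uses, so the sinc form, the function names `sinc_bandpass_kernels` and `count_parameters`, and all numeric settings (40 filters, 129 taps, 16 kHz) are illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def sinc_bandpass_kernels(low_hz, band_hz, kernel_size=129, sample_rate=16000):
    """Build band-pass FIR kernels from two parameters per filter.

    Hypothetical helper: each kernel is the difference of two ideal
    low-pass (sinc) filters, tapered by a Hamming window.
    """
    low = np.abs(np.asarray(low_hz, dtype=float))
    high = low + np.abs(np.asarray(band_hz, dtype=float))
    # Centred time axis in samples, so every kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cut-offs normalised to cycles per sample, one row per filter.
    f1 = (low / sample_rate)[:, None]
    f2 = (high / sample_rate)[:, None]
    # np.sinc(x) is sin(pi x) / (pi x), so 2 f sinc(2 f n) is an ideal low-pass.
    kernels = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return kernels * np.hamming(kernel_size)


def count_parameters(num_filters, kernel_size):
    """Compare first-layer parameter counts for the two filter types."""
    return {
        "non_parametric": num_filters * kernel_size,  # one weight per tap
        "parametric": num_filters * 2,                # cut-off and bandwidth only
    }


# Illustrative settings (assumptions, not taken from the paper).
num_filters, kernel_size = 40, 129
low_hz = np.linspace(50.0, 7000.0, num_filters)
band_hz = np.full(num_filters, 200.0)

kernels = sinc_bandpass_kernels(low_hz, band_hz, kernel_size)

# Apply the filterbank to one second of synthetic "raw waveform".
waveform = np.random.default_rng(0).standard_normal(16000)
features = np.stack([np.convolve(waveform, k, mode="valid") for k in kernels])

counts = count_parameters(num_filters, kernel_size)
print(kernels.shape, features.shape, counts)
```

With these settings the parametric layer has 80 trainable values against 5160 for a non-parametric layer of the same shape, which is the kind of reduction that makes such layers attractive when training data are scarce. In a trainable model the cut-offs and bandwidths would be updated by gradient descent; this sketch only builds the kernels from fixed values.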
Citation
Yue, Z., Loweimi, E., Christensen, H., Barker, J., & Cvetkovic, Z. (2022, September). Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. Paper presented at Interspeech 2022, Incheon, Korea.
Presentation Conference Type | Conference Paper (unpublished) |
---|---|
Conference Name | Interspeech 2022 |
Start Date | Sep 18, 2022 |
End Date | Sep 22, 2022 |
Deposit Date | Apr 3, 2024 |
DOI | https://doi.org/10.21437/interspeech.2022-163 |
Public URL | http://researchrepository.napier.ac.uk/Output/3585813 |