Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition

Yue, Zhengjun; Loweimi, Erfan; Christensen, Heidi; Barker, Jon; Cvetkovic, Zoran

doi:10.1109/taslp.2022.3205766

Abstract

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we aim to build acoustic models for ADSR using the raw magnitude spectra of the source and filter components. The proposed multi-stream models consist of convolutional, recurrent and fully-connected layers, allowing the individual information streams to be pre-processed and then fused at an optimal level of abstraction. We demonstrate that such multi-stream processing leverages the information encoded in the vocal tract and excitation components and helps normalise nuisance factors such as speaker attributes and speaking style. This leads to better handling of dysarthric speech, which exhibits large inter- and intra-speaker variability, and results in a notable performance gain. Furthermore, we analyse the learned convolutional filters and visualise the outputs of different layers after dimensionality reduction to demonstrate how speaker-related attributes are normalised along the pipeline. We also compare the proposed multi-stream model with various systems based on MFCC, FBank, raw waveform and i-vector features, and study the training dynamics as well as the usefulness of feature normalisation and data augmentation via speed perturbation. On the widely used TORGO and UASpeech dysarthric speech corpora, the proposed approach achieves competitive word error rates (WERs) on dysarthric speech, reaching 35.3% and 30.3%, respectively.
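
To make the idea of source and filter magnitude spectra concrete, the following is a minimal sketch of one common way to separate them: low-time liftering of the real cepstrum. The abstract does not say which separation method the authors use, so cepstral liftering is an assumption here, as are the function name `source_filter_split`, the FFT size, the lifter cutoff and the synthetic test frame. The smooth spectral envelope stands in for the filter (vocal tract) component, and the remaining fine structure stands in for the source (excitation) component.

```python
import numpy as np

def source_filter_split(frame, n_fft=512, lifter_cutoff=30):
    """Split a frame's magnitude spectrum into filter and source parts.

    Illustrative only: uses low-time cepstral liftering, which is an
    assumed technique, not necessarily the one used in the paper.
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n=n_fft)) + 1e-10)

    # Real cepstrum: inverse FFT of the (even-symmetric) log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Symmetric low-time lifter: keep quefrencies 0..cutoff-1 and their mirrors.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    if lifter_cutoff > 1:
        lifter[-(lifter_cutoff - 1):] = 1.0

    # Smooth envelope (filter) in the log domain; the residual is the source.
    log_filter = np.fft.rfft(cepstrum * lifter).real
    log_source = log_mag - log_filter
    return np.exp(log_filter), np.exp(log_source)

# Synthetic voiced-like frame: harmonics of 120 Hz at a 16 kHz sampling rate.
sr = 16000
t = np.arange(400) / sr
frame = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 20))

filt, src = source_filter_split(frame)
print(filt.shape, src.shape)  # (257,) (257,)
```

By construction, the two components multiply back to the original magnitude spectrum, so in a multi-stream setup of the kind the abstract describes each one could be fed to its own stream without losing magnitude information.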

Journal Article Type	Article
Online Publication Date	Sep 23, 2022
Publication Date	2022
Deposit Date	Apr 3, 2024
Print ISSN	2329-9290
Electronic ISSN	2329-9304
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Volume	30
Pages	2968-2980
DOI	https://doi.org/10.1109/taslp.2022.3205766
Keywords	Dysarthric automatic speech recognition, multi-stream acoustic modelling, source-filter separation and fusion
Public URL	http://researchrepository.napier.ac.uk/Output/3585801
