Speech Acoustic Modelling Using Raw Source and Filter Components

Loweimi, Erfan; Cvetkovic, Zoran; Bell, Peter; Renals, Steve

doi:10.21437/interspeech.2021-53

Abstract

Source-filter modelling is one of the fundamental techniques in speech processing and has a wide range of applications. In acoustic modelling, features such as MFCC and PLP, which parametrise the filter component, are widely employed. In this paper, we investigate the efficacy of building acoustic models from the raw filter and source components. The raw magnitude spectrum, as the primary information stream, is decomposed into the excitation and vocal tract information streams via cepstral liftering. Acoustic models are then built using multi-head CNNs which, among other things, allow each stream to be processed by its own sequence of bespoke transforms and fused with the others at an optimal level of abstraction. We discuss the possible advantages of such information factorisation and recombination, investigate the dynamics of these models and explore the optimal fusion level. Furthermore, we visualise the CNN's learned filters and offer an interpretation of the captured patterns. The proposed approach with the optimal fusion scheme yields up to 14% and 7% relative word error rate (WER) reduction on the WSJ and Aurora-4 tasks, respectively.
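The cepstral liftering step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Hamming window, the FFT size (`n_fft=512`), the lifter cutoff (`cutoff=30` quefrency bins) and the function name `lifter_decompose` are hypothetical choices made for illustration. The log-magnitude spectrum is taken to the real cepstrum, a low-time lifter keeps the low quefrencies (the smooth vocal tract envelope, i.e. the filter stream), and the residual is treated as the excitation (source) stream. Because cepstral liftering operates in the log domain, a model fed with raw magnitudes could exponentiate both outputs; whether the paper does so is not stated here.

```python
import numpy as np

def lifter_decompose(frame, n_fft=512, cutoff=30):
    """Split one frame's log-magnitude spectrum into filter and source parts.

    Assumptions (hypothetical, not taken from the paper): Hamming window,
    n_fft-point FFT, and a low-time lifter keeping `cutoff` quefrency bins.
    Returns (log_mag, filter_log, source_log), each of length n_fft // 2 + 1.
    """
    if not 1 <= cutoff <= n_fft // 2 + 1:
        raise ValueError("cutoff must lie in [1, n_fft // 2 + 1]")
    # Raw (one-sided) magnitude spectrum of the windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft))
    log_mag = np.log(spectrum + 1e-10)  # small floor avoids log(0)
    # Real cepstrum: inverse FFT of the (real, even) log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)
    # Symmetric low-time lifter: quefrencies 0..cutoff-1 and their mirror images.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0
    # Back to the log-spectral domain: smooth envelope = vocal tract (filter).
    filter_log = np.fft.rfft(cepstrum * lifter, n=n_fft).real
    # Whatever the lifter removed is attributed to the excitation (source).
    source_log = log_mag - filter_log
    return log_mag, filter_log, source_log

# Usage on a placeholder 25 ms frame at 16 kHz (random data, for shape only).
rng = np.random.default_rng(0)
log_mag, filt, src = lifter_decompose(rng.standard_normal(400))
```

By construction the two streams sum back to the original log-magnitude spectrum, so the factorisation loses no information; the cutoff only decides how it is split between the two CNN heads.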

Presentation Conference Type	Conference Paper (Published)
Conference Name	Interspeech 2021
Start Date	Aug 30, 2021
End Date	Sep 3, 2021
Online Publication Date	Aug 30, 2021
Publication Date	2021
Deposit Date	Apr 3, 2024
Pages	276-280
Book Title	Proc. Interspeech 2021
DOI	https://doi.org/10.21437/interspeech.2021-53
Public URL	http://researchrepository.napier.ac.uk/Output/3585837

Phonetic Error Analysis Beyond Phone Error Rate (2023)
Journal Article

Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform (2023)
Journal Article

Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition (2022)
Journal Article

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra (2023)
Presentation / Conference Contribution

Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs (2022)
Presentation / Conference Contribution
