Speech Acoustic Modelling Using Raw Source and Filter Components

Loweimi, Erfan; Cvetkovic, Zoran; Bell, Peter; Renals, Steve

Authors

Erfan Loweimi

Zoran Cvetkovic

Peter Bell

Steve Renals



Abstract

Source-filter modelling is among the fundamental techniques in speech processing, with a wide range of applications. In acoustic modelling, features such as MFCC and PLP, which parametrise the filter component, are widely employed. In this paper, we investigate the efficacy of building acoustic models from the raw filter and source components. The raw magnitude spectrum, as the primary information stream, is decomposed into the excitation and vocal tract information streams via cepstral liftering. Acoustic models are then built via multi-head CNNs which, among other things, allow each stream to be processed by its own sequence of bespoke transforms before the streams are fused at an optimal level of abstraction. We discuss the possible advantages of such information factorisation and recombination, investigate the dynamics of these models and explore the optimal fusion level. Furthermore, we illustrate the CNN’s learned filters and provide some interpretation of the captured patterns. The proposed approach with the optimal fusion scheme results in up to 14% and 7% relative WER reduction on the WSJ and Aurora-4 tasks, respectively.
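The decomposition step mentioned in the abstract can be illustrated with a minimal sketch of low-quefrency cepstral liftering on a single frame. This is not the authors' implementation: the function name `source_filter_split`, the Hamming window, the log compression, the FFT size and the lifter cutoff are all assumptions chosen for illustration, and the abstract does not state the settings actually used.

```python
import numpy as np

def source_filter_split(frame, n_fft=512, cutoff=20):
    """Split one frame's log-magnitude spectrum into filter and source parts.

    Hypothetical helper: n_fft and cutoff are illustrative values only.
    """
    windowed = frame * np.hamming(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    log_mag = np.log(mag + 1e-10)  # small floor avoids log(0)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cep = np.fft.irfft(log_mag, n_fft)

    # Symmetric low-quefrency lifter: keep bins 0..cutoff-1 and their mirrors.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0

    # Smooth spectral envelope (vocal tract / filter stream).
    log_filter = np.fft.rfft(cep * lifter).real
    # Residual fine structure (excitation / source stream).
    log_source = log_mag - log_filter
    return log_mag, log_filter, log_source

# Usage on a synthetic 25 ms frame at 16 kHz (400 samples of noise).
rng = np.random.default_rng(0)
log_mag, log_filter, log_source = source_filter_split(rng.standard_normal(400))
```

In this sketch the two streams sum to the log-magnitude spectrum by construction, so no information is discarded; each stream could then be fed to a separate CNN head, as the abstract describes at a high level.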

Presentation Conference Type: Conference Paper (Published)
Conference Name: Interspeech 2021
Start Date: Aug 30, 2021
End Date: Sep 3, 2021
Online Publication Date: Aug 30, 2021
Publication Date: 2021
Deposit Date: Apr 3, 2024
Pages: 276-280
Book Title: Proc. Interspeech 2021
DOI: https://doi.org/10.21437/interspeech.2021-53
Public URL: http://researchrepository.napier.ac.uk/Output/3585837