Speech Acoustic Modelling from Raw Phase Spectrum

Loweimi, Erfan; Cvetkovic, Zoran; Bell, Peter; Renals, Steve

doi:10.1109/icassp39728.2021.9413727

Authors

Erfan Loweimi

Zoran Cvetkovic

Peter Bell

Steve Renals

Abstract

Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time phase spectrum. In particular, we study the usefulness of the raw wrapped, unwrapped and minimum-phase phase spectra, as well as the phase of the source and filter components, for acoustic modelling. Furthermore, we explore the effectiveness of simultaneously deploying the vocal tract and excitation components of the raw phase spectrum using multi-head CNNs, and we investigate multiple information fusion schemes. This paves the way for developing effective phase-based multi-stream information processing systems for speech recognition. The performance, even for the wrapped phase with its noise-like shape, is comparable to or better than that of classic magnitude-based features, and a WER as low as 4.8% is achieved on the WSJ (Eval-92) task.

Presentation Conference Type	Conference Paper (Published)
Conference Name	ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Start Date	Jun 6, 2021
End Date	Jun 11, 2021
Online Publication Date	May 13, 2021
Publication Date	2021
Deposit Date	Apr 3, 2024
Publisher	Institute of Electrical and Electronics Engineers
Series ISSN	2379-190X
Book Title	ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI	https://doi.org/10.1109/icassp39728.2021.9413727
Keywords	Raw phase spectrum, phase-based source-filter separation, multi-head CNNs, acoustic modelling, ASR
Public URL	http://researchrepository.napier.ac.uk/Output/3585849
