Speech Acoustic Modelling from Raw Phase Spectrum
Loweimi, Erfan; Cvetkovic, Zoran; Bell, Peter; Renals, Steve
Authors
Erfan Loweimi
Zoran Cvetkovic
Peter Bell
Steve Renals
Abstract
Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the feasibility and efficacy of acoustic modelling using the raw short-time phase spectrum. In particular, we study the usefulness of the raw wrapped, unwrapped and minimum-phase phase spectra, as well as the phase of the source and filter components, for acoustic modelling. Furthermore, we explore the effectiveness of deploying the vocal tract and excitation components of the raw phase spectrum simultaneously using multi-head CNNs, and investigate multiple information fusion schemes. This paves the way for developing effective phase-based multi-stream information processing systems for speech recognition. The performance, even for the wrapped phase with its noise-like shape, is comparable to or better than that of the classic magnitude-based features, and a WER as low as 4.8% is achieved on the WSJ (Eval-92) task.
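The three phase representations named in the abstract can be illustrated for a single speech frame. The following is a minimal sketch only, not the authors' implementation: it assumes NumPy, a Hamming window and a 512-point FFT (none of which are stated on this page), and the function name `frame_phase_spectra` is hypothetical. The minimum-phase phase is obtained here with the standard cepstral folding method, which may differ from the procedure used in the paper.

```python
import numpy as np


def frame_phase_spectra(frame, n_fft=512):
    """Return (wrapped, unwrapped, minimum-phase) phase spectra of one frame.

    Illustrative only: the window type and FFT size are assumptions.
    Each returned array has n_fft // 2 + 1 frequency bins.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))

    # Wrapped phase: principal value of the spectrum's argument, in [-pi, pi].
    spectrum = np.fft.rfft(windowed, n=n_fft)
    wrapped = np.angle(spectrum)

    # Unwrapped phase: remove the 2*pi discontinuities along frequency.
    unwrapped = np.unwrap(wrapped)

    # Minimum-phase phase: compute the real cepstrum of the log-magnitude,
    # fold it into a causal sequence, and take the imaginary part of its FFT.
    log_mag = np.log(np.abs(np.fft.fft(windowed, n=n_fft)) + 1e-10)
    cepstrum = np.fft.ifft(log_mag).real
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    min_phase = np.fft.fft(cepstrum * fold).imag[: n_fft // 2 + 1]

    return wrapped, unwrapped, min_phase
```

For a 25 ms frame at 16 kHz (400 samples), each output has 257 bins. Any of the three could then be fed to a network as a raw feature vector; how the paper normalises or stacks them is not specified here.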
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Start Date | Jun 6, 2021 |
End Date | Jun 11, 2021 |
Online Publication Date | May 13, 2021 |
Publication Date | 2021 |
Deposit Date | Apr 3, 2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Series ISSN | 2379-190X |
Book Title | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
DOI | https://doi.org/10.1109/icassp39728.2021.9413727 |
Keywords | Raw phase spectrum, phase-based source-filter separation, multi-head CNNs, acoustic modelling, ASR |
Public URL | http://researchrepository.napier.ac.uk/Output/3585849 |
You might also like
Phonetic Error Analysis Beyond Phone Error Rate
(2023)
Journal Article
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform
(2023)
Journal Article
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition
(2022)
Journal Article
Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra
(2023)
Presentation / Conference Contribution
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs
(2022)
Presentation / Conference Contribution