Erfan Loweimi
Robust Source-Filter Separation of Speech Signal in the Phase Domain
Loweimi, Erfan; Barker, Jon; Torralba, Oscar Saz; Hain, Thomas
Authors
Jon Barker
Oscar Saz Torralba
Thomas Hain
Abstract
In earlier work, we proposed a framework for speech source-filter separation that employs phase-based signal processing. This paper presents a further theoretical investigation of the model and optimisations that make the filter and source representations less sensitive to the effects of noise and better matched to downstream processing. To this end, first, in computing the Hilbert transform, the log function is replaced by the generalised logarithmic function. This introduces a tuning parameter that adjusts both the dynamic range and distribution of the phase-based representation. Second, when computing the group delay, a more robust estimate of the derivative is formed by applying a regression filter instead of using sample differences. The effectiveness of these modifications is evaluated in clean and noisy conditions by considering the accuracy of the fundamental frequency extracted from the estimated source, and the performance of speech recognition features extracted from the estimated filter. In particular, the proposed filter-based front-end reduces Aurora-2 WERs by 6.3% (average 0–20 dB) compared with previously reported results. Furthermore, when tested on an LVCSR task (Aurora-4), the new features yielded a 5.8% absolute WER reduction compared to MFCCs, without performance loss in the clean/matched condition.
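The two modifications named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Box-Cox style form for the generalised logarithm, `(x**alpha - 1) / alpha`, and a delta-coefficient style least-squares slope for the regression filter. The function names `gen_log` and `regression_derivative`, the parameter `half_width`, and the edge padding are hypothetical choices made for illustration.

```python
import numpy as np


def gen_log(x, alpha):
    """Generalised logarithm (assumed Box-Cox form).

    Tends to the natural log as alpha -> 0; alpha is the tuning
    parameter that changes the dynamic range of the output.
    """
    x = np.asarray(x, dtype=float)
    if abs(alpha) < 1e-12:
        return np.log(x)
    return (x ** alpha - 1.0) / alpha


def regression_derivative(y, half_width=2):
    """Slope of a least-squares line fitted over 2*half_width + 1 samples.

    Assumed delta-coefficient style regression filter, used here in
    place of a plain sample difference. Edges are handled by repeating
    the first and last samples (an illustrative choice).
    """
    y = np.asarray(y, dtype=float)
    lags = np.arange(1, half_width + 1)
    denom = 2.0 * np.sum(lags ** 2)
    padded = np.pad(y, half_width, mode="edge")
    out = np.zeros_like(y)
    for k in lags:
        ahead = padded[half_width + k: half_width + k + len(y)]
        behind = padded[half_width - k: half_width - k + len(y)]
        out += k * (ahead - behind)
    return out / denom


# Usage: compress a toy magnitude spectrum, then differentiate a toy
# phase curve with the regression filter instead of np.diff.
magnitude = np.abs(np.fft.rfft(np.hanning(64)))
compressed = gen_log(magnitude + 1e-8, alpha=0.1)
toy_phase = np.cumsum(np.linspace(0.0, 1.0, 33))
slope = regression_derivative(toy_phase, half_width=2)
```

In this sketch a wider `half_width` averages the slope over more samples, trading frequency resolution for robustness to noise, which is the motivation the abstract gives for replacing sample differences.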
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | Interspeech 2017 |
Start Date | Aug 20, 2017 |
End Date | Aug 24, 2017 |
Online Publication Date | Aug 20, 2017 |
Publication Date | 2017 |
Deposit Date | Apr 4, 2024 |
Pages | 414-418 |
Book Title | Proc. Interspeech 2017 |
DOI | https://doi.org/10.21437/interspeech.2017-210 |
Public URL | http://researchrepository.napier.ac.uk/Output/3586530 |