Robust Source-Filter Separation of Speech Signal in the Phase Domain

Loweimi, Erfan; Barker, Jon; Torralba, Oscar Saz; Hain, Thomas

doi:10.21437/interspeech.2017-210

Authors

Erfan Loweimi

Jon Barker

Oscar Saz Torralba

Thomas Hain

Abstract

In earlier work we proposed a framework for speech source-filter separation that employs phase-based signal processing. This paper presents a further theoretical investigation of the model, together with optimisations that make the filter and source representations less sensitive to noise and better matched to downstream processing. First, when computing the Hilbert transform, the log function is replaced by the generalised logarithmic function. This introduces a tuning parameter that adjusts both the dynamic range and the distribution of the phase-based representation. Second, when computing the group delay, a more robust estimate of the derivative is formed by applying a regression filter instead of using sample differences. The effectiveness of these modifications is evaluated in clean and noisy conditions by considering the accuracy of the fundamental frequency extracted from the estimated source and the performance of speech recognition features extracted from the estimated filter. In particular, the proposed filter-based front-end reduces Aurora-2 WERs by 6.3% (average 0–20 dB) compared with previously reported results. Furthermore, when tested on an LVCSR task (Aurora-4), the new features yielded a 5.8% absolute WER reduction compared with MFCCs, without performance loss in the clean/matched condition.
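The abstract names the two modifications but does not give their exact form, so the following is only a minimal sketch under stated assumptions. It assumes a Box-Cox style generalised logarithm, `(x**alpha - 1) / alpha`, where `alpha` stands in for the tuning parameter, and the usual least-squares regression (delta) filter as the replacement for a sample difference. The function names, the `half_width` parameter, and the edge handling are hypothetical and are not taken from the paper.

```python
import math


def generalised_log(x, alpha):
    """Assumed Box-Cox form of the generalised logarithm.

    Returns (x**alpha - 1) / alpha, which tends to log(x) as alpha -> 0.
    Larger alpha compresses the dynamic range less than the plain log.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(alpha) < 1e-12:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


def regression_derivative(values, half_width=2):
    """Least-squares slope over a window of 2 * half_width + 1 samples.

    This is the standard regression (delta) formula:
        d[i] = sum_k k * (v[i + k] - v[i - k]) / (2 * sum_k k**2)
    Edges are handled by repeating the first and last samples, which is
    an assumption made here for simplicity.
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    n = len(values)
    denom = 2.0 * sum(k * k for k in range(1, half_width + 1))
    slopes = []
    for i in range(n):
        acc = 0.0
        for k in range(1, half_width + 1):
            ahead = values[min(i + k, n - 1)]
            behind = values[max(i - k, 0)]
            acc += k * (ahead - behind)
        slopes.append(acc / denom)
    return slopes


def sample_difference(values):
    """First-order difference, the baseline the regression filter replaces."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]
```

In a group delay computation, `values` would be the unwrapped phase spectrum and the group delay would be the negative of the estimated slope. Averaging over a window is what makes the regression estimate less sensitive to noise than a two-point difference, at the cost of some frequency resolution.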

Presentation Conference Type	Conference Paper (Published)
Conference Name	Interspeech 2017
Start Date	Aug 20, 2017
End Date	Aug 24, 2017
Online Publication Date	Aug 20, 2017
Publication Date	2017
Deposit Date	Apr 4, 2024
Pages	414-418
Book Title	Proc. Interspeech 2017
DOI	https://doi.org/10.21437/interspeech.2017-210
Public URL	http://researchrepository.napier.ac.uk/Output/3586530

You might also like

Phonetic Error Analysis Beyond Phone Error Rate (2023)
Journal Article

Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform (2023)
Journal Article

Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition (2022)
Journal Article

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra (2023)
Presentation / Conference Contribution

Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs (2022)
Presentation / Conference Contribution
