On the Usefulness of the Speech Phase Spectrum for Pitch Extraction

Loweimi, Erfan; Barker, Jon; Hain, Thomas

doi:10.21437/interspeech.2018-1062

Authors

Erfan Loweimi

Jon Barker

Thomas Hain

Abstract

Most frequency-domain techniques for pitch extraction, such as the cepstrum, the harmonic product spectrum (HPS) and the summation of residual harmonics (SRH), operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as the argmax. In this paper, we investigate the extension of these three techniques to the phase and group delay (GD) domains. Our extensions exploit the observation that the bin at which F(magnitude) reaches its maximum, for some monotonically increasing function F, is equivalent to the bin at which F(phase) has its maximum negative slope and F(group delay) has its maximum value. To extract the pitch track from the speech phase spectrum, these techniques were coupled with the source-filter model in the phase domain that we proposed in earlier publications and with a novel voicing detection algorithm proposed here. The accuracy and robustness of the phase-based pitch extraction techniques are illustrated and compared with those of their magnitude-based counterparts using six pitch evaluation metrics. On average, the phase spectrum can be successfully employed in pitch tracking with accuracy and robustness comparable to those of the speech magnitude spectrum.
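For orientation, the magnitude-domain baseline mentioned in the abstract can be sketched as follows. This is a minimal, illustrative harmonic product spectrum on a single synthetic frame, not the authors' implementation and not their phase or group-delay extension. The function name `hps_pitch`, all parameter values, the search band and the test signal are assumptions made for the example; the log-magnitude form (summing logs rather than multiplying magnitudes) is one common variant.

```python
import numpy as np

def hps_pitch(frame, fs, n_harmonics=4, n_fft=8192, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame with a basic harmonic product spectrum.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Window the frame and take the zero-padded magnitude spectrum.
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    log_mag = np.log(mag + 1e-12)  # small floor avoids log(0)

    # Sum the log spectrum decimated by 1..n_harmonics
    # (equivalent to the log of the product of decimated magnitudes).
    n_bins = len(mag) // n_harmonics
    hps = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        hps += log_mag[::h][:n_bins]

    # The fundamental emerges as the argmax within a plausible pitch band.
    freqs = np.arange(n_bins) * fs / n_fft
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(hps[band])]

# Synthetic voiced-like frame: five harmonics of an assumed 200 Hz fundamental.
fs = 16000
t = np.arange(1024) / fs
f0_true = 200.0
frame = sum(np.cos(2 * np.pi * k * f0_true * t) / k for k in range(1, 6))

f0_est = hps_pitch(frame, fs)
print(f"estimated F0: {f0_est:.1f} Hz")
```

The estimate is quantised to the FFT bin spacing (`fs / n_fft`, about 2 Hz here), so it lands near, rather than exactly on, the true fundamental. A full pitch tracker would also need framing over time and a voicing decision, which this sketch omits.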

Presentation Conference Type	Conference Paper (Published)
Conference Name	Interspeech 2018
Start Date	Sep 2, 2018
End Date	Sep 6, 2018
Online Publication Date	Sep 2, 2018
Publication Date	2018
Deposit Date	Apr 4, 2024
Pages	696-700
Book Title	Proc. Interspeech 2018
DOI	https://doi.org/10.21437/interspeech.2018-1062
Public URL	http://researchrepository.napier.ac.uk/Output/3586520
