Phonetic Error Analysis Beyond Phone Error Rate

Loweimi, Erfan; Carmantini, Andrea; Bell, Peter; Renals, Steve; Cvetkovic, Zoran


Abstract

In this article, we analyse the performance of TIMIT-based phone recognition systems beyond the overall phone error rate (PER) metric. We consider three sets of broad phonetic classes (BPCs): {affricate, diphthong, fricative, nasal, plosive, semi-vowel, vowel, silence}, {consonant, vowel, silence} and {voiced, unvoiced, silence}, and calculate the contribution of each phonetic class to the substitution, deletion and insertion errors and to the PER. Furthermore, for each BPC we investigate the following: the evolution of PER during training, the effect of noise (NTIMIT), the importance of different spectral subbands (1, 2, 4, and 8 kHz), the usefulness of bidirectional vs unidirectional sequential modelling, transfer learning from WSJ, and regularisation via monophones. In addition, we construct a confusion matrix for each BPC and analyse the confusions via dimensionality reduction to 2D at the input (acoustic features) and output (logits) levels of the acoustic model. We also compare the performance and confusion matrices of the BLSTM-based hybrid baseline system with those of the GMM-HMM based hybrid, Conformer and wav2vec 2.0 based end-to-end phone recognisers. Finally, the relationship of the unweighted and weighted PERs with the broad phonetic class priors is studied for both the hybrid and end-to-end systems.
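
The per-class breakdown of substitution, deletion and insertion errors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The phone-to-class map `PHONE_TO_BPC` is a hypothetical, heavily abbreviated stand-in for a full TIMIT mapping, the function names are invented for illustration, and the attribution rule (substitutions and deletions counted under the class of the reference phone, insertions under the class of the inserted phone) is an assumption rather than something the abstract specifies.

```python
from collections import Counter

# Hypothetical, abbreviated phone-to-class map (illustration only).
PHONE_TO_BPC = {
    "aa": "vowel", "iy": "vowel", "ih": "vowel",
    "s": "fricative", "z": "fricative",
    "t": "plosive", "d": "plosive",
    "n": "nasal", "m": "nasal",
    "sil": "silence",
}


def align(ref, hyp):
    """Levenshtein alignment; returns a list of (op, ref_phone, hyp_phone)."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]


def per_class_errors(ref, hyp):
    """Count errors per (broad class, error type).

    Assumption: sub/del are attributed to the reference phone's class,
    ins to the inserted hypothesis phone's class.
    """
    counts = Counter()
    for op, r, h in align(ref, hyp):
        if op == "ok":
            continue
        phone = h if op == "ins" else r
        counts[(PHONE_TO_BPC[phone], op)] += 1
    return counts


def phone_error_rate(ref, hyp):
    """PER = (substitutions + deletions + insertions) / reference length."""
    return sum(per_class_errors(ref, hyp).values()) / len(ref)
```

For example, with the reference `["sil", "s", "iy", "t", "sil"]` and the hypothesis `["sil", "z", "iy", "sil"]`, the sketch counts one fricative substitution and one plosive deletion, giving a PER of 2/5. Summing the counts over all utterances and dividing each class's total by the overall number of reference phones would give that class's share of the overall PER.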

Citation

Loweimi, E., Carmantini, A., Bell, P., Renals, S., & Cvetkovic, Z. (2023). Phonetic Error Analysis Beyond Phone Error Rate. IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, 3346-3361. https://doi.org/10.1109/taslp.2023.3313417

Journal Article Type: Article
Online Publication Date: Sep 8, 2023
Publication Date: 2023
Deposit Date: Apr 3, 2024
Print ISSN: 2329-9290
Electronic ISSN: 2329-9304
Publisher: Institute of Electrical and Electronics Engineers
Peer Reviewed: Peer Reviewed
Volume: 31
Pages: 3346-3361
DOI: https://doi.org/10.1109/taslp.2023.3313417
Keywords: Phone recognition, TIMIT, phonetic error analysis, broad phonetic classes, confusion matrix, hybrid, end-to-end
Public URL: http://researchrepository.napier.ac.uk/Output/3585785