School of Computing Engineering and the Built Environment

Phonetic Error Analysis Beyond Phone Error Rate (2023)
Journal Article
Loweimi, E., Carmantini, A., Bell, P., Renals, S., & Cvetkovic, Z. (2023). Phonetic Error Analysis Beyond Phone Error Rate. IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, 3346-3361. https://doi.org/10.1109/taslp.2023.3313417

In this article, we analyse the performance of the TIMIT-based phone recognition systems beyond the overall phone error rate (PER) metric. We consider three broad phonetic classes (BPCs): {affricate, diphthong, fricative, nasal, plosive, semi-vowel,...

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra (2023)
Presentation / Conference Contribution
Yue, Z., Loweimi, E., & Cvetkovic, Z. (2023). Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. In Proc. INTERSPEECH 2023 (1533-1537). https://doi.org/10.21437/interspeech.2023-222

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along...

Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform (2023)
Journal Article
Loweimi, E., Yue, Z., Bell, P., Renals, S., & Cvetkovic, Z. (2023). Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform. IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, 876-890. https://doi.org/1

In this paper, we investigate multi-stream acoustic modelling using the raw real and imaginary parts of the Fourier transform of speech signals. Using the raw magnitude spectrum, or features derived from it, as a proxy for the real and imaginary part...

Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition (2022)
Journal Article
Yue, Z., Loweimi, E., Christensen, H., Barker, J., & Cvetkovic, Z. (2022). Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing, 30, 2968-2980. https://d

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem and substantial differences between typical and dysarthric speech complicate the transfer learning. In this paper, we aim...

Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs (2022)
Presentation / Conference Contribution
Yue, Z., Loweimi, E., Christensen, H., Barker, J., & Cvetkovic, Z. (2022, September). Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. Paper presented at Interspeech 2022, Incheon, Korea

Raw waveform acoustic modelling has recently received increasing attention. Compared with the task-blind hand-crafted features which may discard useful information, representations directly learned from the raw waveform are task-specific and potentia...

RCT: Random consistency training for semi-supervised sound event detection (2022)
Presentation / Conference Contribution
Shao, N., Loweimi, E., & Li, X. (2022, September). RCT: Random consistency training for semi-supervised sound event detection. Paper presented at Interspeech 2022, Incheon, Korea

Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency. The integration of semi-supervised learning (SSL) largely mitigates this problem. This paper investigates several core mod...

Raw Source and Filter Modelling for Dysarthric Speech Recognition (2022)
Presentation / Conference Contribution
Yue, Z., Loweimi, E., & Cvetkovic, Z. (2022). Raw Source and Filter Modelling for Dysarthric Speech Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we bui...

Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition (2022)
Presentation / Conference Contribution
Yue, Z., Loweimi, E., Cvetkovic, Z., Christensen, H., & Barker, J. (2022). Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing

Building automatic speech recognition (ASR) systems for speakers with dysarthria is a very challenging task. Although multi-modal ASR has received increasing attention recently, incorporating real articulatory data with acoustic features has not been...

Speech Acoustic Modelling Using Raw Source and Filter Components (2021)
Presentation / Conference Contribution
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling Using Raw Source and Filter Components. In Proc. Interspeech 2021 (276-280). https://doi.org/10.21437/interspeech.2021-53

Source-filter modelling is among the fundamental techniques in speech processing with a wide range of applications. In acoustic modelling, features such as MFCC and PLP which parametrise the filter component are widely employed. In this paper, we inv...

Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models (2021)
Presentation / Conference Contribution
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models. In Proc. Interspeech 2021 (2541-2545). https://doi.org/10.21437/interspeech.2021-280

Recently, Transformer based models have shown competitive automatic speech recognition (ASR) performance. One key factor in the success of these models is the multi-head attention mechanism. However, for trained models, we have previously observed th...
