Research Repository

All Outputs (24)

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra (2023)
Conference Proceeding
Yue, Z., Loweimi, E., & Cvetkovic, Z. (2023). Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. In Proc. INTERSPEECH 2023 (1533-1537). https://doi.org/10.21437/interspeech.2023-222

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along...

Raw Source and Filter Modelling for Dysarthric Speech Recognition (2022)
Conference Proceeding
Yue, Z., Loweimi, E., & Cvetkovic, Z. (2022). Raw Source and Filter Modelling for Dysarthric Speech Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.2022.9746553

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we bui...

Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition (2022)
Conference Proceeding
Yue, Z., Loweimi, E., Cvetkovic, Z., Christensen, H., & Barker, J. (2022). Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.2022.9746855

Building automatic speech recognition (ASR) systems for speakers with dysarthria is a very challenging task. Although multi-modal ASR has received increasing attention recently, incorporating real articulatory data with acoustic features has not been...

Speech Acoustic Modelling Using Raw Source and Filter Components (2021)
Conference Proceeding
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling Using Raw Source and Filter Components. In Proc. Interspeech 2021 (276-280). https://doi.org/10.21437/interspeech.2021-53

Source-filter modelling is among the fundamental techniques in speech processing with a wide range of applications. In acoustic modelling, features such as MFCC and PLP which parametrise the filter component are widely employed. In this paper, we inv...
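The source-filter decomposition this line of work builds on can be illustrated with the classic cepstral-liftering view: low quefrencies capture the vocal-tract filter (spectral envelope), high quefrencies the excitation source. This is a standard textbook split, not the paper's raw-waveform model, and the cutoff of 30 quefrency bins is an arbitrary illustrative choice:

```python
import numpy as np

def cepstral_source_filter(frame, cutoff=30):
    """Split a frame's log-magnitude spectrum into filter (envelope)
    and source (excitation) parts via real-cepstrum liftering."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
    cep = np.fft.irfft(log_mag)          # real cepstrum
    lifter = np.zeros_like(cep)
    lifter[:cutoff] = 1.0                # low quefrency -> filter
    lifter[-(cutoff - 1):] = 1.0         # mirror half, keeps the spectrum real
    filt_log = np.fft.rfft(cep * lifter).real
    return filt_log, log_mag - filt_log  # (filter, source) log spectra
```

By construction the two parts sum back to the original log-magnitude spectrum, which makes the decomposition easy to sanity-check.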

Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models (2021)
Conference Proceeding
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models. In Proc. Interspeech 2021 (2541-2545). https://doi.org/10.21437/interspeech.2021-280

Recently, Transformer based models have shown competitive automatic speech recognition (ASR) performance. One key factor in the success of these models is the multi-head attention mechanism. However, for trained models, we have previously observed th...
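The head-removal idea in the title can be sketched as dropout applied at whole-head granularity. The sketch below is a minimal illustration with survivor rescaling, assuming head outputs stacked as (heads × time × dim); it is not the paper's exact training recipe:

```python
import numpy as np

def remove_heads(head_outputs, p, rng):
    """Randomly zero whole attention heads and rescale the survivors,
    analogous to dropout but at head granularity."""
    n_heads = head_outputs.shape[0]
    keep = rng.random(n_heads) >= p
    if not keep.any():                   # always keep at least one head
        keep[rng.integers(n_heads)] = True
    scale = n_heads / keep.sum()
    return head_outputs * keep[:, None, None] * scale
```

With `p = 0` the function is the identity, so it can be disabled at inference time without changing the model's expected output.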

Speech Acoustic Modelling from Raw Phase Spectrum (2021)
Conference Proceeding
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling from Raw Phase Spectrum. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp39728.2021.9413727

Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time...

Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers (2021)
Conference Proceeding
Zhang, S., Do, C., Doddipatla, R., Loweimi, E., Bell, P., & Renals, S. (2021). Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp39728.2021.9413565

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and re...

On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers (2021)
Conference Proceeding
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. In 2021 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt48900.2021.9383521

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note the range of the learned context increases...

Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling (2020)
Conference Proceeding
Loweimi, E., Bell, P., & Renals, S. (2020). Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. In Proc. Interspeech 2020 (1644-1648). https://doi.org/10.21437/interspeech.2020-18

In this paper we investigate the usefulness of the sign spectrum and its combination with the raw magnitude spectrum in acoustic modelling for automatic speech recognition (ASR). The sign spectrum is a sequence of ±1s, capturing one bit of the phase...
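The one-bit view of phase can be sketched as follows. Here the sign is taken as the sign of cos(phase), i.e. whether the phase lies in (−π/2, π/2); this is one plausible reading for illustration, not necessarily the paper's exact definition:

```python
import numpy as np

def sign_and_magnitude(frame):
    """Decompose a frame's spectrum into a magnitude stream and a +/-1
    'sign' stream that retains one bit of phase information."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    sign = np.where(np.cos(np.angle(spec)) >= 0.0, 1.0, -1.0)
    return mag, sign
```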

On the Robustness and Training Dynamics of Raw Waveform Models (2020)
Conference Proceeding
Loweimi, E., Bell, P., & Renals, S. (2020). On the Robustness and Training Dynamics of Raw Waveform Models. In Proc. Interspeech 2020 (1001-1005). https://doi.org/10.21437/interspeech.2020-17

We investigate the robustness and training dynamics of raw waveform acoustic models for automatic speech recognition (ASR). It is known that the first layer of such models learns a set of filters, performing a form of time-frequency analysis. This lay...

Acoustic Model Adaptation from Raw Waveforms with Sincnet (2019)
Conference Proceeding
Fainberg, J., Klejch, O., Loweimi, E., Bell, P., & Renals, S. (2019). Acoustic Model Adaptation from Raw Waveforms with Sincnet. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/asru46091.2019.9003974

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed...

Trainable Dynamic Subsampling for End-to-End Speech Recognition (2019)
Conference Proceeding
Zhang, S., Loweimi, E., Xu, Y., Bell, P., & Renals, S. (2019). Trainable Dynamic Subsampling for End-to-End Speech Recognition. In Proc. Interspeech 2019 (1413-1417). https://doi.org/10.21437/interspeech.2019-2778

Jointly optimised attention-based encoder-decoder models have yielded impressive speech recognition results. The recurrent neural network (RNN) encoder is a key component in such models — it learns the hidden representations of the inputs. However, i...

Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition (2019)
Conference Proceeding
Jalal, M. A., Loweimi, E., Moore, R. K., & Hain, T. (2019). Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In Proc. Interspeech 2019 (1701-1705). https://doi.org/10.21437/interspeech.2019-3068

Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maxi...

On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters (2019)
Conference Proceeding
Loweimi, E., Bell, P., & Renals, S. (2019). On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters. In Proc. Interspeech 2019 (3480-3484). https://doi.org/10.21437/interspeech.2019-1257

We investigate the problem of direct waveform modelling using parametric kernel-based filters in a convolutional neural network (CNN) framework, building on SincNet, a CNN employing the cardinal sine (sinc) function to implement learnable bandpass fi...
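A SincNet-style parametric band-pass kernel is the difference of two windowed sinc low-pass responses, so the whole filter is parametrised by just its two cut-off frequencies. A minimal sketch, assuming a Hamming window and cut-offs in Hz (SincNet itself learns these parameters by backpropagation):

```python
import numpy as np

def sinc_bandpass(f_low, f_high, length=101, fs=16000):
    """Band-pass FIR kernel parametrised only by its two cut-off
    frequencies, as in SincNet-style learnable filters."""
    t = (np.arange(length) - (length - 1) / 2) / fs
    h = 2 * f_high * np.sinc(2 * f_high * t) - 2 * f_low * np.sinc(2 * f_low * t)
    return h * np.hamming(length)
```

Because the kernel depends smoothly on `f_low` and `f_high`, gradients with respect to those two scalars are well defined, which is what makes the filterbank learnable.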

On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition (2019)
Conference Proceeding
Loweimi, E., Bell, P., & Renals, S. (2019). On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp.2019.8683330

DNNs play a major role in state-of-the-art ASR systems. They can be used for extracting features and building probabilistic models for acoustic and language modelling. Despite their huge practical success, the level of theoretical understanding h...
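One simple instance of statistical normalisation of a feature matrix is per-dimension mean-variance normalisation; the paper's specific normalisation schemes are not reproduced here, this is only the most basic form:

```python
import numpy as np

def mean_variance_normalise(features, eps=1e-8):
    """Per-dimension mean-variance normalisation of a feature matrix
    (rows = frames, columns = feature dimensions)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```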

Windowed Attention Mechanisms for Speech Recognition (2019)
Conference Proceeding
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2019). Windowed Attention Mechanisms for Speech Recognition. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp.2019.8682224

The usual attention mechanisms used for encoder-decoder models do not constrain the relationship between input and output sequences to be monotonic. To address this we explore windowed attention mechanisms which restrict attention to a block of sourc...
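The windowing idea can be sketched as a banded mask over the attention scores: each output step only attends to source positions within ± window of an assumed monotonic centre, and the softmax is taken inside the band. This is an illustrative simplification (real systems typically predict or track the window centre):

```python
import numpy as np

def windowed_attention(scores, window):
    """Softmax over attention scores (T_out x T_in), restricted to a
    band of width 2*window+1 around a linearly spaced centre."""
    t_out, t_in = scores.shape
    centres = np.linspace(0, t_in - 1, t_out)
    idx = np.arange(t_in)
    mask = np.abs(idx[None, :] - centres[:, None]) <= window
    masked = np.where(mask, scores, -np.inf)   # -inf -> zero weight
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```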

Exploring the Use of Group Delay for Generalised VTS Based Noise Compensation (2018)
Conference Proceeding
Loweimi, E., Barker, J., & Hain, T. (2018). Exploring the Use of Group Delay for Generalised VTS Based Noise Compensation. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp.2018.8462595

In earlier work we studied the effect of statistical normalisation for phase-based features and observed that it leads to a significant robustness improvement. This paper explores the extension of the generalised Vector Taylor Series (gVTS) noise compensa...
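Group delay, the negative frequency-derivative of the phase spectrum, can be computed without explicit phase unwrapping via the standard DFT identity that uses the transform of n·x[n]. A generic illustration (in samples), not the paper's gVTS formulation:

```python
import numpy as np

def group_delay(frame):
    """Group delay tau(w) = -d(arg X)/dw via the identity
    tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y = DFT of n*x[n]."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```

For a pure delay (an impulse at sample d) the group delay is constant and equal to d at every frequency, which gives a quick correctness check.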

On the Usefulness of the Speech Phase Spectrum for Pitch Extraction (2018)
Conference Proceeding
Loweimi, E., Barker, J., & Hain, T. (2018). On the Usefulness of the Speech Phase Spectrum for Pitch Extraction. In Proc. Interspeech 2018 (696-700). https://doi.org/10.21437/interspeech.2018-1062

Most frequency domain techniques for pitch extraction such as cepstrum, harmonic product spectrum (HPS) and summation residual harmonics (SRH) operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as...
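The harmonic product spectrum mentioned above can be sketched in a few lines: decimate the magnitude spectrum by factors 1..R and multiply, so the harmonics pile up at the fundamental. A minimal magnitude-domain sketch (the paper's point is precisely that the same can be done from phase):

```python
import numpy as np

def hps_pitch(frame, fs, n_harmonics=4, n_fft=8192):
    """Estimate f0 (Hz) via the harmonic product spectrum of one frame."""
    mag = np.abs(np.fft.rfft(frame, n_fft))
    hps = mag.copy()
    for h in range(2, n_harmonics + 1):
        dec = mag[::h]                 # spectrum decimated by h
        hps[:len(dec)] *= dec
    hps = hps[:len(mag) // n_harmonics]
    k = np.argmax(hps[1:]) + 1         # skip the DC bin
    return k * fs / n_fft
```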

Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR (2017)
Conference Proceeding
Loweimi, E., Barker, J., & Hain, T. (2017). Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. In Proc. Interspeech 2017 (2466-2470). https://doi.org/10.21437/interspeech.2017-211

Vector Taylor Series (VTS) is a powerful technique for robust ASR but, in its standard form, it can only be applied to log-filter bank and MFCC features. In earlier work, we presented a generalised VTS (gVTS) that extends the applicability of VTS to...

Robust Source-Filter Separation of Speech Signal in the Phase Domain (2017)
Conference Proceeding
Loweimi, E., Barker, J., Torralba, O. S., & Hain, T. (2017). Robust Source-Filter Separation of Speech Signal in the Phase Domain. In Proc. Interspeech 2017 (414-418). https://doi.org/10.21437/interspeech.2017-210

In earlier work we proposed a framework for speech source-filter separation that employs phase-based signal processing. This paper presents a further theoretical investigation of the model and optimisations that make the filter and source representat...