
Research Repository


Outputs (29)

Speech Acoustic Modelling from Raw Phase Spectrum (2021)
Presentation / Conference Contribution
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling from Raw Phase Spectrum. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp39728.202

Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time...
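
As an illustration of the kind of raw phase front-end described above, the sketch below extracts a short-time phase spectrum with NumPy; the frame length, shift and FFT size are assumed values, not the paper's exact configuration.

    import numpy as np

    def short_time_phase(signal, frame_len=400, frame_shift=160, n_fft=512):
        """Return a (num_frames, n_fft // 2 + 1) matrix of unwrapped phase values."""
        window = np.hamming(frame_len)
        num_frames = 1 + (len(signal) - frame_len) // frame_shift
        phases = []
        for i in range(num_frames):
            frame = signal[i * frame_shift : i * frame_shift + frame_len] * window
            spectrum = np.fft.rfft(frame, n=n_fft)
            phases.append(np.unwrap(np.angle(spectrum)))  # raw phase in radians
        return np.stack(phases)

    # Example: one second of synthetic audio at 16 kHz -> (98, 257) phase features
    phase_feats = short_time_phase(np.random.randn(16000))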

Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers (2021)
Presentation / Conference Contribution
Zhang, S., Do, C., Doddipatla, R., Loweimi, E., Bell, P., & Renals, S. (2021). Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and re...
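
As a rough illustration of training the upper layers before the lower ones, the PyTorch sketch below freezes one part of a small network while training the other; the two-stage schedule and the layer split are assumptions for illustration, not the paper's cascade recipe.

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(40, 256), nn.ReLU(),   # lower layers (feature extractor)
        nn.Linear(256, 256), nn.ReLU(),  # upper layers
        nn.Linear(256, 10),              # classifier
    )
    lower, upper = model[:2], model[2:]

    def train_stage(trainable, frozen, loader, epochs=1):
        for p in frozen.parameters():
            p.requires_grad = False
        for p in trainable.parameters():
            p.requires_grad = True
        opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Stage 1: train the classifier/upper layers with the lower layers frozen,
    # then stage 2: unfreeze and train the lower layers.
    # train_stage(upper, lower, loader); train_stage(lower, upper, loader)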

On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers (2021)
Presentation / Conference Contribution
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. In 2021 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt48900.2021.9383521

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note the range of the learned context increases...
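
One simple way to quantify the range of context a self-attention head attends to is the expected query-key distance under the attention distribution; the diagnostic below is a generic sketch assumed for illustration, not the paper's analysis code.

    import numpy as np

    def mean_attention_distance(attn):
        """attn: (num_queries, num_keys) row-stochastic attention weights."""
        q_pos = np.arange(attn.shape[0])[:, None]
        k_pos = np.arange(attn.shape[1])[None, :]
        return float((attn * np.abs(q_pos - k_pos)).sum(axis=1).mean())

    # Example with a random softmax-normalised attention map over 100 frames
    logits = np.random.randn(100, 100)
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(mean_attention_distance(attn))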

On the Robustness and Training Dynamics of Raw Waveform Models (2020)
Presentation / Conference Contribution
Loweimi, E., Bell, P., & Renals, S. (2020). On the Robustness and Training Dynamics of Raw Waveform Models. In Proc. Interspeech 2020 (1001-1005). https://doi.org/10.21437/interspeech.2020-17

We investigate the robustness and training dynamics of raw waveform acoustic models for automatic speech recognition (ASR). It is known that the first layer of such models learns a set of filters, performing a form of time-frequency analysis. This lay...
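
To see the time-frequency analysis performed by a learned first layer, one can inspect the magnitude response of each kernel; the sketch below does this with NumPy, where first_layer_weights is a hypothetical (num_filters, kernel_len) array taken from a trained raw waveform model.

    import numpy as np

    def filter_responses(first_layer_weights, n_fft=1024, sample_rate=16000):
        """Return per-filter magnitude responses and their centre frequencies in Hz."""
        responses = np.abs(np.fft.rfft(first_layer_weights, n=n_fft, axis=-1))
        freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
        centre_freqs = freqs[responses.argmax(axis=-1)]
        return responses, centre_freqs

    # Example with random stand-in kernels: 80 filters, 129 taps each
    responses, centres = filter_responses(np.random.randn(80, 129))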

Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling (2020)
Presentation / Conference Contribution
Loweimi, E., Bell, P., & Renals, S. (2020). Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. In Proc. Interspeech 2020 (1644-1648). https://doi.org/10.21437/interspeech.2020-18

In this paper we investigate the usefulness of the sign spectrum and its combination with the raw magnitude spectrum in acoustic modelling for automatic speech recognition (ASR). The sign spectrum is a sequence of ±1s, capturing one bit of the phase...
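
A minimal sketch of splitting a frame's DFT into a magnitude spectrum and a one-bit sign spectrum is given below; taking the sign of the real part of the DFT as the one-bit phase quantisation is an illustrative assumption and may differ from the paper's exact definition.

    import numpy as np

    def sign_and_magnitude(frame, n_fft=512):
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)
        magnitude = np.abs(spectrum)
        sign = np.where(spectrum.real >= 0, 1.0, -1.0)  # sequence of +/-1s
        return sign, magnitude

    # Example: a 25 ms frame at 16 kHz gives 257 sign and 257 magnitude values
    sign, mag = sign_and_magnitude(np.random.randn(400))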

Acoustic Model Adaptation from Raw Waveforms with SincNet (2019)
Presentation / Conference Contribution
Fainberg, J., Klejch, O., Loweimi, E., Bell, P., & Renals, S. (2019). Acoustic Model Adaptation from Raw Waveforms with SincNet. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/asru46091.2019.9003974

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed...

On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters (2019)
Presentation / Conference Contribution
Loweimi, E., Bell, P., & Renals, S. (2019). On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters. In Proc. Interspeech 2019 (3480-3484). https://doi.org/10.21437/interspeech.2019-1257

We investigate the problem of direct waveform modelling using parametric kernel-based filters in a convolutional neural network (CNN) framework, building on SincNet, a CNN employing the cardinal sine (sinc) function to implement learnable bandpass fi...
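
For reference, a SincNet-style bandpass kernel can be built as the difference of two low-pass sinc filters, so that only the two cut-off frequencies need to be learned; the cut-offs and kernel length below are assumed values for illustration.

    import numpy as np

    def sinc_bandpass(low_hz, high_hz, kernel_len=129, sample_rate=16000):
        t = (np.arange(kernel_len) - (kernel_len - 1) / 2) / sample_rate
        # Difference of two scaled sinc low-pass filters yields a bandpass response
        kernel = 2 * high_hz * np.sinc(2 * high_hz * t) - 2 * low_hz * np.sinc(2 * low_hz * t)
        return kernel * np.hamming(kernel_len)  # window to reduce spectral ripple

    # Example: a 300-3400 Hz bandpass kernel standing in for one learned filter
    h = sinc_bandpass(300.0, 3400.0)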

Trainable Dynamic Subsampling for End-to-End Speech Recognition (2019)
Presentation / Conference Contribution
Zhang, S., Loweimi, E., Xu, Y., Bell, P., & Renals, S. (2019). Trainable Dynamic Subsampling for End-to-End Speech Recognition. In Proc. Interspeech 2019 (1413-1417). https://doi.org/10.21437/interspeech.2019-2778

Jointly optimised attention-based encoder-decoder models have yielded impressive speech recognition results. The recurrent neural network (RNN) encoder is a key component in such models; it learns the hidden representations of the inputs. However, i...

Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition (2019)
Presentation / Conference Contribution
Jalal, M. A., Loweimi, E., Moore, R. K., & Hain, T. (2019). Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In Proc. Interspeech 2019 (1701-1705). https://doi.org/10.21437/interspeech.2019-3068

Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maxi...

On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition (2019)
Presentation / Conference Contribution
Loweimi, E., Bell, P., & Renals, S. (2019). On the Usefulness of Statistical Normalisation of Bottleneck Features for Speech Recognition. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi

DNNs play a major role in state-of-the-art ASR systems. They can be used for extracting features and building probabilistic models for acoustic and language modelling. Despite their huge practical success, the level of theoretical understanding h...
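
As one common form of statistical normalisation, the sketch below applies per-utterance mean and variance normalisation to bottleneck features with NumPy; this is a generic example rather than the specific scheme evaluated in the paper.

    import numpy as np

    def mean_variance_normalise(features, eps=1e-8):
        """features: (num_frames, feat_dim) bottleneck features for one utterance."""
        mean = features.mean(axis=0, keepdims=True)
        std = features.std(axis=0, keepdims=True)
        return (features - mean) / (std + eps)

    # Hypothetical bottleneck features: 300 frames, 80 dimensions
    normalised = mean_variance_normalise(np.random.randn(300, 80) * 3.0 + 5.0)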