Research Repository

All Outputs (5)

Speech Acoustic Modelling Using Raw Source and Filter Components (2021)
Presentation / Conference Contribution
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling Using Raw Source and Filter Components. In Proc. Interspeech 2021 (276-280). https://doi.org/10.21437/interspeech.2021-53

Source-filter modelling is among the fundamental techniques in speech processing with a wide range of applications. In acoustic modelling, features such as MFCC and PLP which parametrise the filter component are widely employed. In this paper, we inv...

Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models (2021)
Presentation / Conference Contribution
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models. In Proc. Interspeech 2021 (2541-2545). https://doi.org/10.21437/interspeech.2021-280

Recently, Transformer based models have shown competitive automatic speech recognition (ASR) performance. One key factor in the success of these models is the multi-head attention mechanism. However, for trained models, we have previously observed th...

Speech Acoustic Modelling from Raw Phase Spectrum (2021)
Presentation / Conference Contribution
Loweimi, E., Cvetkovic, Z., Bell, P., & Renals, S. (2021). Speech Acoustic Modelling from Raw Phase Spectrum. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp39728.202

Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time...

Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers (2021)
Presentation / Conference Contribution
Zhang, S., Do, C., Doddipatla, R., Loweimi, E., Bell, P., & Renals, S. (2021). Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and re...

On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers (2021)
Presentation / Conference Contribution
Zhang, S., Loweimi, E., Bell, P., & Renals, S. (2021). On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. In 2021 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt48900.2021.9383521

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note the range of the learned context increases...