Windowed Attention Mechanisms for Speech Recognition

Zhang, Shucong; Loweimi, Erfan; Bell, Peter; Renals, Steve

Abstract

The usual attention mechanisms used for encoder-decoder models do not constrain the relationship between input and output sequences to be monotonic. To address this, we explore windowed attention mechanisms, which restrict attention to a block of source hidden states. Rule-based windowing restricts attention to a (typically large) fixed-length window, and such methods perform poorly when the window size is small. In this paper, we propose a fully trainable windowed attention mechanism and provide a detailed analysis of the factors that affect its performance. Compared to the rule-based window methods, the learned window size is significantly smaller, yet the model's performance is competitive. On the TIMIT corpus, this approach yields a 17% (relative) performance improvement over the traditional attention model. Our model also yields accuracies comparable to those of the joint CTC-attention model on the Wall Street Journal corpus.
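
As an illustration of what restricting attention to a block of source hidden states can mean, the following is a minimal NumPy sketch of a fixed (rule-based) window applied to one decoder step. It is not the paper's implementation: the function name `windowed_attention` and the parameters `center` and `half_width` are hypothetical, and the abstract does not say how the trainable variant parameterises its window. The sketch only shows that scores outside the window receive zero attention weight.

```python
import numpy as np

def windowed_attention(scores, center, half_width):
    """Softmax over attention scores, restricted to a window of source positions.

    scores:     1-D array of unnormalised scores, one per source hidden state.
    center:     hypothetical window centre (index into the source sequence).
    half_width: hypothetical half window size; the window covers
                [center - half_width, center + half_width].
    """
    scores = np.asarray(scores, dtype=float)
    num_states = scores.shape[0]
    # Clip the centre so the window always contains at least one position.
    center = int(np.clip(center, 0, num_states - 1))
    positions = np.arange(num_states)
    inside = np.abs(positions - center) <= half_width
    # Positions outside the window get -inf, i.e. zero weight after softmax.
    masked = np.where(inside, scores, -np.inf)
    # Subtract the in-window maximum for numerical stability.
    masked = masked - masked[inside].max()
    weights = np.exp(masked)
    return weights / weights.sum()

# Ten source states with equal scores; attend only to positions 3, 4 and 5.
weights = windowed_attention(np.zeros(10), center=4, half_width=1)
```

With equal scores, the weight is split evenly across the three in-window positions and is exactly zero elsewhere. In a trainable variant, quantities such as `center` and `half_width` would presumably be predicted by the model rather than fixed by rule, but that is an assumption here.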

Presentation Conference Type: Conference Paper (Published)
Conference Name: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Start Date: May 12, 2019
End Date: May 17, 2019
Online Publication Date: Apr 17, 2019
Publication Date: 2019
Deposit Date: Apr 3, 2024
Publisher: Institute of Electrical and Electronics Engineers
Series ISSN: 2379-190X
Book Title: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: https://doi.org/10.1109/icassp.2019.8682224
Public URL: http://researchrepository.napier.ac.uk/Output/3585909