Windowed Attention Mechanisms for Speech Recognition

Zhang, Shucong; Loweimi, Erfan; Bell, Peter; Renals, Steve

doi:10.1109/icassp.2019.8682224

Authors

Shucong Zhang

Erfan Loweimi

Peter Bell

Steve Renals

Abstract

The usual attention mechanisms used for encoder-decoder models do not constrain the relationship between input and output sequences to be monotonic. To address this, we explore windowed attention mechanisms, which restrict attention to a block of source hidden states. Rule-based windowing restricts attention to a (typically large) fixed-length window, and the performance of such methods is poor if the window size is small. In this paper, we propose a fully trainable windowed attention mechanism and provide a detailed analysis of the factors that affect its performance. Compared to the rule-based window methods, the learned window size is significantly smaller, yet the model's performance is competitive. On the TIMIT corpus, this approach yields a 17% relative performance improvement over the traditional attention model. Our model also yields accuracies comparable to the joint CTC-attention model on the Wall Street Journal corpus.
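
As a rough illustration of the rule-based windowing the abstract describes, the sketch below restricts a softmax over attention scores to a fixed-length window of source positions. It is not the authors' implementation: the function names, the `center` and `half_width` parameters, and the way the window is positioned are all assumptions made for this example, and the paper's trainable variant (where the window is learned) is not shown.

```python
import math


def windowed_attention(scores, center, half_width):
    """Softmax over `scores`, restricted to a fixed window around `center`.

    scores: raw attention scores, one per encoder hidden state.
    center, half_width: hypothetical window parameters (assumed for this
    sketch; a rule-based scheme might derive `center` from the previous
    alignment). Positions outside
    [center - half_width, center + half_width] receive zero weight.
    """
    lo = max(0, center - half_width)
    hi = min(len(scores) - 1, center + half_width)
    if lo > hi:
        raise ValueError("window does not overlap the input sequence")
    # Subtract the in-window maximum for numerical stability.
    peak = max(scores[lo:hi + 1])
    exps = [math.exp(s - peak) if lo <= t <= hi else 0.0
            for t, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, hidden_states):
    """Weighted sum of encoder hidden states (each a list of floats)."""
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.3, 1.5, 0.2, 3.0]
    # Only positions 1..3 can receive attention; position 5 is masked
    # even though it has the highest raw score.
    print(windowed_attention(scores, center=2, half_width=1))
```

With a small `half_width`, a high-scoring frame just outside the window is ignored entirely, which is one plausible reading of why fixed windows perform poorly when they are small.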

Presentation Conference Type	Conference Paper (Published)
Conference Name	ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Start Date	May 12, 2019
End Date	May 17, 2019
Online Publication Date	Apr 17, 2019
Publication Date	2019
Deposit Date	Apr 3, 2024
Publisher	Institute of Electrical and Electronics Engineers
Series ISSN	2379-190X
Book Title	ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI	https://doi.org/10.1109/icassp.2019.8682224
Public URL	http://researchrepository.napier.ac.uk/Output/3585909
