Windowed Attention Mechanisms for Speech Recognition
Zhang, Shucong; Loweimi, Erfan; Bell, Peter; Renals, Steve
Authors
Shucong Zhang
Erfan Loweimi
Peter Bell
Steve Renals
Abstract
The attention mechanisms typically used in encoder-decoder models do not constrain the relationship between input and output sequences to be monotonic. To address this, we explore windowed attention mechanisms, which restrict attention to a block of source hidden states. Rule-based windowing restricts attention to a (typically large) fixed-length window, and the performance of such methods is poor if the window size is small. In this paper, we propose a fully trainable windowed attention mechanism and provide a detailed analysis of the factors that affect its performance. Compared to the rule-based window methods, the learned window size is significantly smaller, yet the model's performance is competitive. On the TIMIT corpus, this approach yields a 17% relative performance improvement over the traditional attention model. Our model also achieves accuracies comparable to the joint CTC-attention model on the Wall Street Journal corpus.
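To make the idea of restricting attention to a block of source hidden states concrete, here is a minimal sketch of a rule-based, fixed-length window. It is not the trainable mechanism proposed in the paper, whose details the abstract does not give. The function names, the `center` and `half_width` parameters, and the assumption that raw attention scores are already available are all hypothetical, chosen only for illustration.

```python
import math


def windowed_attention_weights(scores, center, half_width):
    """Softmax over `scores`, restricted to a fixed window (illustrative only).

    Positions outside [center - half_width, center + half_width] get weight 0.
    `center` and `half_width` are hypothetical parameters, not from the paper.
    """
    n = len(scores)
    lo = max(0, center - half_width)
    hi = min(n - 1, center + half_width)
    if lo > hi:
        raise ValueError("window does not overlap the sequence")

    window = scores[lo:hi + 1]
    peak = max(window)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in window]
    total = sum(exps)

    weights = [0.0] * n
    for offset, e in enumerate(exps):
        weights[lo + offset] = e / total
    return weights


def context_vector(weights, hidden_states):
    """Weighted sum of encoder hidden states (each a list of floats)."""
    dim = len(hidden_states[0])
    return [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Six encoder positions; attend only to positions 1..3.
    scores = [0.1, 2.0, 0.5, 1.5, 3.0, 0.2]
    weights = windowed_attention_weights(scores, center=2, half_width=1)
    print(weights)
```

In this sketch, position 4 has the highest raw score but receives zero weight because it lies outside the window, which illustrates why a small fixed window placed by rule can hurt performance when it excludes the relevant states.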
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Start Date | May 12, 2019 |
End Date | May 17, 2019 |
Online Publication Date | Apr 17, 2019 |
Publication Date | 2019 |
Deposit Date | Apr 3, 2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Series ISSN | 2379-190X |
Book Title | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
DOI | https://doi.org/10.1109/icassp.2019.8682224 |
Public URL | http://researchrepository.napier.ac.uk/Output/3585909 |