Xuejie Zhang
Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features
Zhang, Xuejie; Xu, Yan; Abel, Andrew K.; Smith, Leslie S.; Watt, Roger; Hussain, Amir; Gao, Chengxiang
Authors
Yan Xu
Andrew K. Abel
Leslie S. Smith
Roger Watt
Prof Amir Hussain (A.Hussain@napier.ac.uk)
Chengxiang Gao
Abstract
Extraction of relevant lip features is of continuing interest in the visual speech domain. End-to-end feature extraction can produce good results, but at the cost of those results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can successfully extract low-dimensionality lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional DCT or the current fashion for CNN features, is that they are human-centric features that can be visualised and analysed by humans, which makes the results easier to explain and visualise. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.
Journal Article Type | Article |
---|---|
Acceptance Date | Nov 23, 2020 |
Online Publication Date | Dec 3, 2020 |
Publication Date | 2020-12 |
Deposit Date | Dec 7, 2020 |
Publicly Available Date | Dec 7, 2020 |
Journal | Entropy |
Publisher | MDPI |
Peer Reviewed | Peer Reviewed |
Volume | 22 |
Issue | 12 |
Article Number | 1367 |
DOI | https://doi.org/10.3390/e22121367 |
Keywords | speech recognition; image processing; Gabor features; lip reading; explainable |
Public URL | http://researchrepository.napier.ac.uk/Output/2708846 |
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.