Research Repository

Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Zhang, Xuejie; Xu, Yan; Abel, Andrew K.; Smith, Leslie S.; Watt, Roger; Hussain, Amir; Gao, Chengxiang

Authors

Xuejie Zhang

Yan Xu

Andrew K. Abel

Leslie S. Smith

Roger Watt

Chengxiang Gao



Abstract

Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can successfully extract low-dimensionality lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional DCT or the current fashion for CNN features, is that these are human-centric features that can be visualised and analysed by humans. This means that it is easier to explain and visualise the results. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.

Journal Article Type: Article
Acceptance Date: Nov 23, 2020
Online Publication Date: Dec 3, 2020
Publication Date: 2020-12
Deposit Date: Dec 7, 2020
Publicly Available Date: Dec 7, 2020
Journal: Entropy
Publisher: MDPI
Peer Reviewed: Peer Reviewed
Volume: 22
Issue: 12
Article Number: 1367
DOI: https://doi.org/10.3390/e22121367
Keywords: speech recognition; image processing; gabor features; lip reading; explainable
Public URL: http://researchrepository.napier.ac.uk/Output/2708846

Files

Visual Speech Recognition With Lightweight Psychologically Motivated Gabor Features (2.5 Mb)
PDF

Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/

Copyright Statement
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



