A data driven approach to audiovisual speech mapping

Abel, A.; Marxer, R.; Barker, J.; Watt, R.; Whitmer, B.; Derleth, P.; Hussain, A.

Abstract

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach to estimating audio speech acoustics from temporal visual information alone, without relying on linguistic features such as phonemes or visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be mapped.
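The mapping described above can be sketched as a regression problem: stack a window of prior visual feature frames into one input vector and train an MLP to estimate the corresponding audio log-filterbank frame. The sketch below uses synthetic stand-in data and hypothetical dimensions (window size, DCT coefficient count, filterbank channels, hidden units); it is not the paper's actual configuration, only a minimal illustration of the visual-to-audio mapping idea.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5          # prior visual frames per input window (assumed, for illustration)
V_DIM = 50     # retained 2D-DCT coefficients per visual frame (assumed)
A_DIM = 23     # log filterbank channels per audio frame (assumed)
HIDDEN = 64    # hidden units in the MLP (assumed)
N = 500        # synthetic training examples

# Synthetic stand-ins for extracted features: in the paper these would come
# from 2D-DCTs of the visual stream and log filterbank analysis of the audio.
X = rng.standard_normal((N, K * V_DIM))
true_W = rng.standard_normal((K * V_DIM, A_DIM)) * 0.1
Y = np.tanh(X @ true_W) + 0.01 * rng.standard_normal((N, A_DIM))

# One-hidden-layer MLP, trained by batch gradient descent on mean squared error.
W1 = rng.standard_normal((K * V_DIM, HIDDEN)) * 0.05
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, A_DIM)) * 0.05
b2 = np.zeros(A_DIM)
lr = 0.01

def forward(X):
    """Map stacked visual feature windows to estimated audio frames."""
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

losses = []
for _ in range(200):
    H, pred = forward(X)
    err = pred - Y                      # (N, A_DIM) residual
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate through the two layers.
    dW2 = H.T @ err / N
    db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)    # tanh derivative
    dW1 = X.T @ dH / N
    db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Estimate one audio frame from one window of prior visual frames.
_, audio_estimate = forward(X[:1])
print(audio_estimate.shape)
print(losses[0], losses[-1])
```

A practical system would add the details this sketch omits: real feature extraction, train/test splits, and comparison of window lengths and network sizes, which is the configuration search the abstract describes.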

Citation

Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P., & Hussain, A. (2016, November). A data driven approach to audiovisual speech mapping. Presented at the 8th International Conference, BICS 2016, Beijing, China.

Presentation Conference Type: Conference Paper (published)
Conference Name: 8th International Conference, BICS 2016
Start Date: Nov 28, 2016
End Date: Nov 30, 2016
Online Publication Date: Nov 13, 2016
Publication Date: 2016
Deposit Date: Oct 7, 2019
Publisher: Springer
Pages: 331-342
Series Title: Lecture Notes in Computer Science
Series Number: 10023
Series ISSN: 0302-9743
Book Title: Advances in Brain Inspired Cognitive Systems
ISBN: 978-3-319-49684-9
DOI: https://doi.org/10.1007/978-3-319-49685-6_30
Public URL: http://researchrepository.napier.ac.uk/Output/1792590