A data driven approach to audiovisual speech mapping

Abel, A.; Marxer, R.; Barker, J.; Watt, R.; Whitmer, B.; Derleth, P.; Hussain, A.

Abstract

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach to estimating audio speech acoustics from temporal visual information alone, without relying on linguistic features such as phonemes or visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be mapped.
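The mapping described above can be sketched as a regression problem: stack a window of prior visual feature frames into one input vector and train an MLP to estimate the corresponding audio log-filterbank frame. The sketch below uses synthetic stand-in data and hypothetical dimensions (window size, DCT coefficient count, filterbank channels, hidden units); it is not the paper's actual configuration, only a minimal illustration of the visual-to-audio mapping idea.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5          # prior visual frames per input window (assumed, for illustration)
V_DIM = 50     # retained 2D-DCT coefficients per visual frame (assumed)
A_DIM = 23     # log filterbank channels per audio frame (assumed)
HIDDEN = 64    # hidden units in the MLP (assumed)
N = 500        # synthetic training examples

# Synthetic stand-ins for extracted features: in the paper these would come
# from 2D-DCTs of the visual stream and log filterbank analysis of the audio.
X = rng.standard_normal((N, K * V_DIM))
true_W = rng.standard_normal((K * V_DIM, A_DIM)) * 0.1
Y = np.tanh(X @ true_W) + 0.01 * rng.standard_normal((N, A_DIM))

# One-hidden-layer MLP, trained by batch gradient descent on mean squared error.
W1 = rng.standard_normal((K * V_DIM, HIDDEN)) * 0.05
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, A_DIM)) * 0.05
b2 = np.zeros(A_DIM)
lr = 0.01

def forward(X):
    """Map stacked visual feature windows to estimated audio frames."""
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

losses = []
for _ in range(200):
    H, pred = forward(X)
    err = pred - Y                      # (N, A_DIM) residual
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate through the two layers.
    dW2 = H.T @ err / N
    db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)    # tanh derivative
    dW1 = X.T @ dH / N
    db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Estimate one audio frame from one window of prior visual frames.
_, audio_estimate = forward(X[:1])
print(audio_estimate.shape)
print(losses[0], losses[-1])
```

A practical system would add the details this sketch omits: real feature extraction, train/test splits, and comparison of window lengths and network sizes, which is the configuration search the abstract describes.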

Citation

Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P., & Hussain, A. (2016, November). A data driven approach to audiovisual speech mapping. Presented at the 8th International Conference, BICS 2016, Beijing, China.

Presentation Conference Type: Conference Paper (published)
Conference Name: 8th International Conference, BICS 2016
Start Date: Nov 28, 2016
End Date: Nov 30, 2016
Online Publication Date: Nov 13, 2016
Publication Date: 2016
Deposit Date: Oct 7, 2019
Publisher: Springer
Pages: 331-342
Series Title: Lecture Notes in Computer Science
Series Number: 10023
Series ISSN: 0302-9743
Book Title: Advances in Brain Inspired Cognitive Systems
ISBN: 978-3-319-49684-9
DOI: https://doi.org/10.1007/978-3-319-49685-6_30
Public URL: http://researchrepository.napier.ac.uk/Output/1792590