Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System.

Gogate, Mandar; Dashtipour, Kia; Hussain, Amir

doi:10.21437/interspeech.2020-2935

Authors

Dr Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow

Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer

Prof Amir Hussain A.Hussain@napier.ac.uk
Professor

Abstract

In this paper, we present VIsual Speech In real nOisy eNvironments (VISION), a first-of-its-kind audio-visual (AV) corpus comprising 2500 utterances from 209 speakers, recorded in real noisy environments including social gatherings, streets, cafeterias and restaurants. While a number of speech enhancement frameworks that exploit AV cues have been proposed in the literature, there are no visual speech corpora recorded in real environments with a sufficient variety of speakers to enable evaluation of AV frameworks' generalisation capability across a wide range of background visual and acoustic noises. The main purpose of our AV corpus is to foster research in AV signal processing and to provide a benchmark corpus for reliable evaluation of AV speech enhancement systems in everyday noisy settings. In addition, we present a baseline deep neural network (DNN) based spectral mask estimation model for speech enhancement. Comparative simulation results with subjective listening tests demonstrate a significant performance improvement of the baseline DNN over state-of-the-art speech enhancement approaches.

Citation

Gogate, M., Dashtipour, K., & Hussain, A. (2020, October). Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. Presented at Interspeech 2020, Shanghai, China.

Presentation Conference Type	Conference Paper (published)
Conference Name	Interspeech 2020
Start Date	Oct 25, 2020
End Date	Oct 29, 2020
Online Publication Date	Oct 25, 2020
Publication Date	2020
Deposit Date	Apr 26, 2022
Pages	4521-4525
Book Title	Proc. Interspeech 2020
DOI	https://doi.org/10.21437/interspeech.2020-2935
Public URL	http://researchrepository.napier.ac.uk/Output/2867010
Publisher URL	http://www.interspeech2020.org/uploadfile/pdf/Thu-2-11-6.pdf