Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System.
Presentation / Conference Contribution
Gogate, M., Dashtipour, K., & Hussain, A. (2020, October). Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. Presented at Interspeech 2020, Shanghai, China.
In this paper, we present VIsual Speech In real nOisy eNvironments (VISION), a first-of-its-kind audio-visual (AV) corpus comprising 2500 utterances from 209 speakers, recorded in real noisy environments including social gatherings, streets, cafeteri...