Dr. Mandar Gogate M.Gogate@napier.ac.uk
Senior Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
In this paper, we present VIsual Speech In real nOisy eNvironments (VISION), a first-of-its-kind audio-visual (AV) corpus comprising 2500 utterances from 209 speakers, recorded in real noisy environments including social gatherings, streets, cafeterias and restaurants. While a number of speech enhancement frameworks that exploit AV cues have been proposed in the literature, there are no visual speech corpora recorded in real environments with a sufficient variety of speakers to enable evaluation of how well AV frameworks generalise across a wide range of background visual and acoustic noises. The main purpose of our AV corpus is to foster research in AV signal processing and to provide a benchmark corpus for reliable evaluation of AV speech enhancement systems in everyday noisy settings. In addition, we present a baseline deep neural network (DNN) based spectral mask estimation model for speech enhancement. Comparative simulation results and subjective listening tests demonstrate a significant performance improvement of the baseline DNN over state-of-the-art speech enhancement approaches.
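To illustrate what spectral masking for speech enhancement means in general, the following minimal sketch computes an oracle ideal ratio mask on synthetic signals and applies it to a noisy spectrogram. It is not the paper's baseline: the baseline DNN estimates a mask from the noisy input, whereas this sketch assumes the clean and noise signals are known. The function names (`stft`, `ideal_ratio_mask`, `apply_mask`), the frame and hop sizes, and the sine-plus-white-noise test signal are all assumptions made for illustration.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-8):
    """Oracle mask: per-bin ratio of clean power to clean-plus-noise power."""
    clean_pow = np.abs(clean_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    return clean_pow / (clean_pow + noise_pow + eps)


def apply_mask(noisy_spec, mask):
    """Scale each time-frequency bin of the noisy spectrogram by the mask."""
    return mask * noisy_spec


# Synthetic stand-ins (assumption): a 440 Hz tone as "speech", white noise as "noise".
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(t.shape)
noisy = clean + noise

clean_spec = stft(clean)
noise_spec = stft(noise)
noisy_spec = stft(noisy)

mask = ideal_ratio_mask(clean_spec, noise_spec)
enhanced_spec = apply_mask(noisy_spec, mask)

# Mean squared spectral error against the clean spectrogram, before and after masking.
err_before = np.mean(np.abs(noisy_spec - clean_spec) ** 2)
err_after = np.mean(np.abs(enhanced_spec - clean_spec) ** 2)
print(f"spectral MSE before masking: {err_before:.4f}")
print(f"spectral MSE after masking:  {err_after:.4f}")
```

In a learned system, a network would be trained to predict a mask of this kind from noisy audio (and, in AV approaches, from visual features as well), since the clean signal is not available at test time.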
Gogate, M., Dashtipour, K., & Hussain, A. (2020). Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. In Proc. Interspeech 2020 (4521-4525). https://doi.org/10.21437/interspeech.2020-2935
Field | Value |
---|---|
Conference Name | Interspeech 2020 |
Conference Location | Shanghai, China |
Start Date | Oct 25, 2020 |
End Date | Oct 29, 2020 |
Online Publication Date | Oct 25, 2020 |
Publication Date | 2020 |
Deposit Date | Apr 26, 2022 |
Pages | 4521-4525 |
Book Title | Proc. Interspeech 2020 |
DOI | https://doi.org/10.21437/interspeech.2020-2935 |
Public URL | http://researchrepository.napier.ac.uk/Output/2867010 |
Publisher URL | http://www.interspeech2020.org/uploadfile/pdf/Thu-2-11-6.pdf |
Arabic sentiment analysis using dependency-based rules and deep neural networks
(2022)
Journal Article
Detecting Alzheimer’s Disease Using Machine Learning Methods
(2022)
Conference Proceeding
A Generative Learning Approach to Sensor Fusion and Change Detection
(2016)
Journal Article
A Survey on the Role of Wireless Sensor Networks and IoT in Disaster Management
(2018)
Book Chapter
Offline Arabic Handwriting Recognition Using Deep Machine Learning: A Review of Recent Advances
(2020)
Conference Proceeding