Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
In this paper, we present VIsual Speech In real nOisy eNvironments (VISION), a first-of-its-kind audio-visual (AV) corpus comprising 2500 utterances from 209 speakers, recorded in real noisy environments including social gatherings, streets, cafeterias and restaurants. While a number of speech enhancement frameworks that exploit AV cues have been proposed in the literature, there are no visual speech corpora recorded in real environments with a sufficient variety of speakers to enable evaluation of AV frameworks' generalisation capability across a wide range of background visual and acoustic noises. The main purpose of our AV corpus is to foster research in the area of AV signal processing and to provide a benchmark corpus that can be used for reliable evaluation of AV speech enhancement systems in everyday noisy settings. In addition, we present a baseline deep neural network (DNN) based spectral mask estimation model for speech enhancement. Comparative simulation results with subjective listening tests demonstrate significant performance improvement of the baseline DNN compared to state-of-the-art speech enhancement approaches.
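The spectral mask estimation approach mentioned above can be illustrated with a small sketch. The paper's own DNN architecture is not reproduced here; instead, assuming the common ideal ratio mask (IRM) formulation that mask-estimation networks are typically trained to predict, the example below computes an oracle mask from toy clean/noise signals and applies it element-wise to the noisy spectrogram. The STFT helper, signal parameters, and IRM target are illustrative assumptions, not the paper's method.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed short-time Fourier transform (frames x freq bins)
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-8):
    # Oracle IRM: a typical training target for mask-estimation DNNs
    s, n = np.abs(clean_spec), np.abs(noise_spec)
    return s / (s + n + eps)

# Toy example: a 440 Hz sinusoid standing in for speech, buried in white noise
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(len(t))
noisy = clean + noise

S, N, Y = stft(clean), stft(noise), stft(noisy)
mask = ideal_ratio_mask(S, N)      # values bounded in [0, 1]
enhanced = mask * Y                # element-wise mask application
```

In a deployed system, the oracle mask is replaced by the DNN's prediction from noisy (and, in the AV setting, visual) features; the enhanced waveform is then recovered via an inverse STFT with overlap-add.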
Gogate, M., Dashtipour, K., & Hussain, A. (2020, October). Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. Presented at Interspeech 2020, Shanghai, China
| Presentation Conference Type | Conference Paper (published) |
| --- | --- |
| Conference Name | Interspeech 2020 |
| Start Date | Oct 25, 2020 |
| End Date | Oct 29, 2020 |
| Online Publication Date | Oct 25, 2020 |
| Publication Date | 2020 |
| Deposit Date | Apr 26, 2022 |
| Pages | 4521-4525 |
| Book Title | Proc. Interspeech 2020 |
| DOI | https://doi.org/10.21437/interspeech.2020-2935 |
| Public URL | http://researchrepository.napier.ac.uk/Output/2867010 |
| Publisher URL | http://www.interspeech2020.org/uploadfile/pdf/Thu-2-11-6.pdf |