Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Peter Bell
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
The central auditory pathway combines the auditory signals from both ears with the visual information from the eyes to segregate speech from multiple competing noise sources and to help resolve phonological ambiguity. In this study, inspired by this unique human ability, we present a deep neural network (DNN) that ingests the binaural sounds received at the two ears, together with the visual frames, to selectively suppress the competing noise sources individually at each ear. The model exploits the noisy binaural cues and noise-robust visual cues to improve speech intelligibility. Comparative simulation results in terms of objective metrics such as PESQ, STOI, SI-SDR and DBSTOI demonstrate a significant performance improvement of the proposed audio-visual (AV) DNN over the audio-only (A-only) variant of the proposed model. Finally, subjective listening tests with the real noisy AV ASPIRE corpus show the superiority of the proposed AV DNN over state-of-the-art approaches.
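To make one of the objective metrics named in the abstract concrete, the following is a minimal sketch of scale-invariant signal-to-distortion ratio (SI-SDR) using its standard definition, with NumPy. It is not the authors' evaluation code. The function names `si_sdr` and `binaural_si_sdr`, the `(2, T)` left/right array layout, and the choice to average the two ears are assumptions made here for illustration only.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a constant offset does not count as distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever is left after the projection is treated as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def binaural_si_sdr(estimate_lr, reference_lr):
    """Hypothetical helper: mean SI-SDR over the left and right channels.

    Both inputs are assumed to have shape (2, T): row 0 left ear, row 1 right ear.
    Averaging the two ears is an illustrative choice, not taken from the paper.
    """
    left = si_sdr(estimate_lr[0], reference_lr[0])
    right = si_sdr(estimate_lr[1], reference_lr[1])
    return 0.5 * (left + right)


# Synthetic usage example: a clean two-channel signal plus mild additive noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal((2, 16000))
noisy = clean + 0.1 * rng.standard_normal((2, 16000))
score = binaural_si_sdr(noisy, clean)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.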
Gogate, M., Dashtipour, K., Bell, P., & Hussain, A. (2020, July). Deep Neural Network Driven Binaural Audio Visual Speech Separation. Presented at 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 2020 International Joint Conference on Neural Networks (IJCNN) |
Start Date | Jul 19, 2020 |
End Date | Jul 24, 2020 |
Online Publication Date | Sep 28, 2020 |
Publication Date | 2020 |
Deposit Date | Apr 15, 2021 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Pages | 1-7 |
Series ISSN | 2161-4407 |
Book Title | 2020 International Joint Conference on Neural Networks (IJCNN) |
ISBN | 9781728169262 |
DOI | https://doi.org/10.1109/ijcnn48605.2020.9207517 |
Public URL | http://researchrepository.napier.ac.uk/Output/2761846 |
Robust Real-time Audio-Visual Speech Enhancement based on DNN and GAN
(2024)
Journal Article
Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
(2023)
Journal Article