Deep Neural Network Driven Binaural Audio Visual Speech Separation

Gogate, Mandar; Dashtipour, Kia; Bell, Peter; Hussain, Amir

doi:10.1109/ijcnn48605.2020.9207517

Authors

Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow

Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer

Peter Bell

Prof Amir Hussain A.Hussain@napier.ac.uk
Professor

Abstract

The central auditory pathway exploits the auditory signals and visual information sent by both ears and eyes to segregate speech from multiple competing noise sources and help resolve phonological ambiguity. In this study, inspired by this unique human ability, we present a deep neural network (DNN) that ingests the binaural sounds received at the two ears as well as the visual frames to selectively suppress the competing noise sources individually at both ears. The model exploits the noisy binaural cues and noise-robust visual cues to improve speech intelligibility. The comparative simulation results in terms of objective metrics such as PESQ, STOI, SI-SDR and DBSTOI demonstrate a significant performance improvement of the proposed audio-visual (AV) DNN as compared to the audio-only (A-only) variant of the proposed model. Finally, subjective listening tests with the real noisy AV ASPIRE corpus show the superiority of the proposed AV DNN as compared to state-of-the-art approaches.
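
Of the objective metrics named in the abstract, SI-SDR (scale-invariant signal-to-distortion ratio) is the simplest to state: the estimate is projected onto the clean reference, and the energy of that projection is compared with the energy of the residual. The sketch below follows the standard definition and is not the authors' evaluation code. The function names `si_sdr` and `binaural_si_sdr`, the `eps` guard, and the choice to score each ear independently are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a DC offset does not affect the score.
    mean_est = sum(estimate) / n
    mean_ref = sum(reference) / n
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]

    # Everything in the estimate that is not the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def binaural_si_sdr(est_left, est_right, ref_left, ref_right):
    """Score each ear separately (an assumption of this sketch)."""
    return si_sdr(est_left, ref_left), si_sdr(est_right, ref_right)
```

Because the target is rescaled by the projection coefficient, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what makes the metric scale-invariant. A higher value indicates less residual distortion relative to the clean speech.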

Citation

Gogate, M., Dashtipour, K., Bell, P., & Hussain, A. (2020, July). Deep Neural Network Driven Binaural Audio Visual Speech Separation. Presented at 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow

Presentation Conference Type	Conference Paper (published)
Conference Name	2020 International Joint Conference on Neural Networks (IJCNN)
Start Date	Jul 19, 2020
End Date	Jul 24, 2020
Online Publication Date	Sep 28, 2020
Publication Date	2020
Deposit Date	Apr 15, 2021
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Pages	1-7
Series ISSN	2161-4407
Book Title	2020 International Joint Conference on Neural Networks (IJCNN)
ISBN	9781728169262
DOI	https://doi.org/10.1109/ijcnn48605.2020.9207517
Public URL	http://researchrepository.napier.ac.uk/Output/2761846
