Application for Real-time Audio-Visual Speech Enhancement

Gogate, Mandar; Dashtipour, Kia; Hussain, Amir

Authors

Dr Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow

Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer

Prof Amir Hussain A.Hussain@napier.ac.uk / hussain.doctor@gmail.com
Professor

Abstract

This short paper demonstrates a first-of-its-kind audio-visual (AV) speech enhancement (SE) desktop application that isolates, in real time, the voice of a target speaker from noisy audio input. The deep neural network model integrated in this application exploits the AV nature of speech from the target speaker to suppress all speech and non-speech background sounds. In the context of a growing need for video conferencing solutions, AV SE enables the practical deployment of such technology in challenging acoustic environments with multiple competing background noise sources. In these scenarios, classical audio-only SE models typically fail, as they are usually trained to isolate speech from non-speech noises only. The application comprises a graphical user interface and modules for real-time AV speech acquisition, preprocessing, and enhancement. Participants will experience a significant improvement in the speech quality and intelligibility of a target speaker who will be physically situated in a real noisy environment with a range of real-world noises. Moreover, participants can evaluate the performance of the application with their own voice by recording videos in challenging multi-talker conversational environments.

Citation

Gogate, M., Dashtipour, K., & Hussain, A. (2023, August). Application for Real-time Audio-Visual Speech Enhancement. Presented at Interspeech 2023, Dublin, Ireland.

Presentation Conference Type	Conference Paper (published)
Conference Name	Interspeech 2023
Start Date	Aug 20, 2023
End Date	Aug 24, 2023
Publication Date	2023
Deposit Date	May 21, 2024
Peer Reviewed	Peer Reviewed
Pages	2026-2027
Series ISSN	2308-457X
Book Title	Proc. INTERSPEECH 2023
Public URL	http://researchrepository.napier.ac.uk/Output/3609220
Publisher URL	https://www.isca-archive.org/interspeech_2023/gogate23_interspeech.html

