Application for Real-time Audio-Visual Speech Enhancement

Gogate, Mandar; Dashtipour, Kia; Hussain, Amir

Abstract

This short paper demonstrates a first-of-its-kind audio-visual (AV) speech enhancement (SE) desktop application that isolates, in real time, the voice of a target speaker from noisy audio input. The deep neural network model integrated in this application exploits the AV nature of speech from the target speaker to suppress all speech and non-speech background sounds. In the context of a growing need for video conferencing solutions, AV SE enables the practical deployment of such technology in challenging acoustic environments with multiple competing background noise sources. In these scenarios, classical audio-only SE models typically fail, as they are usually trained to isolate speech from non-speech noises. The application comprises a graphical user interface and modules for real-time AV speech acquisition, preprocessing, and enhancement. Participants will experience a significant improvement in the speech quality and intelligibility of a target speaker who will be physically situated in a real noisy environment with a range of real-world noises. Moreover, participants can evaluate the performance of the application with their own voice by recording videos in challenging multi-talker conversational environments.

Citation

Gogate, M., Dashtipour, K., & Hussain, A. (2023, August). Application for Real-time Audio-Visual Speech Enhancement. Presented at Interspeech 2023, Dublin, Ireland

Presentation Conference Type: Conference Paper (published)
Conference Name: Interspeech 2023
Start Date: Aug 20, 2023
End Date: Aug 24, 2023
Publication Date: 2023
Deposit Date: May 21, 2024
Peer Reviewed: Yes
Pages: 2026-2027
Series ISSN: 2308-457X
Book Title: Proc. INTERSPEECH 2023
Public URL: http://researchrepository.napier.ac.uk/Output/3609220
Publisher URL: https://www.isca-archive.org/interspeech_2023/gogate23_interspeech.html