Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk / hussain.doctor@gmail.com
Professor
This short paper demonstrates a first of its kind audio-visual (AV) speech enhancement (SE) desktop application that isolates, in real-time, the voice of a target speaker from noisy audio input. The deep neural network model integrated in this application exploits the AV nature of speech from the target speaker to suppress all speech and non-speech background sounds. In the context of a growing need for video conferencing solutions, AV SE enables the practical deployment such technology in challenging acoustic environments with multiple competing background noise sources. In these scenarios, classical audio-only SE typically fails as they are usually trained to isolate speech from non-speech noises. The application comprises a graphical user interface and modules for real-time AV speech acquisition, preprocessing, and enhancement. The participants will experience a significant improvement in the speech quality and intelligibility of a target speaker who will be physically situated in a real noisy environment with a range of real-world noises. Moreover, participants can evaluate the performance of the application with their own voice by recording videos in challenging multi-talker conversational environments.
Gogate, M., Dashtipour, K., & Hussain, A. (2023, August). Application for Real-time Audio-Visual Speech Enhancement. Presented at Interspeech 2023, Dublin, Ireland
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Interspeech 2023 |
Start Date | Aug 20, 2023 |
End Date | Aug 24, 2023 |
Publication Date | 2023 |
Deposit Date | May 21, 2024 |
Peer Reviewed | Peer Reviewed |
Pages | 2026-2027 |
Series ISSN | 2308-457X |
Book Title | Proc. INTERSPEECH 2023 |
Public URL | http://researchrepository.napier.ac.uk/Output/3609220 |
Publisher URL | https://www.isca-archive.org/interspeech_2023/gogate23_interspeech.html |
Robust Real-time Audio-Visual Speech Enhancement based on DNN and GAN
(2024)
Journal Article
Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
(2023)
Journal Article
About Edinburgh Napier Research Repository
Administrator e-mail: repository@napier.ac.uk
This application uses the following open-source libraries:
Apache License Version 2.0 (http://www.apache.org/licenses/)
Apache License Version 2.0 (http://www.apache.org/licenses/)
SIL OFL 1.1 (http://scripts.sil.org/OFL)
MIT License (http://opensource.org/licenses/mit-license.html)
CC BY 3.0 ( http://creativecommons.org/licenses/by/3.0/)
Powered by Worktribe © 2025
Advanced Search