Tassadaq Hussain
Towards intelligibility-oriented audio-visual speech enhancement
Hussain, Tassadaq; Gogate, Mandar; Dashtipour, Kia; Hussain, Amir
Authors
Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
Existing deep learning (DL) based approaches are generally optimised to minimise the distance between clean and enhanced speech features. These approaches often improve speech quality; however, they generalise poorly and may not deliver the required speech intelligibility in real noisy situations. To address these challenges, researchers have explored intelligibility-oriented (I-O) loss functions and the integration of audio-visual (AV) information for more robust speech enhancement (SE). In this paper, we introduce DL-based I-O SE algorithms exploiting AV information, which is a novel and previously unexplored research direction. Specifically, we present a fully convolutional AV SE model that uses a modified short-time objective intelligibility (STOI) metric as a training cost function. To the best of our knowledge, this is the first work that integrates AV modalities with an I-O loss function for SE. Comparative experimental results demonstrate that our proposed I-O AV SE framework outperforms audio-only (AO) and AV models trained with conventional distance-based loss functions in terms of standard objective evaluation measures when dealing with unseen speakers and noises.
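The contrast drawn in the abstract between distance-based and intelligibility-oriented losses can be illustrated with a minimal NumPy sketch. This is not the paper's modified STOI cost function: the function names `mse_loss` and `correlation_loss`, the `(frames, bands)` input layout, and the 30-frame segment length are all assumptions chosen for illustration, and the second function omits the one-third octave band analysis, normalisation, and clipping steps of STOI. It only shows the general idea of scoring short-time envelope correlation instead of pointwise distance.

```python
import numpy as np


def mse_loss(clean, enhanced):
    """Conventional distance-based loss: mean squared error between features."""
    return float(np.mean((clean - enhanced) ** 2))


def correlation_loss(clean, enhanced, seg_len=30, eps=1e-8):
    """Simplified STOI-style loss (illustrative assumption, not the paper's loss).

    Inputs are (frames, bands) arrays of non-negative band envelopes.
    Returns the negative mean correlation between clean and enhanced
    envelopes, computed per band over sliding segments of seg_len frames.
    """
    n_frames = clean.shape[0]
    if n_frames < seg_len:
        raise ValueError("need at least seg_len frames")
    scores = []
    for start in range(n_frames - seg_len + 1):
        x = clean[start:start + seg_len]
        y = enhanced[start:start + seg_len]
        # Remove the per-band mean within the segment.
        x = x - x.mean(axis=0)
        y = y - y.mean(axis=0)
        # Per-band correlation coefficient over the segment.
        num = (x * y).sum(axis=0)
        den = np.linalg.norm(x, axis=0) * np.linalg.norm(y, axis=0) + eps
        scores.append(num / den)
    # Higher correlation is better, so the loss is its negative.
    return -float(np.mean(scores))


# Hypothetical data: 100 frames, 15 bands of random envelopes.
rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=(100, 15)))
scaled = 0.5 * clean                           # same envelope shape, lower level
noisy = clean + rng.normal(size=clean.shape)   # envelope shape corrupted
```

In this toy setting, `mse_loss(clean, scaled)` is clearly non-zero even though the envelope shape is unchanged, while `correlation_loss` treats `scaled` almost like the clean signal and penalises `noisy`, whose temporal envelope pattern is disturbed.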
Citation
Hussain, T., Gogate, M., Dashtipour, K., & Hussain, A. (2021, September). Towards intelligibility-oriented audio-visual speech enhancement. Presented at The Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2021), Online
Field | Value |
---|---|
Presentation Conference Type | Conference Paper (published) |
Conference Name | The Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2021) |
Start Date | Sep 16, 2021 |
End Date | Sep 17, 2021 |
Publication Date | 2021 |
Deposit Date | May 28, 2024 |
Peer Reviewed | Peer Reviewed |
Publisher URL | https://claritychallenge.org/clarity2021-workshop/papers/Clarity_2021_CEC1_paper_final_hussain.pdf |