Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modelling and User Preference Learning
Chen, Song; Kirton-Wingate, Jasper; Doctor, Faiyaz; Arshad, Usama; Dashtipour, Kia; Gogate, Mandar; Halim, Zahid; Al-Dubai, Ahmed; Arslan, Tughrul; Hussain, Amir
Authors
Jasper Kirton-Wingate J.Kirton-wingate@napier.ac.uk
Student Experience
Faiyaz Doctor
Usama Arshad
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Dr Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Zahid Halim
Prof Ahmed Al-Dubai A.Al-Dubai@napier.ac.uk
Professor
Tughrul Arslan
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
It is estimated that by 2050 approximately one in ten individuals globally will experience disabling hearing impairment. In the presence of everyday reverberant noise, a substantial proportion of individuals encounter challenges in speech comprehension. This study introduces a novel application of neuro-fuzzy modelling that fuses audio-visual speech enhancement (AV SE) with an initial user-preference-learning framework. Specifically, our approach uniquely integrates multimodal AV speech data with innovative SE methods and fuzzy inferencing techniques. This integration is further enriched by a user-preference learning model that adapts to environmental and user-specific contexts, including signal-to-noise ratio, sound power, and the quality of visual information. The proposed framework facilitates the incorporation of clinical measures, such as user cognitive load (or listening effort), together with real-world uncertainty, to steer the system outputs. We employ an adaptive fuzzy neural network to derive the most effective Sugeno fuzzy inference model, using particle swarm optimization to ensure optimal SE by considering sound power, ambient noise levels, and visual quality. Experimental results on our new benchmark AV multi-talker Challenge dataset demonstrate the superiority of our user-preference-informed, context-aware AV SE approach in enhancing speech intelligibility and quality in challenging noisy conditions, marking a significant advancement over conventional methods while reducing energy consumption. Our conclusions support the ecological scalability of the approach and its potential for real-world applications, setting a new benchmark in AV SE research and paving the way for future assistive hearing and communication technologies.
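The core inference step the abstract describes — a Sugeno (TSK) fuzzy model mapping context inputs (SNR, sound power, visual quality) to an enhancement control output via a weighted average of rule consequents — can be sketched minimally as below. The rule centres, spreads, and consequent coefficients are illustrative placeholders, not the paper's PSO-optimised parameters, and the two-rule base is purely for demonstration.

```python
import math

def gauss(x, c, s):
    """Gaussian membership of input x for centre c and spread s."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

# Hypothetical rule base over the three context inputs named in the abstract:
# SNR (dB), sound power (normalised 0-1), visual quality (0-1).
# Each rule pairs per-input membership params with linear consequent coefficients.
RULES = [
    # "low SNR, noisy, poor visuals" -> stronger enhancement output
    {"mf": [(-5.0, 5.0), (0.7, 0.3), (0.2, 0.3)],
     "coef": (-0.01, 0.2, 0.1, 0.9)},
    # "high SNR, quiet, good visuals" -> lighter enhancement output
    {"mf": [(15.0, 5.0), (0.3, 0.3), (0.9, 0.3)],
     "coef": (-0.02, 0.1, -0.2, 0.3)},
]

def sugeno_infer(snr, power, vq):
    """First-order Sugeno inference: firing-strength-weighted average
    of each rule's linear consequent."""
    num = den = 0.0
    for rule in RULES:
        # Firing strength: product t-norm of the three memberships.
        w = 1.0
        for x, (c, s) in zip((snr, power, vq), rule["mf"]):
            w *= gauss(x, c, s)
        a, b, c_vq, d = rule["coef"]
        y = a * snr + b * power + c_vq * vq + d  # rule output, linear in inputs
        num += w * y
        den += w
    return num / den if den > 0 else 0.0

print(round(sugeno_infer(-5.0, 0.7, 0.2), 3))  # near rule 1's centre -> ~1.11
```

In the actual system, an adaptive fuzzy neural network would learn these parameters (with particle swarm optimization tuning them), rather than the hand-set values shown here.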
Citation
Chen, S., Kirton-Wingate, J., Doctor, F., Arshad, U., Dashtipour, K., Gogate, M., Halim, Z., Al-Dubai, A., Arslan, T., & Hussain, A. (2024). Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modelling and User Preference Learning. IEEE Transactions on Fuzzy Systems, 32(10), 5400-5412. https://doi.org/10.1109/tfuzz.2024.3435050
| Journal Article Type | Article |
| --- | --- |
| Acceptance Date | Jul 15, 2024 |
| Online Publication Date | Aug 30, 2024 |
| Publication Date | 2024-10 |
| Deposit Date | Oct 3, 2024 |
| Publicly Available Date | Oct 3, 2024 |
| Journal | IEEE Transactions on Fuzzy Systems |
| Print ISSN | 1063-6706 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Peer Reviewed | Peer Reviewed |
| Volume | 32 |
| Issue | 10 |
| Pages | 5400-5412 |
| DOI | https://doi.org/10.1109/tfuzz.2024.3435050 |
Files
Context-Aware Audio-Visual Speech Enhancement Based On Neuro-Fuzzy Modelling And User Preference Learning (accepted version)
(971 KB)
PDF