Dr Mandar Gogate M.Gogate@napier.ac.uk
Senior Research Fellow
Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids
Gogate, Mandar; Dashtipour, Kia; Hussain, Amir
Authors
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
Classical audio-visual (AV) speech enhancement (SE) and separation methods have been successful at operating under constrained environments; however, the improvement in speech quality and intelligibility is significantly reduced in unconstrained real-world environments where variations in pose and illumination are encountered. In this paper, we present a novel privacy-preserving approach for real-world unconstrained pose-invariant AV SE and separation that contextually exploits pose-invariant 3D landmark flow features and noisy speech features to selectively suppress unwanted background speech and non-speech noises. In addition, we present a unified architecture that integrates state-of-the-art transformers with temporal convolutional neural networks for effective pose-invariant AV SE. Preliminary systematic experimentation on the benchmark multi-pose OuluVS2 and LRS3-TED corpora demonstrates that the privacy-preserving 3D landmark flow features are effective for pose-invariant SE and separation. In addition, the proposed AV SE model significantly outperforms a state-of-the-art audio-only SE model, the oracle ideal binary mask, and the audio-only (A-only) variant of the proposed model in speaker- and noise-independent settings.
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) |
Start Date | Jun 4, 2023 |
End Date | Jun 10, 2023 |
Online Publication Date | Aug 2, 2023 |
Publication Date | 2023 |
Deposit Date | Apr 19, 2024 |
Publicly Available Date | Apr 22, 2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Book Title | 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) |
ISBN | 9798350302622 |
DOI | https://doi.org/10.1109/icasspw59220.2023.10192961 |
Keywords | Audio-visual speech enhancement, pose-invariant, multimodal hearing aids |
Public URL | http://researchrepository.napier.ac.uk/Output/3597086 |
Files
Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids (accepted version)
(3.4 Mb)
PDF