Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids

Gogate, Mandar; Dashtipour, Kia; Hussain, Amir

doi:10.1109/icasspw59220.2023.10192961

Authors

Dr Mandar Gogate M.Gogate@napier.ac.uk
Senior Research Fellow

Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer

Prof Amir Hussain A.Hussain@napier.ac.uk
Professor

Abstract

Classical audio-visual (AV) speech enhancement (SE) and separation methods have operated successfully in constrained environments; however, their improvements in speech quality and intelligibility are significantly reduced in unconstrained real-world environments where variations in pose and illumination are encountered. In this paper, we present a novel privacy-preserving approach for real-world, unconstrained, pose-invariant AV SE and separation that contextually exploits pose-invariant 3D landmark flow features and noisy speech features to selectively suppress unwanted background speech and non-speech noises. In addition, we present a unified architecture that integrates state-of-the-art transformers with temporal convolutional neural networks for effective pose-invariant AV SE. Preliminary systematic experiments on the benchmark multi-pose OuluVS2 and LRS3-TED corpora demonstrate that the privacy-preserving 3D landmark flow features are effective for pose-invariant SE and separation. In addition, the proposed AV SE model significantly outperforms a state-of-the-art audio-only SE model, an oracle ideal binary mask, and the audio-only (A-only) variant of the proposed model in speaker- and noise-independent settings.
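
For readers unfamiliar with the oracle ideal binary mask baseline named in the abstract, the following is a minimal sketch of its textbook definition, not the authors' implementation. It assumes access to separate clean-speech and noise magnitude spectrograms (which is what makes the mask an "oracle") and a local SNR threshold of 0 dB; the function name `ideal_binary_mask`, the parameter `lc_db`, and the toy values are all hypothetical and do not come from the paper.

```python
import numpy as np

def ideal_binary_mask(clean_mag, noise_mag, lc_db=0.0, eps=1e-12):
    """Return 1.0 where the local SNR (in dB) exceeds lc_db, else 0.0.

    clean_mag, noise_mag: magnitude spectrograms of equal shape
    (frequency bins x frames). lc_db is an assumed threshold.
    """
    clean_pow = np.asarray(clean_mag, dtype=float) ** 2
    noise_pow = np.asarray(noise_mag, dtype=float) ** 2
    # eps avoids division by zero and log of zero in silent bins.
    local_snr_db = 10.0 * np.log10((clean_pow + eps) / (noise_pow + eps))
    return (local_snr_db > lc_db).astype(float)

# Toy 2x3 magnitude spectrograms with made-up values.
clean = np.array([[1.0, 0.1, 2.0],
                  [0.0, 3.0, 0.5]])
noise = np.array([[0.5, 1.0, 0.2],
                  [1.0, 1.0, 2.0]])

mask = ideal_binary_mask(clean, noise)

# Crude stand-in for the mixture magnitude; a real mixture would be
# formed in the complex STFT domain, where phase matters.
noisy_mag = clean + noise
enhanced_mag = mask * noisy_mag
```

The mask keeps time-frequency bins dominated by the target speech and zeroes the rest. Because it requires the clean signal, it serves only as a reference point for comparison, not as a deployable method.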

Presentation Conference Type	Conference Paper (Published)
Conference Name	2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
Start Date	Jun 4, 2023
End Date	Jun 10, 2023
Online Publication Date	Aug 2, 2023
Publication Date	2023
Deposit Date	Apr 19, 2024
Publicly Available Date	Apr 22, 2024
Publisher	Institute of Electrical and Electronics Engineers
Book Title	2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
ISBN	9798350302622
DOI	https://doi.org/10.1109/icasspw59220.2023.10192961
Keywords	Audio-visual speech enhancement, pose-invariant, multimodal hearing aids
Public URL	http://researchrepository.napier.ac.uk/Output/3597086