Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids

Authors

Gogate, Mandar; Dashtipour, Kia; Hussain, Amir

Abstract

Classical audio-visual (AV) speech enhancement (SE) and separation methods have been successful in constrained environments; however, their speech quality and intelligibility improvements are significantly reduced in unconstrained real-world environments, where variations in pose and illumination are encountered. In this paper, we present a novel privacy-preserving approach for real-world unconstrained pose-invariant AV SE and separation that contextually exploits pose-invariant 3D landmark flow features and noisy speech features to selectively suppress unwanted background speech and non-speech noises. In addition, we present a unified architecture that integrates state-of-the-art transformers with temporal convolutional neural networks for effective pose-invariant AV SE. Preliminary systematic experiments on the benchmark multi-pose OuluVS2 and LRS3-TED corpora demonstrate that the privacy-preserving 3D landmark flow features are effective for pose-invariant SE and separation. In addition, the proposed AV SE model significantly outperforms a state-of-the-art audio-only SE model, the oracle ideal binary mask, and the audio-only (A-only) variant of the proposed model in speaker- and noise-independent settings.
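
For readers unfamiliar with the oracle ideal binary mask (IBM) baseline named in the abstract, the following is a minimal sketch of the standard textbook definition, not the authors' implementation. It assumes magnitude spectrograms stored as nested lists (time frames by frequency bins) and a local SNR threshold of 0 dB. The function names `ideal_binary_mask` and `apply_mask`, the `lc_db` and `eps` parameters, and the toy values are all hypothetical; the abstract does not state the paper's actual mask settings.

```python
import math


def ideal_binary_mask(clean_mag, noise_mag, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask with one entry per time-frequency bin.

    A bin is kept (1) when its local SNR in dB exceeds the threshold
    lc_db, and discarded (0) otherwise. The mask is an "oracle" because
    it needs the clean speech and the noise separately, which are only
    available in simulated mixtures.
    """
    mask = []
    for clean_row, noise_row in zip(clean_mag, noise_mag):
        row = []
        for s, n in zip(clean_row, noise_row):
            # eps avoids division by zero and log of zero for silent bins.
            snr_db = 20.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(noisy_mag, mask):
    """Element-wise product of the noisy magnitudes and the binary mask."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(noisy_mag, mask)]


# Toy 2x2 magnitude spectrograms (assumed values, for illustration only).
clean = [[1.0, 0.1], [0.5, 2.0]]
noise = [[0.2, 0.8], [0.5, 0.1]]
noisy = [[1.2, 0.9], [1.0, 2.1]]

mask = ideal_binary_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
```

In this sketch, bins where speech dominates the noise pass through unchanged and all other bins are zeroed. Because the mask is built from the ground-truth signals, it is typically reported as a reference point rather than as a deployable method.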

Presentation Conference Type Conference Paper (Published)
Conference Name 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
Start Date Jun 4, 2023
End Date Jun 10, 2023
Online Publication Date Aug 2, 2023
Publication Date 2023
Deposit Date Apr 19, 2024
Publicly Available Date Apr 22, 2024
Publisher Institute of Electrical and Electronics Engineers
Book Title 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
ISBN 9798350302622
DOI https://doi.org/10.1109/icasspw59220.2023.10192961
Keywords Audio-visual speech enhancement, pose-invariant, multimodal hearing aids
Public URL http://researchrepository.napier.ac.uk/Output/3597086

Files

Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids (accepted version) (3.4 MB)
PDF
