In everyday noisy situations, the human auditory cortex exploits aural and visual cues, contextually combined by the brain's multi-level integration strategies, to selectively suppress background noise and focus on a target speaker. The multimodal nature of speech is well established: listeners unconsciously lip-read to improve the intelligibility of speech in noise. However, despite significant research in audio-visual (AV) speech enhancement, building real-time processing models with low latency remains a formidable technical challenge. In this paper, we propose a novel audio-visual speech enhancement model based on Temporal Convolutional Networks (TCNs) that exploits privacy-preserving lip-landmark flow features for speech enhancement in multitalker cocktail-party environments. In addition, we propose an efficient implementation of the TCN, called Fast-TCN, to enable real-time deployment of the proposed framework. Comparative simulation results in terms of speech quality and intelligibility demonstrate the effectiveness of the proposed AV model against benchmark audio-only and audio-visual approaches in speaker- and noise-independent scenarios.
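The low-latency property claimed for the TCN rests on causal dilated convolutions: each output sample depends only on present and past inputs, and stacking layers with growing dilations widens the temporal context cheaply. The sketch below illustrates that mechanism in plain Python; all names, kernel sizes, and dilation schedules are illustrative assumptions, not details taken from the paper's Fast-TCN.

```python
# Illustrative sketch of the causal dilated convolution underlying a TCN.
# Kernel values and dilations here are arbitrary examples, not the paper's.

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: output at time t uses only
    x[t], x[t - d], x[t - 2d], ... so no future samples leak in."""
    k = len(kernel)
    pad = (k - 1) * dilation              # left-pad so output length == input length
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            acc += kernel[i] * padded[t + pad - i * dilation]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Context (in samples) seen by a stack of causal dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# A unit impulse shows causality: the response never precedes the input.
print(causal_dilated_conv([1.0, 0.0, 0.0, 0.0], [1.0, 1.0], 1))
# Dilations 1, 2, 4 with kernel size 3 already cover 15 samples of context.
print(receptive_field(3, [1, 2, 4]))
```

Because the context grows with the sum of dilations while each layer's cost stays fixed, a deep stack can cover hundreds of milliseconds of audio and lip-landmark history without the frame-by-frame recurrence that makes RNN-style enhancers hard to run in real time.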
Gogate, M., Dashtipour, K., & Hussain, A. (2022, September). Towards real-time privacy-preserving audio-visual speech enhancement. Paper presented at the 2nd Symposium on Security and Privacy in Speech Communication, Incheon, Korea.