Towards real-time privacy-preserving audio-visual speech enhancement
Gogate, Mandar; Dashtipour, Kia; Hussain, Amir
Authors
Dr Mandar Gogate M.Gogate@napier.ac.uk
Senior Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
In everyday noisy situations, the human auditory cortex is known to exploit aural and visual cues, which the brain’s multi-level integration strategies combine contextually to selectively suppress background noise and focus on the target speaker. The multimodal nature of speech is well established, with listeners known to unconsciously lip-read to improve the intelligibility of speech in noise. However, despite significant research in audio-visual (AV) speech enhancement, real-time processing with low latency remains a formidable technical challenge. In this paper, we propose a novel audio-visual speech enhancement model based on Temporal Convolutional Networks (TCN) that exploits privacy-preserving lip-landmark flow features for speech enhancement in multi-talker cocktail party environments. In addition, we propose an efficient implementation of TCN, called Fast-TCN, to enable real-time deployment of the proposed framework. Comparative simulation results in terms of speech quality and intelligibility demonstrate the effectiveness of the proposed AV model compared with benchmark audio-only and audio-visual approaches in speaker- and noise-independent scenarios.
Citation
Gogate, M., Dashtipour, K., & Hussain, A. (2022, September). Towards real-time privacy-preserving audio-visual speech enhancement. Paper presented at 2nd Symposium on Security and Privacy in Speech Communication, Incheon, Korea
Presentation Conference Type | Conference Paper (unpublished) |
---|---|
Conference Name | 2nd Symposium on Security and Privacy in Speech Communication |
Conference Location | Incheon, Korea |
Start Date | Sep 23, 2022 |
End Date | Sep 24, 2022 |
Online Publication Date | Sep 23, 2022 |
Deposit Date | Dec 18, 2022 |
DOI | https://doi.org/10.21437/spsc.2022-2 |
Keywords | speech enhancement, audio-visual speech separation, privacy-preserving |
Public URL | http://researchrepository.napier.ac.uk/Output/2986305 |