Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings
Chern, I-Chun; Hung, Kuo-Hsuan; Chen, Yi-Ting; Hussain, Tassadaq; Gogate, Mandar; Hussain, Amir; Tsao, Yu; Hou, Jen-Cheng
Authors
I-Chun Chern
Kuo-Hsuan Hung
Yi-Ting Chen
Tassadaq Hussain
Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Yu Tsao
Jen-Cheng Hou
Abstract
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that multi-modal self-supervised embeddings can provide useful audio-visual speech representations. Nevertheless, it is unclear whether such representations generalize to real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we used the pre-trained AV-HuBERT model followed by a speech enhancement (SE) module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.
Citation
Chern, I.-C., Hung, K.-H., Chen, Y.-T., Hussain, T., Gogate, M., Hussain, A., Tsao, Y., & Hou, J.-C. (2023, June). Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings. Presented at 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) |
Start Date | Jun 4, 2023 |
End Date | Jun 10, 2023 |
Online Publication Date | Aug 2, 2023 |
Publication Date | 2023 |
Deposit Date | May 21, 2024 |
Peer Reviewed | Peer Reviewed |
Book Title | 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) |
DOI | https://doi.org/10.1109/ICASSPW59220.2023.10193049 |