Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings
(2023)
Presentation / Conference Contribution
Chern, I., Hung, K., Chen, Y., Hussain, T., Gogate, M., Hussain, A., Tsao, Y., & Hou, J. (2023, June). Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings. Presented at 2023 IEEE International Conference on A
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via u... Read More about Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings.