
Towards multilingual audio-visual speech enhancement in real noisy environments

People Involved

Project Description

Speech enhancement aims to improve the overall quality and intelligibility of speech degraded by noise sources in real-world noisy environments. In recent years, researchers have proposed audio-visual speech enhancement models that go beyond traditional audio-only processing to provide better noise suppression and speech restoration in low-SNR environments where multiple competing background noise sources are present. However, these audio-visual speech enhancement methods are language-dependent, as they exploit the correlations between visemes and the uttered speech. In addition, it has been shown that speaker pose variation significantly degrades the performance of these models.
This project aims to address these two critical challenges in current audio-visual speech enhancement models. The following research objectives will contribute to this development.

1. To design a novel multilingual audio-visual (AV) speech enhancement framework exploiting advanced machine learning techniques to address the language dependency of existing models.
2. To develop a novel multiview AV speech enhancement framework exploiting image translation and pose-invariant features.
3. Finally, we will integrate the two frameworks and critically evaluate the robustness and generalisation of the combined framework in a range of real-world environments (e.g. cafeteria and restaurant) and use cases (e.g. car).
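To make the low-SNR conditions mentioned above concrete: evaluation data for speech enhancement is typically built by mixing clean speech with noise at a controlled signal-to-noise ratio. The sketch below (using NumPy; the function name and interface are illustrative, not part of this project) shows how a noise signal can be scaled so that the mixture meets a target SNR in dB.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `speech` with `noise` scaled to achieve the requested SNR (in dB)."""
    # Match the noise length to the speech length by tiling and truncating.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose gain g such that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Lower (more negative) `snr_db` values produce harder mixtures with more dominant noise, which is the regime in which audio-visual cues are expected to help most.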

Project Acronym: Audio-visual speech enhancement
Status: Project Live
Funder(s): Royal Society
Value: £12,000.00
Project Dates: Feb 17, 2023 - Feb 16, 2025


