DNN driven speaker independent audio-visual mask estimation for speech separation

Gogate, Mandar; Adeel, Ahsan; Marxer, Ricard; Barker, Jon; Hussain, Amir

doi:10.21437/Interspeech.2018-2516

Deep Learning-Based Receiver Design for IoT Multi-User Uplink 5G-NR System (2024)
Conference Proceeding
Gupta, A., Bishnu, A., Ratnarajah, T., Adeel, A., Hussain, A., & Sellathurai, M. (2024). Deep Learning-Based Receiver Design for IoT Multi-User Uplink 5G-NR System. In GLOBECOM 2023 - 2023 IEEE Global Communications Conference (4110-4115). https://doi.org/10.1109/globecom54140.2023.10437776

Designing an efficient receiver for multiple users transmitting orthogonal frequency-division multiplexing signals to the base station remains a challenging interference-limited problem in the 5G New Radio (5G-NR) system. This can lead to stagnation of de...

Resolving the Decreased Rank Attack in RPL’s IoT Networks (2023)
Conference Proceeding
Ghaleb, B., Al-Duba, A., Hussain, A., Romdhani, I., & Jaroucheh, Z. (2023). Resolving the Decreased Rank Attack in RPL’s IoT Networks. In 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT) (65-68). https://doi.org/10.1109/DCOSS-IoT58021.2023.00018

The Routing Protocol for Low power and Lossy networks (RPL) has been developed by the Internet Engineering Task Force (IETF) standardization body to serve as a part of the 6LoWPAN (IPv6 over Low-Power Wireless Personal Area Networks) standard, a core...

Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids (2023)
Conference Proceeding
Gogate, M., Dashtipour, K., & Hussain, A. (2023). Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). https://doi.org/10.1109/icasspw59220.2023.10192961

Classical audio-visual (AV) speech enhancement (SE) and separation methods have been successful at operating under constrained environments; however, the speech quality and intelligibility improvement is significantly reduced in unconstrained real-wo...

Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids (2023)
Conference Proceeding
Kirton-Wingate, J., Ahmed, S., Gogate, M., Tsao, Y., & Hussain, A. (2023). Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids. In K. Dashtipour (Ed.), Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). https://doi.org/10.1109/icasspw59220.2023.10193122

Since the advent of deep learning (DL), speech enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear am...

Live Demonstration: Cloud-based Audio-Visual Speech Enhancement in Multimodal Hearing-aids (2023)
Conference Proceeding
Bishnu, A., Gupta, A., Gogate, M., Dashtipour, K., Arslan, T., Adeel, A., …Ratnarajah, T. (2023). Live Demonstration: Cloud-based Audio-Visual Speech Enhancement in Multimodal Hearing-aids. In IEEE ISCAS 2023 Symposium Proceedings. https://doi.org/10.1109/iscas46773.2023.10182060

Hearing loss is among the most serious public health problems, affecting as much as 20% of the worldwide population. Even cutting-edge multi-channel audio-only speech enhancement (SE) algorithms used in modern hearing aids face significant hurdles si...

Live Demonstration: Real-time Multi-modal Hearing Assistive Technology Prototype (2023)
Conference Proceeding
Gogate, M., Hussain, A., Dashtipour, K., & Hussain, A. (2023). Live Demonstration: Real-time Multi-modal Hearing Assistive Technology Prototype. In IEEE ISCAS 2023 Symposium Proceedings. https://doi.org/10.1109/iscas46773.2023.10182070

Hearing loss affects at least 1.5 billion people globally. The WHO estimates 83% of people who could benefit from hearing aids do not use them. Barriers to HA uptake are multifaceted but include ineffectiveness of current HA technology in noisy envir...

AVSE Challenge: Audio-Visual Speech Enhancement Challenge (2023)
Conference Proceeding
Aldana Blanco, A. L., Valentini-Botinhao, C., Klejch, O., Gogate, M., Dashtipour, K., Hussain, A., & Bell, P. (2023). AVSE Challenge: Audio-Visual Speech Enhancement Challenge. In 2022 IEEE Spoken Language Technology Workshop (SLT) (465-471). https://doi.org/10.1109/slt54892.2023.10023284

Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity to improve speech intelligibility in adverse listening scenarios that are currently too chal...

A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids (2022)
Conference Proceeding
Bishnu, A., Gupta, A., Gogate, M., Dashtipour, K., Adeel, A., Hussain, A., …Ratnarajah, T. (2022). A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids. In 2022 IEEE International Conference on E-health Networking, Application & Services (HealthCom). https://doi.org/10.1109/healthcom54947.2022.9982772

In this paper, we design a first-of-its-kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) complying with the high data rate and low latency requirements of future multimodal hearing assistive technology. The...

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning (2022)
Conference Proceeding
Hussain, T., Diyan, M., Gogate, M., Dashtipour, K., Adeel, A., Tsao, Y., & Hussain, A. (2022). A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). https://doi.org/10.1109/embc48229.2022.9871113

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals. Despite improving the speech quality, su...

An Attribute Weight Estimation Using Particle Swarm Optimization and Machine Learning Approaches for Customer Churn Prediction (2021)
Conference Proceeding
Kanwal, S., Rashid, J., Kim, J., Nisar, M. W., Hussain, A., Batool, S., & Kanwal, R. (2021). An Attribute Weight Estimation Using Particle Swarm Optimization and Machine Learning Approaches for Customer Churn Prediction. In 2021 International Conference on Innovative Computing (ICIC) (745-750). https://doi.org/10.1109/icic53490.2021.9693040

One of the most challenging problems in the telecommunications industry is customer churn prediction (CCP). Decision-makers and business experts stress that acquiring new clients is more expensive than maintaining current ones. From current churn d...

Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System (2020)
Conference Proceeding
Gogate, M., Dashtipour, K., & Hussain, A. (2020). Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. In Proc. Interspeech 2020 (4521-4525). https://doi.org/10.21437/interspeech.2020-2935

In this paper, we present VIsual Speech In real nOisy eNvironments (VISION), a first-of-its-kind audio-visual (AV) corpus comprising 2500 utterances from 209 speakers, recorded in real noisy environments including social gatherings, streets, cafeteri...

Deep Neural Network Driven Binaural Audio Visual Speech Separation (2020)
Conference Proceeding
Gogate, M., Dashtipour, K., Bell, P., & Hussain, A. (2020). Deep Neural Network Driven Binaural Audio Visual Speech Separation. In 2020 International Joint Conference on Neural Networks (IJCNN). https://doi.org/10.1109/ijcnn48605.2020.9207517

The central auditory pathway exploits the auditory signals and visual information sent by both ears and eyes to segregate speech from multiple competing noise sources and help disambiguate phonological ambiguity. In this study, inspired by this uni...

Advances in Brain Inspired Cognitive Systems: 10th International Conference, BICS 2019, Guangzhou, China, July 13–14, 2019, Proceedings (2020)
Conference Proceeding
(2020). Advances in Brain Inspired Cognitive Systems: 10th International Conference, BICS 2019, Guangzhou, China, July 13–14, 2019, Proceedings. In J. Ren, A. Hussain, H. Zhao, K. Huang, J. Zheng, J. Cai, …Y. Xiao (Eds.), Advances in Brain Inspired Cognitive Systems. https://doi.org/10.1007/978-3-030-39431-8

This book constitutes the refereed proceedings of the 10th International Conference on Advances in Brain Inspired Cognitive Systems, BICS 2019, held in Guangzhou, China, in July 2019. The 57 papers presented in this volume were carefully reviewed...

Self-focus Deep Embedding Model for Coarse-Grained Zero-Shot Classification (2020)
Conference Proceeding
Yang, G., Huang, K., Zhang, R., Goulermas, J. Y., & Hussain, A. (2020). Self-focus Deep Embedding Model for Coarse-Grained Zero-Shot Classification. In Advances in Brain Inspired Cognitive Systems. BICS 2019 (12-22). https://doi.org/10.1007/978-3-030-39431-8_2

Zero-shot learning (ZSL), i.e. classifying patterns where there is a lack of labeled training data, is a challenging yet important research topic. One of the most common ideas for ZSL is to map the data (e.g., images) and semantic attributes to the s...

Height Prediction for Growth Hormone Deficiency Treatment Planning Using Deep Learning (2020)
Conference Proceeding
Ilyas, M., Ahmad, J., Lawson, A., Khan, J. S., Tahir, A., Adeel, A., …Hussain, A. (2020). Height Prediction for Growth Hormone Deficiency Treatment Planning Using Deep Learning. In Advances in Brain Inspired Cognitive Systems (76-85). https://doi.org/10.1007/978-3-030-39431-8_8

Prospective studies using longitudinal patient data can help to predict responsiveness to Growth Hormone (GH) therapy and assess any suspected risks. In this paper, a novel Clinical Decision Support System (CDSS) is developed to predict gr...

Offline Arabic Handwriting Recognition Using Deep Machine Learning: A Review of Recent Advances (2020)
Conference Proceeding
Ahmed, R., Dashtipour, K., Gogate, M., Raza, A., Zhang, R., Huang, K., …Hussain, A. (2020). Offline Arabic Handwriting Recognition Using Deep Machine Learning: A Review of Recent Advances. In Advances in Brain Inspired Cognitive Systems: 10th International Conference, BICS 2019, Guangzhou, China, July 13–14, 2019, Proceedings (457-468). https://doi.org/10.1007/978-3-030-39431-8_44

In pattern recognition, automatic handwriting recognition (AHWR) is an area of research that has developed rapidly in the last few years. It can play a significant role in a broad spectrum of applications ranging from bank cheque processing, applicati...

Generalized Adversarial Training in Riemannian Space (2020)
Conference Proceeding
Zhang, S., Huang, K., Zhang, R., & Hussain, A. (2020). Generalized Adversarial Training in Riemannian Space. In 2019 IEEE International Conference on Data Mining (ICDM) (826-835). https://doi.org/10.1109/icdm.2019.00093

Adversarial examples, referred to as augmented data points generated by imperceptible perturbations of input samples, have recently drawn much attention. Well-crafted adversarial examples may even mislead state-of-the-art deep neural network (DNN) mo...

Random Features and Random Neurons for Brain-Inspired Big Data Analytics (2020)
Conference Proceeding
Gogate, M., Hussain, A., & Huang, K. (2020). Random Features and Random Neurons for Brain-Inspired Big Data Analytics. In 2019 International Conference on Data Mining Workshops (ICDMW). https://doi.org/10.1109/icdmw.2019.00080

With the explosion of Big Data, fast and frugal reasoning algorithms are increasingly needed to keep up with the size and the pace of user-generated content on the Web. In many real-time applications, it is preferable to be able to process more data...

Preface (2018)
Conference Proceeding
Ren, J., Hussain, A., Zheng, J., Liu, C., Luo, B., Zhao, H., & Zhao, X. (2018). Preface. In Advances in Brain Inspired Cognitive Systems (V-VI). https://doi.org/10.1007/978-3-030-00563-4

DNN driven speaker independent audio-visual mask estimation for speech separation (2018)
Conference Proceeding
Gogate, M., Adeel, A., Marxer, R., Barker, J., & Hussain, A. (2018). DNN driven speaker independent audio-visual mask estimation for speech separation. In Proceedings of the Annual Conference of the International Speech Communication Association (2723-2727). https://doi.org/10.21437/Interspeech.2018-2516
