
MA-Net: Resource-efficient multi-attentional network for end-to-end speech enhancement

Wahab, Fazal E; Ye, Zhongfu; Saleem, Nasir; Ullah, Rizwan; Hussain, Amir


Abstract

Deep Neural Networks (DNNs) have transformed speech enhancement (SE) by modeling the complex relationships within speech signals through multi-layered hierarchical representations. However, their computational demands remain a challenging problem. Self-attention has emerged as a key technique for capturing long-range dependencies in speech signals by measuring attention between vectors through scaled dot products. Despite its widespread utility across various domains, self-attention encounters limitations when applied to SE. Specifically, its effectiveness degrades in low signal-to-noise ratio (SNR) conditions because it is sensitive to the scale of the input vectors, which is affected by factors such as low SNRs. To address these challenges, we propose a resource-efficient Multi-Attention Network (MA-Net) speech enhancement model that effectively captures local and long-range dependencies in speech signals while maintaining a low computational footprint. MA-Net integrates two fundamental modules: Spectral Temporal Hybrid Attention (STHA) and Dynamic Feedback Shuffle Attention (DFSA). The STHA module models long-range dependencies in spectral and temporal features using hybrid self-attention (HSA), which computes attention weights between query (Q) and key (K) vectors from both dot-product and cosine-similarity scores to mitigate the impact of scale variations in the input vectors, enabling more consistent and reliable attention. The DFSA module iteratively applies channel and spatial attention to dynamically refine feature representations, adjusting the weight of each iteration's output based on the input spectral features. Evaluations on two benchmark datasets (WSJ0-SI84 and VCTK+DEMAND) show that MA-Net outperforms recent models in SE performance at considerably reduced computational complexity, with 0.92 M parameters, an RTF of 0.09, and 1.32 G MACs/s. On the WSJ0-SI84 dataset, MA-Net improves PESQ, STOI, and SI-SDR by 1.26, 20.3%, and 9.76 dB over noisy mixtures, highlighting the usefulness of MA-Net in real-world SE conditions.
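The abstract's description of hybrid self-attention (HSA), blending scaled dot-product scores with cosine-similarity scores to reduce sensitivity to input scale, can be illustrated with a minimal PyTorch sketch. The mixing weight `alpha` and the additive fusion rule are assumptions for illustration only; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def hybrid_self_attention(q, k, v, alpha=0.5):
    """Minimal sketch of hybrid self-attention (HSA).

    q, k, v: (batch, seq_len, dim) tensors.
    alpha: hypothetical mixing weight between the two score types.
    """
    d = q.size(-1)
    # Scaled dot-product scores: sensitive to the magnitude of q and k.
    dot_scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    # Cosine-similarity scores: invariant to vector magnitude, so they
    # stay stable when low SNR changes the scale of the inputs.
    q_n = F.normalize(q, dim=-1)
    k_n = F.normalize(k, dim=-1)
    cos_scores = torch.matmul(q_n, k_n.transpose(-2, -1))
    # Blend the two score maps before the softmax (assumed fusion rule).
    scores = alpha * dot_scores + (1 - alpha) * cos_scores
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

Similarly, the DFSA module's iterative channel-and-spatial refinement with input-dependent iteration weights can be sketched as below. The squeeze-and-excitation-style channel gate, the 1x1-conv spatial gate, and the softmax gating head are assumed stand-ins rather than the paper's architecture, and the channel-shuffle step is omitted.

```python
import torch
import torch.nn as nn

class DFSASketch(nn.Module):
    """Hypothetical sketch of dynamic feedback shuffle attention:
    a few rounds of channel then spatial attention, with each round's
    output weighted by a gate computed from the input features."""

    def __init__(self, channels, iterations=3):
        super().__init__()
        self.iterations = iterations
        # Channel attention: squeeze-and-excitation-style gating.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )
        # Spatial attention: 1x1 conv producing a per-position mask.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)
        # Dynamic gate: per-iteration weights derived from the input.
        self.gate = nn.Linear(channels, iterations)

    def forward(self, x):  # x: (batch, channels, freq, time)
        b, c, _, _ = x.shape
        weights = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=-1)
        out, feat = torch.zeros_like(x), x
        for i in range(self.iterations):
            ca = self.channel_fc(feat.mean(dim=(2, 3))).view(b, c, 1, 1)
            feat = feat * ca                          # channel refinement
            sa = torch.sigmoid(self.spatial_conv(feat))
            feat = feat * sa                          # spatial refinement
            out = out + weights[:, i].view(b, 1, 1, 1) * feat
        return out
```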

Citation

Wahab, F. E., Ye, Z., Saleem, N., Ullah, R., & Hussain, A. (2025). MA-Net: Resource-efficient multi-attentional network for end-to-end speech enhancement. Neurocomputing, 619, Article 129150. https://doi.org/10.1016/j.neucom.2024.129150

Journal Article Type: Article
Acceptance Date: Dec 5, 2024
Online Publication Date: Dec 12, 2024
Publication Date: Feb 2025
Deposit Date: Jan 21, 2025
Journal: Neurocomputing
Print ISSN: 0925-2312
Publisher: Elsevier
Peer Reviewed: Yes
Volume: 619
Article Number: 129150
DOI: https://doi.org/10.1016/j.neucom.2024.129150