Tassadaq Hussain
A novel temporal attentive-pooling based convolutional recurrent architecture for acoustic signal enhancement
Hussain, Tassadaq; Wang, Wei-Chien; Gogate, Mandar; Dashtipour, Kia; Tsao, Yu; Lu, Xugang; Ahsan, Adeel; Hussain, Amir
Authors
Wei-Chien Wang
Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow
Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer
Yu Tsao
Xugang Lu
Adeel Ahsan
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
Removing background noise from acoustic observations to obtain clean signals is an important research topic for numerous real-world acoustic applications. Owing to their strong model capacity in function mapping, deep neural network-based algorithms have been successfully applied to target signal enhancement in acoustic applications. As most target signals carry semantic information encoded in a hierarchical structure in short- and long-term contexts, noise may distort such structures nonuniformly. In most deep neural network-based algorithms, such local and global effects are not explicitly considered in the modeling architecture for signal enhancement. In this paper, we propose a temporal attentive-pooling (TAP) mechanism combined with a conventional convolutional recurrent neural network (CRNN) model, called TAP-CRNN, which explicitly considers both global and local information for acoustic signal enhancement (ASE). In the TAP-CRNN model, we first use a convolution layer to extract local information from acoustic signals and a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of noisy signals. We evaluate the proposed ASE system using an infant cry dataset. The experimental results confirm the effectiveness of the proposed TAP-CRNN compared with related deep neural network models, and demonstrate that the proposed TAP-CRNN can more effectively reduce noise components from infant cry signals with unseen background noises at different signal-to-noise levels.
Impact Statement: Recently proposed deep learning solutions have proven useful in overcoming certain limitations of conventional acoustic signal enhancement (ASE) tasks. However, the performance of these approaches under real acoustic conditions is not always satisfactory. In this study, we investigated the use of attention models for ASE. To the best of our knowledge, this is the first attempt to successfully employ a convolutional recurrent neural network (CRNN) with a temporal attentive pooling (TAP) algorithm for the ASE task. The proposed TAP-CRNN framework can practically benefit the assistive communication technology industry, such as the manufacture of hearing aid devices for the elderly and students. In addition, the derived algorithm can benefit other signal processing applications, such as soundscape information retrieval, sound environment analysis in smart homes, and automatic speech/speaker/language recognition systems.
Index Terms: Acoustic signal enhancement, convolutional neural networks, recurrent neural networks, bidirectional long short-term memory.
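The abstract does not spell out how TAP is formulated, so the following is only a generic illustration of attention-weighted pooling over time, not the authors' method. It assumes each time frame is a feature vector (for example, an RNN output), scores each frame with a hypothetical learned vector `w` and bias `b`, normalizes the scores with a softmax, and returns the weighted sum of frames. The function and parameter names are invented for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attentive_pooling(frames, w, b=0.0):
    """Pool T frame vectors into one vector using attention weights.

    frames: list of T feature vectors, each of length D
    w, b:   hypothetical scoring parameters (learned in a real model)
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per time frame.
    scores = [
        math.tanh(sum(x * wi for x, wi in zip(frame, w)) + b)
        for frame in frames
    ]
    # Normalize scores so the weights sum to 1 across time.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time for each feature dimension.
    pooled = [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames, the middle one has the largest activations.
frames = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
pooled, weights = temporal_attentive_pooling(frames, w=[1.0, 1.0])
print(weights)
print(pooled)
```

With a zero scoring vector every frame receives the same weight and the result reduces to plain average pooling, which is one way to see what the attention weights add: frames judged more salient contribute more to the pooled summary.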
Citation
Hussain, T., Wang, W., Gogate, M., Dashtipour, K., Tsao, Y., Lu, X., Ahsan, A., & Hussain, A. (2022). A novel temporal attentive-pooling based convolutional recurrent architecture for acoustic signal enhancement. IEEE Transactions on Artificial Intelligence, 3(5), 833-842. https://doi.org/10.1109/TAI.2022.3169995
Journal Article Type | Article |
---|---|
Acceptance Date | Apr 17, 2022 |
Online Publication Date | Apr 25, 2022 |
Publication Date | 2022 |
Deposit Date | Apr 26, 2022 |
Publicly Available Date | Apr 26, 2022 |
Journal | IEEE Transactions on Artificial Intelligence |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 3 |
Issue | 5 |
Pages | 833-842 |
DOI | https://doi.org/10.1109/TAI.2022.3169995 |
Public URL | http://researchrepository.napier.ac.uk/Output/2866944 |
Files
A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement
(5.7 Mb)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/