A novel temporal attentive-pooling based convolutional recurrent architecture for acoustic signal enhancement

Hussain, Tassadaq; Wang, Wei-Chien; Gogate, Mandar; Dashtipour, Kia; Tsao, Yu; Lu, Xugang; Ahsan, Adeel; Hussain, Amir

doi:10.1109/TAI.2022.3169995

Authors

Tassadaq Hussain

Wei-Chien Wang

Dr. Mandar Gogate M.Gogate@napier.ac.uk
Principal Research Fellow

Dr Kia Dashtipour K.Dashtipour@napier.ac.uk
Lecturer

Yu Tsao

Xugang Lu

Adeel Ahsan

Prof Amir Hussain A.Hussain@napier.ac.uk / hussain.doctor@gmail.com
Professor

Abstract

Removing background noise from acoustic observations to obtain clean signals is an important research topic for numerous real-world acoustic applications. Owing to their strong model capacity in function mapping, deep neural network-based algorithms have been successfully applied to target signal enhancement in acoustic applications. As most target signals carry semantic information encoded in a hierarchical structure in short- and long-term contexts, noise may distort such structures nonuniformly. In most deep neural network-based algorithms, such local and global effects are not explicitly considered in the modeling architecture for signal enhancement. In this paper, we propose a temporal attentive-pooling (TAP) mechanism combined with a conventional convolutional recurrent neural network (CRNN) model, called TAP-CRNN, which explicitly considers both global and local information for acoustic signal enhancement (ASE). In the TAP-CRNN model, we first use a convolution layer to extract local information from acoustic signals and a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of noisy signals. We evaluate the proposed ASE system using an infant cry dataset. The experimental results confirm the effectiveness of the proposed TAP-CRNN compared with related deep neural network models, and demonstrate that the proposed TAP-CRNN can more effectively reduce noise components from infant cry signals with unseen background noises at different signal-to-noise levels.

Impact Statement: Recently proposed deep learning solutions have proven useful in overcoming certain limitations of conventional acoustic signal enhancement (ASE) tasks. However, the performance of these approaches under real acoustic conditions is not always satisfactory. In this study, we investigated the use of attention models for ASE. To the best of our knowledge, this is the first attempt to successfully employ a convolutional recurrent neural network (CRNN) with a temporal attentive pooling (TAP) algorithm for the ASE task. The proposed TAP-CRNN framework can practically benefit the assistive communication technology industry, such as the manufacture of hearing aid devices for the elderly and students. In addition, the derived algorithm can benefit other signal processing applications, such as soundscape information retrieval, sound environment analysis in smart homes, and automatic speech/speaker/language recognition systems.

Index Terms: Acoustic signal enhancement, convolutional neural networks, recurrent neural networks, bidirectional long short-term memory.
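
The abstract does not specify how the attention weights in TAP are computed, so the following is only a minimal sketch of generic attention-based pooling over time, not the authors' TAP-CRNN implementation. It assumes each time frame is already represented by a feature vector (for example, the output of a recurrent layer) and that frames are scored by a dot product with a hypothetical learned vector named `query`; the function names and toy values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attentive_pool(frames, query):
    """Pool T frame vectors into one context vector using attention weights.

    frames: list of T feature vectors, each a list of D floats.
    query:  hypothetical scoring vector of length D (an assumption,
            standing in for parameters a real model would learn).
    Returns (weights, context): T attention weights and a D-dim vector.
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalise the scores so the weights over time sum to 1.
    weights = softmax(scores)
    # Weighted sum over time: frames with higher scores contribute more.
    dim = len(frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three frames with two features each (illustrative values).
frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
query = [1.0, 1.0]
weights, context = temporal_attentive_pool(frames, query)
```

In this toy input the third frame has the highest score and therefore receives the largest weight, which illustrates how a pooling step of this kind can emphasise salient regions of a signal when summarising it.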

Citation

Hussain, T., Wang, W.-C., Gogate, M., Dashtipour, K., Tsao, Y., Lu, X., Ahsan, A., & Hussain, A. (2022). A novel temporal attentive-pooling based convolutional recurrent architecture for acoustic signal enhancement. IEEE Transactions on Artificial Intelligence, 3(5), 833-842. https://doi.org/10.1109/TAI.2022.3169995

Journal Article Type	Article
Acceptance Date	Apr 17, 2022
Online Publication Date	Apr 25, 2022
Publication Date	2022
Deposit Date	Apr 26, 2022
Publicly Available Date	Apr 26, 2022
Journal	IEEE Transactions on Artificial Intelligence
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Volume	3
Issue	5
Pages	833-842
DOI	https://doi.org/10.1109/TAI.2022.3169995
Public URL	http://researchrepository.napier.ac.uk/Output/2866944