A novel temporal attentive-pooling based convolutional recurrent architecture for acoustic signal enhancement

Hussain, Tassadaq; Wang, Wei-Chien; Gogate, Mandar; Dashtipour, Kia; Tsao, Yu; Lu, Xugang; Ahsan, Adeel; Hussain, Amir




Removing background noise from acoustic observations to obtain clean signals is an important research topic regarding numerous real acoustic applications. Owing to their strong model capacity in function mapping, deep neural network-based algorithms have been successfully applied in target signal enhancement in acoustic applications. As most target signals carry semantic information encoded in a hierarchical structure in short- and long-term contexts, noise may distort such structures nonuniformly. In most deep neural network-based algorithms, such local and global effects are not explicitly considered in a modeling architecture for signal enhancement. In this paper, we propose a temporal attentive-pooling (TAP) mechanism combined with a conventional convolutional recurrent neural network (CRNN) model, called TAP-CRNN, which explicitly considers both global and local information for acoustic signal enhancement (ASE). In the TAP-CRNN model, we first use a convolution layer to extract local information from acoustic signals and a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of noisy signals. We evaluate the proposed ASE system using an infant cry dataset. The experimental results confirm the effectiveness of the proposed TAP-CRNN, compared with related deep neural network models, and demonstrate that the proposed TAP-CRNN can more effectively reduce noise components from infant cry signals with unseen background noises at different signal-to-noise levels.

Impact Statement: Recently proposed deep learning solutions have proven useful in overcoming certain limitations of conventional acoustic signal enhancement (ASE) tasks. However, the performance of these approaches under real acoustic conditions is not always satisfactory. In this study, we investigated the use of attention models for ASE. To the best of our knowledge, this is the first attempt to successfully employ a convolutional recurrent neural network (CRNN) with a temporal attentive pooling (TAP) algorithm for the ASE task. The proposed TAP-CRNN framework can practically benefit the assistive communication technology industry, such as the manufacture of hearing aid devices for the elderly and students. In addition, the derived algorithm can benefit other signal processing applications, such as soundscape information retrieval, sound environment analysis in smart homes, and automatic speech/speaker/language recognition systems.

Index Terms: Acoustic signal enhancement, convolutional neural networks, recurrent neural networks, bidirectional long short-term memory.
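
The abstract does not spell out how the temporal attentive pooling step is computed, so the following is only a minimal sketch of generic attention-weighted pooling over time, not the authors' TAP formulation. It assumes each time frame has already been turned into a feature vector (for example by the convolutional and recurrent layers) and that a learned scoring vector exists; the names `attentive_pool`, `frames`, and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, query):
    """Pool T frame vectors into one context vector using attention weights.

    frames: list of T feature vectors, each of length D (assumed to come
            from earlier convolutional/recurrent layers).
    query:  hypothetical learned scoring vector of length D.
    Returns (context, weights), where weights sum to 1 over time.
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
    # Normalize the scores over the time axis.
    weights = softmax(scores)
    # Weighted sum of frames: salient frames contribute more.
    dim = len(frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights


# Toy input: three frames with two features each; the middle frame is salient.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
context, weights = attentive_pool(frames, query)
```

In this toy input the middle frame receives the largest weight, so the pooled vector is dominated by it. With an all-zero `query` the weights become uniform and the result reduces to plain average pooling over time.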

Journal Article Type: Article
Acceptance Date: Apr 17, 2022
Online Publication Date: Apr 25, 2022
Publication Date: 2022
Deposit Date: Apr 26, 2022
Publicly Available Date: Apr 26, 2022
Journal: IEEE Transactions on Artificial Intelligence
Publisher: Institute of Electrical and Electronics Engineers
Peer Reviewed: Peer Reviewed
Volume: 3
Issue: 5
Pages: 833-842

