Removing background noise from acoustic observations to obtain clean signals is an important research topic regarding numerous real acoustic applications. Owing to their strong model capacity in function mapping, deep neural network-based algorithms have been successfully applied in target signal enhancement in acoustic applications. As most target signals carry semantic information encoded in a hierarchal structure in short-and long-term contexts , noise may distort such structures nonuniformly. In most deep neural network-based algorithms, such local and global effects are not explicitly considered in a modeling architecture for signal enhancement. In this paper, we propose a temporal attentive-pooling (TAP) mechanism combined with a conventional convolutional recurrent neural network (CRNN) model, called TAP-CRNN, which explicitly considers both global and local information for acoustic signal enhancement (ASE). In the TAP-CRNN model, we first use a convolution layer to extract local information from acoustic signals and a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of noisy signals. We evaluate the proposed ASE system using an infant cry da-taset. The experimental results confirm the effectiveness of the proposed TAP-CRNN, compared with related deep neu-ral network models, and demonstrate that the proposed TAP-CRNN can more effectively reduce noise components from infant cry signals with unseen background noises at different signal-to-noise levels. Impact Statement-Recently proposed deep learning solutions have proven useful in overcoming certain limitations of conventional acoustic signal enhancement (ASE) tasks. However, the performance of these approaches under real acoustic conditions is not always satisfactory. In this study, we investigated the use of attention models for ASE. To the best of our knowledge, this is the first attempt to successfully employ a convolutional recurrent neural network (CRNN) with a temporal attentive pooling (TAP) algorithm for the ASE task. The proposed TAP-CRNN framework can practically benefit the as-sistive communication technology industry, such as the manufacture of hearing aid devices for the elderly and students. In addition, the derived algorithm can benefit other signal processing applications, such as soundscape information retrieval, sound environment analysis in smart homes, and automatic speech/speaker/language recognition systems. Index Terms-Acoustic signal enhancement, convolutional neural networks, recurrent neural networks, bidirectional long-short term memory.
Hussain, T., Wang, W., Gogate, M., Dashtipour, K., Tsao, Y., Lu, X., …Hussain, A. (2022). A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement. IEEE Transactions on Artificial Intelligence, 3(5), 833-842. https://doi.org/10.1109/TAI.2022.3169995