Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition

Xu, Haotian; Jin, Xiaobo; Wang, Qiufeng; Hussain, Amir; Huang, Kaizhu




Many current action recognition methods rely mainly on information from the spatial stream. Inspired by the human visual system, we propose a new perspective that combines the spatial and temporal streams and measures the consistency of their attention. Specifically, we develop a branch-independent convolutional neural network (CNN) based algorithm with a novel attention-consistency loss metric, so that the temporal stream concentrates on the same discriminative regions as the spatial stream over the same period. The consistency loss is further combined with the cross-entropy loss to enhance visual attention consistency. We evaluate the proposed method for action recognition on two benchmark datasets: Kinetics400 and UCF101. Despite its apparent simplicity, the proposed framework with attention consistency outperforms most two-stream networks, reaching 75.7% top-1 accuracy on Kinetics400 and 95.7% on UCF101 while reducing computational cost by 7.1% relative to our baseline. In particular, the method attains remarkable improvements on complex action classes, which suggests that the proposed network can serve as a potential benchmark for handling complicated scenarios in Industry 4.0 applications.
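The combination of a consistency term with the cross-entropy loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the attention definition (channel-wise sum of squared activations, normalized to sum to 1), the mean-squared-difference penalty, the `weight` parameter, and all function names are assumptions made here for illustration only, and the paper may define each of them differently.

```python
import math

def attention_map(features):
    """Collapse a C x H x W feature tensor (nested lists) into an H x W map.

    Assumption: attention is the channel-wise sum of squared activations,
    normalized so the map sums to 1.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    energy = [[sum(features[c][i][j] ** 2 for c in range(channels))
               for j in range(width)] for i in range(height)]
    total = sum(map(sum, energy)) or 1.0  # fall back to 1.0 if all activations are zero
    return [[v / total for v in row] for row in energy]

def consistency_loss(spatial_feats, temporal_feats):
    """Mean squared difference between the two streams' attention maps (assumed form)."""
    a_s = attention_map(spatial_feats)
    a_t = attention_map(temporal_feats)
    diffs = [(x - y) ** 2 for rs, rt in zip(a_s, a_t) for x, y in zip(rs, rt)]
    return sum(diffs) / len(diffs)

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def total_loss(logits, label, spatial_feats, temporal_feats, weight=1.0):
    """Cross-entropy plus a weighted consistency term; `weight` is a hypothetical knob."""
    return cross_entropy(logits, label) + weight * consistency_loss(
        spatial_feats, temporal_feats)
```

In this sketch the consistency term is zero when both streams attend to the same locations and grows as their attention maps diverge, so minimizing the total loss pushes the temporal stream toward the regions the spatial stream finds discriminative.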

Journal Article Type: Article
Acceptance Date: Apr 6, 2022
Online Publication Date: May 28, 2022
Publication Date: Oct 6, 2022
Deposit Date: Jun 22, 2022
Publicly Available Date: Jul 8, 2022
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Print ISSN: 1551-6857
Electronic ISSN: 1551-6865
Publisher: Association for Computing Machinery (ACM)
Peer Reviewed: Peer Reviewed
Volume: 18
Issue: 2S
Article Number: 119
Keywords: Action Recognition, Attention Consistency, Multi-level Attention, Two-stream Structure


Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition (accepted version) (23.5 Mb)
