
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition

Jalal, Md. Asif; Loweimi, Erfan; Moore, Roger K.; Hain, Thomas




Abstract

Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns that bear maximum correlation with the emotion information encoded in the speech signal while being as insensitive as possible to the other types of information that speech carries. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory (BLSTM) network, a convolutional neural network (CNN) and Capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward and backward contextual information, while the CNN, together with the dynamic routing of the Capsule network, learns temporal clusters; altogether, these provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora, and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.
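The abstract names "dynamic routing" as the mechanism by which the Capsule network groups CNN features into temporal clusters. The sketch below illustrates routing-by-agreement as it is commonly formulated for capsule networks (Sabour et al., 2017); it is not the authors' implementation. The abstract does not state the number of routing iterations, the capsule dimensions, or how the prediction vectors are computed from the BLSTM/CNN features, so the prediction vectors `u_hat` are passed in directly, and the names `dynamic_routing`, `capsule_lengths` and the default `num_iters=3` are hypothetical choices made for illustration. Plain Python is used so the example runs without extra dependencies.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    # eps avoids division by zero when s is the zero vector.
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], num_iters: int = 3) -> List[Vector]:
    """Route predictions u_hat[i][j] (input capsule i -> output capsule j).

    Hypothetical helper: returns one squashed output vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero (uniform coupling)
    v: List[Vector] = []
    for _ in range(num_iters):
        # Coupling coefficients: each input capsule splits its vote over the output capsules.
        c = [softmax(row) for row in b]
        # Weighted sum of the predictions arriving at each output capsule.
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(s_j) for s_j in s]
        # Agreement step: raise the logit where a prediction aligns with the output vector.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


def capsule_lengths(v: List[Vector]) -> Vector:
    """Length of each output capsule, read as the score of the corresponding class."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in v]


# Usage: two input capsules agree on output capsule 0 and cancel out on output capsule 1,
# so routing concentrates their votes on capsule 0.
example_u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
print(capsule_lengths(dynamic_routing(example_u_hat)))
```

Because the squashed length of each output capsule stays below 1, it can be read as a class score; in an emotion classifier of this kind, one output capsule per emotion class would be a natural (assumed) design, with the predicted class taken as the longest capsule.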

Citation

Jalal, M. A., Loweimi, E., Moore, R. K., & Hain, T. (2019). Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In Proc. Interspeech 2019 (1701-1705). https://doi.org/10.21437/interspeech.2019-3068

Presentation Conference Type: Conference Paper (Published)
Conference Name: Interspeech 2019
Start Date: Sep 15, 2019
End Date: Sep 19, 2019
Online Publication Date: Sep 15, 2019
Publication Date: 2019
Deposit Date: Apr 3, 2024
Pages: 1701-1705
Book Title: Proc. Interspeech 2019
DOI: https://doi.org/10.21437/interspeech.2019-3068
Public URL: http://researchrepository.napier.ac.uk/Output/3585898

