A non-parametric softmax for improving neural attention in time-series forecasting

Totaro, Simone; Hussain, Amir; Scardapane, Simone

doi:10.1016/j.neucom.2019.10.084

Authors

Simone Totaro

Prof Amir Hussain (A.Hussain@napier.ac.uk), Professor

Simone Scardapane

Abstract

Neural attention has become a key component in many deep learning applications, ranging from machine translation to time series forecasting. Although many variants of attention have been developed in recent years, they all share one component: a softmax function that normalizes the attention weights, turning them into valid mixing coefficients. In this paper, we aim to improve the modeling flexibility of a generic attention module by replacing this softmax operation with a learnable softmax, in which the normalizing function is itself adapted from the data. Specifically, our generalized softmax builds upon recent work on learning activation functions for deep networks, in particular the kernel activation function (KAF) and its extensions. We describe the application of the proposed technique to the challenging case of time series forecasting with the dual-stage attention-based recurrent neural network (DA-RNN), a model for predicting time series that employs two different attention modules to handle exogenous factors and long-term dependencies. A series of real-world benchmarks shows that simply plugging in our generalized attention model improves results on all datasets, even when the number of trainable parameters in the model is kept constant. To further evaluate the algorithm, we collect a novel dataset for predicting the Bitcoin closing exchange rate, a problem of considerable practical significance in recent years. Finally, to foster research on the topic, we release both the dataset and our model as an open-source, extensible library. On our newly released dataset, the proposed model improves MAR over a baseline DA-RNN by 6% to 15%.
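To make the idea of a learnable softmax more concrete, the following is a minimal Python sketch of one way a KAF-based generalized softmax could be built. It assumes a Gaussian-kernel KAF, f(s) = Σᵢ αᵢ exp(−γ (s − dᵢ)²), with a fixed uniform dictionary dᵢ and trainable coefficients αᵢ, applied elementwise to the raw attention scores before the usual exponential normalization. The names `KAF` and `kaf_softmax`, the dictionary bound of 3, and the bandwidth rule γ = 1/(6Δ²) are illustrative assumptions, not the authors' released library, and the parametrization used in the paper may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Standard softmax: maps raw attention scores to mixing coefficients."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KAF:
    """Gaussian-kernel activation function (illustrative sketch, assumed form).

    f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2)
    The dictionary d_i is a fixed uniform grid on [-bound, bound];
    only the coefficients alpha_i would be trained.
    """

    def __init__(self, alphas: Sequence[float], bound: float = 3.0):
        if len(alphas) < 2:
            raise ValueError("need at least two dictionary elements")
        self.alphas = list(alphas)
        n = len(self.alphas)
        step = 2.0 * bound / (n - 1)  # spacing Delta between dictionary points
        self.dictionary = [-bound + i * step for i in range(n)]
        # Assumed bandwidth rule of thumb: gamma = 1 / (6 * Delta^2)
        self.gamma = 1.0 / (6.0 * step ** 2)

    def __call__(self, s: float) -> float:
        # Weighted sum of Gaussian bumps centred on the dictionary points
        return sum(
            a * math.exp(-self.gamma * (s - d) ** 2)
            for a, d in zip(self.alphas, self.dictionary)
        )


def kaf_softmax(scores: Sequence[float], kaf: KAF) -> List[float]:
    """Generalized softmax (assumed form): learnable KAF on each score, then normalize."""
    return softmax([kaf(s) for s in scores])


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0]  # hypothetical attention scores
    kaf = KAF(alphas=[0.1, -0.2, 0.3, 0.5, 1.0])  # hypothetical learned coefficients
    weights = kaf_softmax(scores, kaf)
    print(weights, sum(weights))
```

Because the final step is still an exponential normalization, the output is a valid set of mixing coefficients (positive, summing to one) for any choice of αᵢ, so the module can be dropped in wherever a standard softmax is used. It adds only one small vector of coefficients per attention module, which is in the spirit of the abstract's claim that the number of trainable parameters can be kept roughly constant.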

Citation

Totaro, S., Hussain, A., & Scardapane, S. (2020). A non-parametric softmax for improving neural attention in time-series forecasting. Neurocomputing, 381, 177-185. https://doi.org/10.1016/j.neucom.2019.10.084

Journal Article Type	Article
Acceptance Date	Oct 22, 2019
Online Publication Date	Oct 31, 2019
Publication Date	Mar 2020
Deposit Date	Apr 6, 2020
Journal	Neurocomputing
Print ISSN	0925-2312
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	381
Pages	177-185
DOI	https://doi.org/10.1016/j.neucom.2019.10.084
Keywords	Attention, Activation function, Softmax, Time series forecasting
Public URL	http://researchrepository.napier.ac.uk/Output/2566563

You might also like

MTFDN: An image copy‐move forgery detection method based on multi‐task learning (2024)
Journal Article

Transition-aware human activity recognition using an ensemble deep learning framework (2024)
Journal Article

Federated learning‐driven dual blockchain for data sharing and reputation management in Internet of medical things (2024)
Journal Article

Utilizing ubiquitous learning to foster sustainable development in rural areas: Insights from 6G technology (2024)
Journal Article

A binary particle swarm optimization-based pruning approach for environmentally sustainable and robust CNNs (2024)
Journal Article
