Guided Policy Search for Sequential Multitask Learning

Xiong, Fangzhou; Sun, Biao; Yang, Xu; Qiao, Hong; Huang, Kaizhu; Hussain, Amir; Liu, Zhiyong

doi:10.1109/tsmc.2018.2800040

Authors

Fangzhou Xiong

Biao Sun

Xu Yang

Hong Qiao

Kaizhu Huang

Prof Amir Hussain A.Hussain@napier.ac.uk
Professor

Zhiyong Liu

Abstract

Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the dilemmas of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. The algorithm's adaptation is thus hindered for real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies that handle sequentially arriving training samples, delivering performance comparable to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.
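
The abstract names elastic weight consolidation and Fisher information as the mechanism for retaining earlier tasks. As a rough illustration only, and not the authors' implementation, the sketch below shows the standard elastic weight consolidation quadratic penalty with a diagonal empirical Fisher estimate. All function names, the `lam` weight, and the toy numbers are assumptions introduced here for illustration; how the paper combines such a term with the GPS objective is not shown.

```python
# Minimal sketch of an elastic weight consolidation (EWC) style penalty.
# Assumptions (not from the paper): diagonal Fisher approximation, plain
# Python lists for parameters, and a hypothetical weight `lam`.

def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients
    of the log-likelihood, taken per parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_prev, fisher_diag, lam=1.0):
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_prev_i)^2.
    Parameters with large Fisher values are held close to their old values."""
    return 0.5 * lam * sum(
        f * (t - p) ** 2 for t, p, f in zip(theta, theta_prev, fisher_diag)
    )


def ewc_penalty_grad(theta, theta_prev, fisher_diag, lam=1.0):
    """Gradient of the penalty with respect to theta."""
    return [lam * f * (t - p) for t, p, f in zip(theta, theta_prev, fisher_diag)]


# Toy usage: the first parameter mattered for the old task, the second did not.
fisher = diagonal_fisher([[1.0, 0.0], [3.0, 0.0]])
penalty = ewc_penalty([1.0, 1.0], [0.0, 0.0], fisher)
grad = ewc_penalty_grad([1.0, 1.0], [0.0, 0.0], fisher)
```

In this toy case only the first parameter is penalized for moving, so a learner adding this term to a new task's loss would remain free to change the second parameter.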

Citation

Xiong, F., Sun, B., Yang, X., Qiao, H., Huang, K., Hussain, A., & Liu, Z. (2019). Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(1), 216-226. https://doi.org/10.1109/tsmc.2018.2800040

Journal Article Type	Article
Acceptance Date	Jan 11, 2018
Online Publication Date	Feb 18, 2018
Publication Date	2019-01
Deposit Date	Jul 19, 2019
Publicly Available Date	Jul 19, 2019
Journal	IEEE Transactions on Systems, Man, and Cybernetics: Systems
Print ISSN	2168-2216
Electronic ISSN	2168-2232
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Volume	49
Issue	1
Pages	216-226
DOI	https://doi.org/10.1109/tsmc.2018.2800040
Keywords	Control and Systems Engineering; Human-Computer Interaction; Electrical and Electronic Engineering; Software; Computer Science Applications
Public URL	http://researchrepository.napier.ac.uk/Output/1450377
Related Public URLs	https://www.storre.stir.ac.uk/handle/1893/27076
Contract Date	Jul 19, 2019