Fangzhou Xiong
Guided Policy Search for Sequential Multitask Learning
Xiong, Fangzhou; Sun, Biao; Yang, Xu; Qiao, Hong; Huang, Kaizhu; Hussain, Amir; Liu, Zhiyong
Authors
Biao Sun
Xu Yang
Hong Qiao
Kaizhu Huang
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Zhiyong Liu
Abstract
Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the dilemmas of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), addresses the training-sample challenge by using trajectory-centric methods, and it also provides asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, in which all training samples are provided together at the start of the learning process. The algorithm is therefore difficult to adapt to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to carry over knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, operates in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies from sequentially arriving training samples, with performance comparable to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.
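To illustrate the role Fisher information plays in elastic weight consolidation, the following is a minimal sketch of the general quadratic penalty with a diagonal Fisher approximation. It is not the authors' implementation and does not reproduce the GPS reformulation in the paper; the function names (`diagonal_fisher`, `ewc_penalty`, `ewc_loss`), the weight `lam`, and the toy numbers are assumptions made for illustration only.

```python
def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher estimate: the mean of squared
    per-sample gradients, taken at the previous task's solution."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic penalty 0.5 * lam * sum_i F_i * (theta_i - theta_star_i)^2.
    Parameters with large Fisher values are held close to theta_star."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )


def ewc_loss(new_task_loss, theta, theta_star, fisher, lam):
    """New-task loss plus the penalty that protects the old task."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy example (hypothetical numbers): two parameters, two gradient samples.
fisher = diagonal_fisher([[1.0, 0.0], [3.0, 0.0]])
theta_star = [0.0, 0.0]   # parameters learned on the earlier task
theta = [1.0, 5.0]        # candidate parameters for the new task
total = ewc_loss(1.5, theta, theta_star, fisher, lam=2.0)
```

In this toy case the second parameter has zero Fisher value, so moving it far from its old value costs nothing, while the first parameter is penalised in proportion to its estimated importance for the earlier task.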
Citation
Xiong, F., Sun, B., Yang, X., Qiao, H., Huang, K., Hussain, A., & Liu, Z. (2019). Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(1), 216-226. https://doi.org/10.1109/tsmc.2018.2800040
Journal Article Type | Article |
---|---|
Acceptance Date | Jan 11, 2018 |
Online Publication Date | Feb 18, 2018 |
Publication Date | Jan 2019 |
Deposit Date | Jul 19, 2019 |
Publicly Available Date | Jul 19, 2019 |
Journal | IEEE Transactions on Systems, Man, and Cybernetics: Systems |
Print ISSN | 2168-2216 |
Electronic ISSN | 2168-2232 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 49 |
Issue | 1 |
Pages | 216-226 |
DOI | https://doi.org/10.1109/tsmc.2018.2800040 |
Keywords | Control and Systems Engineering; Human-Computer Interaction; Electrical and Electronic Engineering; Software; Computer Science Applications |
Public URL | http://researchrepository.napier.ac.uk/Output/1450377 |
Related Public URLs | https://www.storre.stir.ac.uk/handle/1893/27076 |
Contract Date | Jul 19, 2019 |
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-sa/4.0/
Copyright Statement
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike Licence.