TY - GEN
T1 - Expert Initialized Reinforcement Learning with Application to Robotic Assembly
AU - Langaa, Jeppe
AU - Sloth, Christoffer
N1 - Funding Information:
This work was supported by the PIRAT project, funded by Innovation Fund Denmark, grant number 9069-00046B. All authors are with the Maersk McKinney Moller Institute, University of Southern Denmark, Odense, Denmark.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This paper investigates the advantages and limitations of actor-critic reinforcement learning algorithms in an industrial setting. We compare and discuss Cycle of Learning, Deep Deterministic Policy Gradient, and Twin Delayed Deep Deterministic Policy Gradient with respect to performance in simulation as well as on a real robot setup. Furthermore, the paper emphasizes the importance and potential of combining demonstrated expert behavior with actor-critic reinforcement learning, used together with an admittance controller, to solve an industrial assembly task. Cycle of Learning and Twin Delayed Deep Deterministic Policy Gradient performed equally well in simulation, while Cycle of Learning proved best in the real-world application due to its behavior cloning loss, which enables the agent to learn rapidly. The results also demonstrate that incorporating an admittance controller is necessary to transfer the learned behavior to a real robot.
AB - This paper investigates the advantages and limitations of actor-critic reinforcement learning algorithms in an industrial setting. We compare and discuss Cycle of Learning, Deep Deterministic Policy Gradient, and Twin Delayed Deep Deterministic Policy Gradient with respect to performance in simulation as well as on a real robot setup. Furthermore, the paper emphasizes the importance and potential of combining demonstrated expert behavior with actor-critic reinforcement learning, used together with an admittance controller, to solve an industrial assembly task. Cycle of Learning and Twin Delayed Deep Deterministic Policy Gradient performed equally well in simulation, while Cycle of Learning proved best in the real-world application due to its behavior cloning loss, which enables the agent to learn rapidly. The results also demonstrate that incorporating an admittance controller is necessary to transfer the learned behavior to a real robot.
U2 - 10.1109/CASE49997.2022.9926540
DO - 10.1109/CASE49997.2022.9926540
M3 - Article in proceedings
AN - SCOPUS:85141660939
T3 - Proceedings - IEEE International Conference on Automation Science and Engineering
SP - 1405
EP - 1410
BT - 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)
PB - IEEE Computer Society
T2 - 18th IEEE International Conference on Automation Science and Engineering, CASE 2022
Y2 - 20 August 2022 through 24 August 2022
ER -