About

Reinforcement Learning (RL) reduces the explicit mathematical modeling needed for robotic tasks such as reaching: the system learns by being rewarded or penalized over a series of training tasks. This project improved the reproducibility of an RL project built around real-world reaching tasks with a UR5 arm. Two RL methods were applied to the UR5: trust region policy optimization (TRPO) and proximal policy optimization (PPO). Both were then trained in real time on the UR5 hardware, allowing the arm to learn to reach specific points in a 2D space.
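To make the PPO side of that comparison concrete, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. It is an illustration only, not the code used in this project: the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (illustrative sketch).

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    clip_eps   -- clipping range; 0.2 is an assumed default for this example
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch: one over-eager update, one under-weighted bad action,
    # and one sample where the policy has not changed.
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

The clipping removes the incentive to move the new policy far from the old one in a single update. TRPO pursues the same goal differently, by constraining the KL divergence between the old and new policies.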

For a more in-depth description and visualizations, refer to the Final Presentation.

Keywords

Reinforcement Learning, Python, UR5, Ubuntu, Virtual Machine, GitHub, Docker

Abstract

Robots are increasingly used in commercial and industrial settings to perform monotonous, repetitive tasks. However, the control architecture of a system built for basic actuation with minimal environmental feedback is very different from that of a system that is aware of its environment and uses that awareness to inform current and future decisions. As this trend continues, there is growing demand for more autonomous and robust systems. Our team investigated the practical implementation of different Reinforcement Learning (RL) algorithms and approaches on a UR5 robotic manipulator. Benchmarking different solutions grounded our studies in practice, and we believe this work also serves as a resource that offers a tangible comparison of different RL approaches, explaining analytically the pros and cons of optimal agent creation, training, and implementation. The team also evaluated how well SenseAct, one of the most advanced collections of algorithms and development environments for real-world robotic implementation, can be replicated. Because replication is one of the most difficult things to accomplish in this young field, the successful implementation of our project supports the viability and usability of SenseAct as a launching platform for Reinforcement Learning deployment on its supported robotic platforms.

TRPO Demonstration Low Reward

Early in RL training, with a very low reward (-69).

TRPO Demonstration High Reward

Near the end of the RL run, with a high reward (+257).

TRPO Demonstration Timelapse

Timelapse of the UR5 arm performing reaching tasks with a reward of +257.

Final Presentation


