Table of Contents
- About
- Keywords
- Abstract
- TRPO Demonstration Low Reward
- TRPO Demonstration High Reward
- TRPO Demonstration Timelapse
- Final Presentation
- Project Write Up
About
Reinforcement Learning (RL) reduces the mathematical complexity of robotic tasks such as reaching. Instead of requiring an explicit model of the task, it rewards or penalizes the system over a series of training episodes. This project improved the reproducibility of an existing RL project built around real-world reaching tasks with a UR5 arm. Two RL methods were applied to the UR5: trust region policy optimization (TRPO) and proximal policy optimization (PPO). Both were trained in real time on the UR5 hardware, so the arm learned to reach specific target points in a 2D space.
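To make the reward idea concrete, here is a minimal sketch. It assumes a dense reward equal to the negative distance between the end effector and a target point in the 2D reaching plane. This reward shaping is an illustrative assumption and not necessarily the reward used in this project's setup. The second function is PPO's clipped surrogate objective, written in plain Python. TRPO pursues the same goal of keeping each policy update small, but it does so by constraining the KL divergence between the old and new policies rather than by clipping. Both function names are hypothetical.

```python
import math
from typing import Sequence


def reaching_reward(effector_xy: Sequence[float], target_xy: Sequence[float]) -> float:
    """Hypothetical dense reward: negative Euclidean distance to the 2D target.

    The closer the end effector is to the target, the closer the reward is to 0.
    """
    dx = effector_xy[0] - target_xy[0]
    dy = effector_xy[1] - target_xy[1]
    return -math.hypot(dx, dy)


def ppo_clipped_objective(
    ratios: Sequence[float], advantages: Sequence[float], eps: float = 0.2
) -> float:
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i); the policy update
    tries to maximize this value.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio into [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

For example, `reaching_reward((0.0, 0.0), (3.0, 4.0))` returns `-5.0`. Clipping caps the benefit of pushing the ratio far from 1, and this is what keeps each PPO update conservative.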
For a more in-depth description and visualizations, refer to the Final Presentation.
Keywords
Reinforcement Learning, Python, UR5, Ubuntu, Virtual Machine, GitHub, Docker
Abstract
Robots are increasingly used in commercial and industrial settings to perform monotonous, repetitive tasks. However, the control architecture of a system built for basic actuation with minimal environmental feedback is very different from that of a system that is aware of its environment and uses that awareness to inform its current and future decisions. As this trend continues, demand is growing for more autonomous and robust systems. Our team investigated the practical implementation of different Reinforcement Learning (RL) algorithms and approaches on a UR5 robotic manipulator. Benchmarking different solutions gave the team a practical sense of how these methods behave on real hardware. We believe this work also serves as a resource that directly compares different RL approaches and analytically explains the pros and cons of each for agent creation, training, and deployment. The team also evaluated whether state-of-the-art algorithms and a development environment for real-world robotic RL, namely SenseAct, could be replicated. Replication is among the hardest things to achieve in this emerging field. The successful completion of our project therefore supports the viability and usability of SenseAct as a launching platform for deploying Reinforcement Learning on its supported robotic platforms.
TRPO Demonstration Low Reward
Early in TRPO training, with a very low reward (-69).
TRPO Demonstration High Reward
Near the end of TRPO training on the UR5, with a high reward (+257).
TRPO Demonstration Timelapse
Timelapse of the UR5 arm performing reaching tasks, reward of +257.