Insights into sample efficient quadrupedal robot locomotion

Abstract

Autonomous quadrupedal robot locomotion is challenging. Traditional approaches require expert knowledge of robot mechanics and contact dynamics, but modern machine learning-based methods, such as reinforcement learning and imitation learning, can find locomotion policies purely from interactions with the environment. Unfortunately, these methods typically require many samples and trials to learn, particularly in high-dimensional systems. As a result, they cannot adapt quickly to changing environments, which limits their practical utility. We review approaches to enhancing sample efficiency, including choices of reward, action, and observation spaces, as well as off-policy, model-based, and step-based learning. We provide examples of these principles using several popular reinforcement learning agents applied to a quadruped in simulation. Finally, we discuss unexplored avenues to further improve sample efficiency.

Publication
Submitted to Frontiers in Robotics & Artificial Intelligence
Wouter Kouw
Assistant Professor