


Motion Control

How terrain semantics help quadrupeds learn to walk

Yuxiang Yang, Researcher, Robotics at Google

Google’s robotics researchers developed a learning framework that decides the locomotion skill for a quadruped, including the speed and gait of the robot, based on the perceived semantics. This allows the robot to walk robustly on a variety of off-road terrains, including rocks, pebbles, deep grass, mud, and more.

An important promise of quadrupedal robots is their potential to operate in complex outdoor environments that are difficult or inaccessible for humans. Whether it’s finding natural resources deep in the mountains or searching for signs of life at heavily damaged earthquake sites, a robust and versatile quadrupedal robot could be very helpful. To achieve that, a robot needs to perceive the environment, understand its locomotion challenges, and adapt its locomotion skill accordingly.

While recent advances in perceptive locomotion have greatly enhanced the capability of quadrupedal robots, most work focuses on indoor or urban environments and thus cannot effectively handle the complexity of off-road terrains. In these environments, the robot needs to understand not only the terrain shape (e.g., slope angle, smoothness) but also its contact properties (e.g., friction, restitution, deformability), which are important for deciding on a locomotion skill. Because existing perceptive locomotion systems mostly rely on depth cameras or LiDAR, it can be difficult for them to estimate such terrain properties accurately.

In “Learning Semantics-Aware Locomotion Skills from Human Demonstrations,” we designed a hierarchical learning framework to improve a robot’s ability to traverse complex, off-road environments. Unlike previous approaches that focus on environment geometry, such as terrain shape and obstacle locations, we focus on environment semantics, such as terrain type (grass, mud, etc.) and contact properties, which provide a complementary set of information useful for off-road environments. As the robot walks, the framework decides the locomotion skill, including the speed and gait (i.e., the shape and timing of the legs’ movement) of the robot based on the perceived semantics, which allows the robot to walk robustly on a variety of off-road terrains, including rocks, pebbles, deep grass, mud, and more.

Google tested its learning framework on an A1 quadruped from Chinese startup Unitree Robotics. | Unitree Robotics

Google’s learning framework selects different speeds based on the conditions of the terrain. | Google

Overview

The hierarchical framework consists of a high-level skill policy and a low-level motor controller. The skill policy selects a locomotion skill based on camera images, and the motor controller converts the selected skill into motor commands. The high-level skill policy is further decomposed into a learned speed policy and a heuristic-based gait selector. To decide on a skill, the speed policy first computes the desired forward speed based on the semantic information from the onboard RGB camera.

For energy efficiency and robustness, quadrupedal robots usually select a different gait for each speed, so we designed the gait selector to compute a desired gait based on the forward speed. Lastly, a low-level convex model-predictive controller (MPC) converts the desired locomotion skill into motor torque commands and executes them on the real hardware. We train the speed policy directly in the real world using imitation learning because it requires less training data than standard reinforcement learning algorithms.
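To make the pipeline concrete, here is a minimal sketch of how the pieces could fit together in a single control step. The class names, function signatures, and division of labor shown here are illustrative assumptions, not the actual Google implementation.

```python
# Minimal sketch of the hierarchical framework described above.
# Names and signatures are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Gait:
    stepping_frequency_hz: float  # how quickly the legs cycle
    swing_height_m: float         # foot clearance during the swing phase
    base_height_m: float          # desired torso height

def speed_policy(rgb_image) -> float:
    """Learned from human demonstrations: maps the onboard camera
    image to a desired forward speed (m/s)."""
    ...

def gait_selector(speed_mps: float) -> Gait:
    """Heuristic: maps the desired speed to a robust gait."""
    ...

def convex_mpc(speed_mps: float, gait: Gait, robot_state) -> list[float]:
    """Low-level controller: converts the selected skill into
    per-joint motor torque commands."""
    ...

def control_step(rgb_image, robot_state) -> list[float]:
    speed = speed_policy(rgb_image)              # high level: what speed?
    gait = gait_selector(speed)                  # high level: which gait?
    return convex_mpc(speed, gait, robot_state)  # low level: torques
```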

Learning speed command from human demonstrations

As the central component in our pipeline, the speed policy outputs the desired forward speed of the robot based on the RGB image from the onboard camera. Although many robot learning tasks can leverage simulation as a source of lower-cost data collection, we train the speed policy in the real world because accurate simulation of complex and diverse off-road environments is not yet available. As policy learning in the real world is time-consuming and potentially unsafe, we make two key design choices to improve the data efficiency and safety of our system.

The first is learning from human demonstrations. Standard reinforcement learning algorithms typically learn by exploration, where the agent tries different actions in an environment and builds preferences based on the rewards received. However, such exploration can be unsafe, especially in off-road environments, since any robot failure can damage both the robot hardware and the surrounding environment. To ensure safety, we train the speed policy using imitation learning from human demonstrations. We first ask a human operator to teleoperate the robot on a variety of off-road terrains, controlling the speed and heading of the robot with a remote joystick. We then collect the training data and train the speed policy using standard supervised learning to predict the human operator’s speed command. As it turns out, the human demonstrations are both safe and high-quality, and they allow the robot to learn proper speed choices for different terrains.
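This imitation step amounts to standard behavior cloning: fit a regressor so its output matches the operator’s logged speed commands. The sketch below shows the idea with PyTorch; the network size, feature dimension, and hyperparameters are assumptions for illustration, not the paper’s actual configuration.

```python
# Behavior-cloning sketch: regress the human operator's speed command
# from per-frame features. Sizes and hyperparameters are assumed.
import torch
import torch.nn as nn

# Logged demonstrations: one feature vector per camera frame (e.g., a
# semantic embedding) paired with the operator's speed command in m/s.
features = torch.randn(5000, 256)           # placeholder for real logs
speed_commands = torch.rand(5000, 1) * 2.0  # placeholder, 0-2 m/s

policy = nn.Sequential(
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 1),  # predicted forward speed
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    for i in range(0, len(features), 128):  # simple mini-batching
        x, y = features[i:i + 128], speed_commands[i:i + 128]
        loss = loss_fn(policy(x), y)        # match the human's choice
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```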

The second key design choice is the training method. Deep neural networks, especially those involving high-dimensional visual inputs, typically require lots of data to train. To reduce the amount of real-world training data required, we first pre-train a semantic segmentation model on RUGD, an off-road driving dataset whose images look similar to those captured by the robot’s onboard camera. The model predicts the semantic class (grass, mud, etc.) for every pixel in the camera image. We then extract a semantic embedding from the model’s intermediate layers and use it as the feature for on-robot training. With the pre-trained semantic embedding, we can train the speed policy effectively using less than 30 minutes of real-world data, which greatly reduces the amount of effort required.
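One plausible way to implement this reuse is to freeze a pre-trained backbone, run it up to an intermediate layer, and pool the resulting feature map into a fixed-size embedding for the small speed head. The backbone, layer choice, and pooling below are stand-in assumptions, not the paper’s exact recipe.

```python
# Sketch: extract a semantic embedding from a frozen, pre-trained
# backbone and train only a small head on robot data. The backbone,
# layer choice, and pooling are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision

# Any segmentation-style backbone works here; an ImageNet ResNet trunk
# stands in for a model pre-trained on RUGD.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.requires_grad_(False)  # frozen: no on-robot finetuning
feature_extractor.eval()

def semantic_embedding(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) -> (B, 512) pooled intermediate features."""
    fmap = feature_extractor(image)  # (B, 512, h, w) feature map
    return fmap.mean(dim=(2, 3))     # global average pool

speed_head = nn.Linear(512, 1)  # the only part trained on robot data

img = torch.randn(1, 3, 224, 224)  # placeholder camera frame
speed = speed_head(semantic_embedding(img))
```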

Gait selection and motor control

The next component in the pipeline, the gait selector, computes the appropriate gait based on the speed command from the speed policy. The gait of a robot, including its stepping frequency, swing height, and base height, can greatly affect the robot’s ability to traverse different terrains.

Scientific studies have shown that animals switch between different gaits at different speeds, a result that has also been validated on quadrupedal robots, so we designed the gait selector to compute a robust gait for each speed. Compared to using a fixed gait across all speeds, we find that the gait selector further improves the robot’s navigation performance on off-road terrains.
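In practice, a heuristic gait selector can be as simple as a piecewise mapping from commanded speed to gait parameters. The thresholds and parameter values in the sketch below are invented for illustration; the paper does not report these exact numbers.

```python
# Sketch of a heuristic gait selector: piecewise mapping from the
# commanded forward speed to gait parameters. All thresholds and
# values below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Gait:
    stepping_frequency_hz: float
    swing_height_m: float
    base_height_m: float

def gait_selector(speed_mps: float) -> Gait:
    if speed_mps < 0.5:
        # Slow, cautious walk: high foot clearance, lower torso.
        return Gait(stepping_frequency_hz=1.5, swing_height_m=0.12,
                    base_height_m=0.24)
    if speed_mps < 1.0:
        # Moderate trot.
        return Gait(stepping_frequency_hz=2.5, swing_height_m=0.09,
                    base_height_m=0.27)
    # Fast trot for firm, easy terrain.
    return Gait(stepping_frequency_hz=3.5, swing_height_m=0.06,
                base_height_m=0.30)
```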
