RoboPapers

Chris Paxton and Michael Cho
Nov 20, 2025 • 58min

Ep#043: Attention-based map encoding for learning generalized legged locomotion

Walking robots can do all kinds of exciting things like dancing, running, and martial arts — but for them to be useful, they must be able to use their legs to handle terrain and move over obstacles, not just around them. So, how can we train walking policies for legged robots that are actually useful? Unlike most manipulation policies, these are trained with end-to-end, sim-to-real reinforcement learning, using attention. It turns out that maybe "attention is all you need" applies to locomotion, too. Chong Zhang joins us to explain more.

Watch Episode #43 of RoboPapers, hosted by Michael Cho and Chris Paxton, now to find out more.

Abstract: Dynamic locomotion of legged robots is a critical yet challenging topic in expanding the operational range of mobile robots. It requires precise planning when possible footholds are sparse, robustness against uncertainties and disturbances, and generalizability across diverse terrains. Although traditional model-based controllers excel at planning on complex terrains, they struggle with real-world uncertainties. Learning-based controllers offer robustness to such uncertainties but often lack precision on terrains with sparse steppable areas. Hybrid methods achieve enhanced robustness on sparse terrains by combining both approaches but are computationally demanding and constrained by the inherent limitations of model-based planners. To achieve generalized legged locomotion on diverse terrains while preserving the robustness of learning-based controllers, this paper proposes an attention-based map encoding conditioned on robot proprioception, which is trained as part of the controller using reinforcement learning. We show that the network learns to focus on steppable areas for future footholds when the robot dynamically navigates diverse and challenging terrains. We synthesized behaviors that exhibited robustness against uncertainties while enabling precise and agile traversal of sparse terrains. In addition, our method offers a way to interpret the topographical perception of a neural network. We have trained two controllers, for a 12-degrees-of-freedom quadrupedal robot and a 23-degrees-of-freedom humanoid robot, and tested the resulting controllers in the real world under various challenging indoor and outdoor scenarios, including ones unseen during training.

Paper in Science Robotics
ArXiV

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit robopapers.substack.com
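The attention-based map encoding described above can be illustrated with a toy single-head cross-attention step, in which the robot's proprioceptive state forms the query and heightmap patches form the keys and values. This is a minimal numpy sketch under our own assumptions (random weights, one head, invented dimensions), not the paper's controller:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a vector of attention scores."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_map_encoding(proprio, height_patches, Wq, Wk, Wv):
    """Cross-attention: proprioception queries terrain heightmap patches.
    The attention weights show which patches (e.g. steppable areas) the
    encoding focuses on, which is what makes the perception interpretable."""
    q = proprio @ Wq                       # query from robot state, (d,)
    k = height_patches @ Wk                # keys from map patches, (n, d)
    v = height_patches @ Wv                # values from map patches, (n, d)
    weights = softmax(k @ q / np.sqrt(q.shape[0]))   # distribution over patches
    return weights @ v, weights            # encoded map (d,), per-patch weights

rng = np.random.default_rng(0)
d_p, d_m, d, n = 8, 16, 32, 25             # proprio dim, patch dim, embed dim, 5x5 grid
enc, w = attention_map_encoding(
    rng.standard_normal(d_p), rng.standard_normal((n, d_m)),
    rng.standard_normal((d_p, d)), rng.standard_normal((d_m, d)),
    rng.standard_normal((d_m, d)))
print(enc.shape, w.shape)
```

In the paper this encoder is trained end-to-end with RL as part of the controller; here the weights are random purely to show the shapes and the attention distribution over patches.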
Nov 13, 2025 • 54min

Ep#42: General Intuition

Discover how AI can learn from video games to create predictive world models. The team shares insights on using diffusion models for better visual detail in training agents. They explore the challenges of multi-player dynamics and the importance of high-quality action labels. The discussion includes innovations for stability and speed in model training, as well as the advantages of transferring knowledge across different games. Learn about their mission to develop general agents for complex reasoning in three-dimensional spaces.
undefined
Nov 5, 2025 • 41min

Ep#41: HITTER: A Humanoid Table Tennis Robot via Hierarchical Planning and Learning

How can we make a humanoid robot play table tennis? The robot must hit a moving ball and return it over and over again, requiring precise whole-body control. Zhi Su tells us how he developed a hierarchical approach to planning and whole-body control that lets people play this game with a humanoid robot.

Watch Episode #41 of RoboPapers with Michael Cho and Chris Paxton now!

Abstract: Humanoid robots have recently achieved impressive progress in locomotion and whole-body control, yet they remain constrained in tasks that demand rapid interaction with dynamic environments through manipulation. Table tennis exemplifies such a challenge: with ball speeds exceeding 5 m/s, players must perceive, predict, and act within sub-second reaction times, requiring both agility and precision. To address this, we present a hierarchical framework for humanoid table tennis that integrates a model-based planner for ball trajectory prediction and racket target planning with a reinforcement learning-based whole-body controller. The planner determines striking position, velocity, and timing, while the controller generates coordinated arm and leg motions that mimic human strikes and maintain stability and agility across consecutive rallies. Moreover, to encourage natural movements, human motion references are incorporated during training. We validate our system on a general-purpose humanoid robot, achieving up to 106 consecutive shots with a human opponent and sustained exchanges against another humanoid. These results demonstrate real-world humanoid table tennis with sub-second reactive control, marking a step toward agile and interactive humanoid behaviors.

Project Page
ArXiV
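The model-based half of the hierarchy has a concrete core: predicting when and where the ball will cross a strike plane so the controller can be given a target. Here is a drag-free, bounce-free ballistic sketch of that step (our simplification with invented names; the paper's planner handles the full trajectory prediction):

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def predict_strike(p0, v0, strike_x):
    """Predict when and where the ball crosses the vertical plane x = strike_x,
    assuming drag-free, bounce-free ballistic flight. p0 and v0 are the ball's
    current position and velocity in (x, y, z)."""
    t = (strike_x - p0[0]) / v0[0]               # time to reach the plane
    y = p0[1] + v0[1] * t                        # straight-line lateral motion
    z = p0[2] + v0[2] * t - 0.5 * G * t ** 2     # gravity acts on height only
    return t, np.array([strike_x, y, z])

# ball at 1 m height moving 5 m/s toward a strike plane 2.5 m away
t, target = predict_strike(np.array([0.0, 0.0, 1.0]),
                           np.array([5.0, 0.2, 1.0]), strike_x=2.5)
print(round(t, 3), np.round(target, 3))
```

With ball speeds above 5 m/s, the resulting strike time is well under a second, which is why the whole pipeline has to run at sub-second reaction times.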
Nov 3, 2025 • 1h 1min

Ep#40: Daxo Robotics

How can we build robotic hands with truly superhuman dexterity? Daxo Robotics is developing a unique tendon-driven soft robot hand, which aims to be tougher and more capable than a traditional humanoid hand. Each finger consists of many different tendons, which act in concert to move or manipulate.

This is a special episode of RoboPapers where, instead of discussing a scientific paper, we talk to Tom Zhang, founder of Daxo Robotics, to learn both about his background and about how this one-of-a-kind robot hand design works.

Watch Episode #40 of RoboPapers with Michael Cho and Chris Paxton now!

Daxo Robotics website
Follow Tom on X
Watch this episode on YouTube
Oct 28, 2025 • 1h 21min

Ep#39: MolmoAct: An Action Reasoning Model that reasons in 3D space

Reasoning models have massively expanded what LLMs are capable of, but this hasn't necessarily carried over to robotics. Perhaps this is in part because robots need to reason over space, not just words and symbols; the robotics version of a reasoning model would need to think in 3D. That's the idea behind MolmoAct, an "Action Reasoning Model" that generates spatial plans in order to predict precise low-level robot actions. Jason Lee, Haoquan Fang, and Jiafei Duan told us more about their work.

Watch Episode #39 of RoboPapers, with Michael Cho and Chris Paxton, now!

Abstract: Reasoning is central to purposeful action, yet most robotic foundation models map perception and instructions directly to control, which limits adaptability, generalization, and semantic grounding. We introduce Action Reasoning Models (ARMs), a class of robotic foundation models that integrate perception, planning, and control through a structured three-stage pipeline. Our model, MolmoAct, encodes observations and instructions into depth-aware perception tokens, generates mid-level spatial plans as editable trajectory traces, and predicts precise low-level actions, enabling explainable and steerable behavior. MolmoAct-7B-D achieves strong performance across simulation and real-world settings: 70.5% zero-shot accuracy on SimplerEnv Visual Matching tasks, surpassing closed-source Pi-0 and GR00T N1.5; 86.6% average success on LIBERO, including an additional 6.3% gain over ThinkAct on long-horizon tasks; and in real-world fine-tuning, an additional 10% (single-arm) and an additional 22.7% (bimanual) task progression over Pi-0-FAST. It also outperforms baselines by an additional 23.3% on out-of-distribution generalization and achieves top human-preference scores for open-ended instruction following and trajectory steering. Furthermore, we release, for the first time, the MolmoAct Dataset -- a mid-training robot dataset comprising over 10,000 high-quality robot trajectories across diverse scenarios and tasks. Training with this dataset yields an average 5.5% improvement in general performance over the base model. We release all model weights, training code, our collected dataset, and our action reasoning dataset, establishing MolmoAct as both a state-of-the-art robotics foundation model and an open blueprint for building ARMs that transform perception into purposeful action through structured reasoning.

Blogpost: this https URL
Paper
Project website
Original post on X
Oct 24, 2025 • 52min

Ep#38: Q Learning is Not Yet Scalable

Offline reinforcement learning is crucial for robotics, but does it scale? We talk to Seohong, who discusses how, for long-horizon manipulation problems, the answer may be no — at least not yet. But there are tricks you can use to make it work effectively.

Watch Episode #38 of RoboPapers with Michael Cho and Chris Paxton now!

Abstract: In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: this https URL

And from the blog post: Over the past few years, we've seen that next-token prediction scales, denoising diffusion scales, contrastive learning scales, and so on, all the way to the point where we can train models with billions of parameters with a scalable objective that can eat up as much data as we can throw at it. Then, what about reinforcement learning (RL)? Does RL also scale like all the other objectives?

ArXiV
Blog Post
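Why the horizon matters can be seen in a tiny tabular experiment: with 1-step TD backups, value information crawls backward one state per sweep, while n-step backups propagate it n states per sweep. This toy chain is our own illustration of the horizon-reduction effect, not the paper's benchmark or SHARSA itself:

```python
import numpy as np

def td_chain(n_states=50, n_step=1, sweeps=20, alpha=0.5):
    """Tabular n-step TD on a deterministic chain with a single reward of 1
    at the goal (gamma = 1). Returns the value estimates after `sweeps`
    full passes over the states."""
    V = np.zeros(n_states)                 # values for non-terminal states
    for _ in range(sweeps):
        for s in range(n_states):
            end = s + n_step
            if end >= n_states:            # the n-step lookahead reaches the goal
                target = 1.0
            else:                          # bootstrap from the value n steps ahead
                target = V[end]
            V[s] += alpha * (target - V[s])
    return V

one_step = td_chain(n_step=1)
ten_step = td_chain(n_step=10)
# after the same number of sweeps, the start state has learned nothing
# from 1-step backups but has a solid estimate from 10-step backups
print(round(one_step[0], 3), round(ten_step[0], 3))
```

The same budget of updates goes much further when each backup spans more of the horizon, which is the intuition behind the horizon reduction techniques the paper studies.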
Oct 21, 2025 • 45min

Ep#37: AMPLIFY: Actionless Motion Priors for Robot Learning from Videos

Robotics has a data problem, in that robotics data is rare. While human video is quite common, it's not usually directly usable for robots, for a variety of reasons, most significantly that it's missing explicit, accurate robot actions. Instead, Jeremy proposes that we predict keypoint trajectories: basically, how any given point on an object will move as a robot performs a task. This lets us use action-free human video to train robot skills.

Learn more by watching Episode #37 of RoboPapers with Michael Cho and Chris Paxton.

Abstract: Action-labeled data for robotics is scarce and expensive, limiting the generalization of learned policies. In contrast, vast amounts of action-free video data are readily available, but translating these observations into effective policies remains a challenge. We introduce AMPLIFY, a novel framework that leverages large-scale video data by encoding visual dynamics into compact, discrete motion tokens derived from keypoint trajectories. Our modular approach separates visual motion prediction from action inference, decoupling the challenges of learning what motion defines a task from how robots can perform it. We train a forward dynamics model on abundant action-free videos and an inverse dynamics model on a limited set of action-labeled examples, allowing for independent scaling. Extensive evaluations demonstrate that the learned dynamics are both accurate, achieving up to 3.7x better MSE and over 2.5x better pixel prediction accuracy compared to prior approaches, and broadly useful. In downstream policy learning, our dynamics predictions enable a 1.2-2.2x improvement in low-data regimes, a 1.4x average improvement by learning from action-free human videos, and the first generalization to LIBERO tasks from zero in-distribution action data. Beyond robotic control, we find the dynamics learned by AMPLIFY to be a versatile latent world model, enhancing video prediction quality. Our results present a novel paradigm leveraging heterogeneous data sources to build efficient, generalizable world models. More information can be found at this https URL.

ArXiV
Project Page
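The decoupling the abstract describes, a forward dynamics model trained on action-free video and an inverse dynamics model fit on a small action-labeled set, can be sketched with stand-in linear models. These classes and the toy data are ours, not AMPLIFY's networks:

```python
import numpy as np

class ForwardDynamics:
    """Predicts next keypoint positions from current ones; in AMPLIFY this
    role is learned from abundant action-free video (here: a linear stand-in)."""
    def __init__(self, n_keypoints):
        self.W = np.eye(2 * n_keypoints)        # identity init: predicts no motion
    def predict(self, keypoints):               # keypoints: (n_keypoints, 2)
        return (self.W @ keypoints.ravel()).reshape(keypoints.shape)

class InverseDynamics:
    """Maps keypoint motion to a robot action; needs only a small
    action-labeled dataset to fit (here: ordinary least squares)."""
    def fit(self, motions, actions):            # (N, d_motion), (N, d_action)
        self.W, *_ = np.linalg.lstsq(motions, actions, rcond=None)
        return self
    def act(self, motion):
        return motion @ self.W

# toy check: the inverse model recovers a linear motion-to-action map
rng = np.random.default_rng(1)
true_W = rng.standard_normal((8, 3))
motions = rng.standard_normal((100, 8))
inv = InverseDynamics().fit(motions, motions @ true_W)
print(np.allclose(inv.act(motions[0]), motions[0] @ true_W))
```

Because the two models never share parameters, each can be scaled with whichever data source it has in abundance, which is the independent-scaling point made above.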
Oct 17, 2025 • 49min

Ep#36: Whole-Body Conditioned Egocentric Video Prediction

Learning a true world model for a human body means taking high-dimensional actions representing the full body pose — the locations of hands and feet, for example — and using them to predict the effects of each action. This would allow for an unprecedented level of simulation of the effects of each action on the world, but this level of information is usually not available. With a new dataset from Meta, however, Yutong Bai and co-authors were able to train just such a world model, using detailed 3D information of whole human bodies in different apartments and predicting the results of granular actions.

Watch Episode #36 of RoboPapers, co-hosted by Michael Cho and Chris Paxton, now to find out more.

Abstract: We train models to Predict Egocentric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.

Project Site
ArXiV
Thread on X
Oct 8, 2025 • 1h 2min

Ep#35: Reinforcement Learning with Action Chunking

Today, most robot learning from demonstration predicts action chunks: short robot action trajectories rather than single actions. Doing this is crucial for better performance and has all kinds of advantages. But how can we bring these advantages to reinforcement learning? We talked to Colin Li and Paul Zhou to find out more.

Abstract: We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased n-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.

ArXiV
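The core idea, running TD-based RL directly in a chunked action space, can be sketched as an environment wrapper that executes k primitive actions per agent decision, so a one-step backup on the wrapper is an unbiased k-step backup underneath. This is our sketch, not the authors' code, and the toy environment is invented:

```python
class CountEnv:
    """Invented toy environment: reward 1 per primitive step, done after 5."""
    def __init__(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5         # obs, reward, done

class ChunkedActionEnv:
    """Executes a chunk of k primitive actions per agent decision and returns
    the discounted sum of their rewards. A one-step TD backup on this wrapper
    is therefore an unbiased k-step backup on the underlying environment."""
    def __init__(self, env, gamma=0.99):
        self.env, self.gamma = env, gamma
    def step(self, action_chunk):
        total, discount, done, obs = 0.0, 1.0, False, None
        for a in action_chunk:
            obs, r, done = self.env.step(a)
            total += discount * r
            discount *= self.gamma
            if done:                             # stop mid-chunk at episode end
                break
        return obs, total, done

env = ChunkedActionEnv(CountEnv(), gamma=1.0)
obs, r, done = env.step([0, 0, 0])              # one decision = 3 primitive steps
print(obs, r, done)
```

Because the chunk is committed to in full, the backup needs no off-policy correction, which is why the n-step returns stay unbiased.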
Oct 3, 2025 • 48min

Ep#34: RoboArena

Evaluating robot policies is hard. Every lab has a different robot, and reproducible evaluations are really challenging. This makes it hard to know which methods for learning robot policies are likely to perform best in real-world scenarios. Taking a page from LLM evaluations like Chatbot Arena, RoboArena aims to address this problem by crowdsourcing evaluations across a network of different evaluators.

Watch Episode #34 of RoboPapers, hosted by Chris Paxton and Michael Cho, now to learn more from authors Pranav Atreya and Karl Pertsch.

Abstract: Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environments. In this work, we propose RoboArena, a new approach for scalable evaluation of generalist robot policies in the real world. Instead of standardizing evaluations around fixed tasks, environments, or locations, we propose to crowd-source evaluations across a distributed network of evaluators. Importantly, evaluators can freely choose the tasks and environments they evaluate on, enabling easy scaling of diversity, but they are required to perform double-blind evaluations over pairs of policies. Then, by aggregating preference feedback from pairwise comparisons across diverse tasks and environments, we can derive a ranking of policies. We instantiate our approach across a network of evaluators at seven academic institutions using the DROID robot platform. Through more than 600 pairwise real-robot evaluation episodes across seven generalist policies, we demonstrate that our crowd-sourced approach can more accurately rank the performance of existing generalist policies than conventional, centralized evaluation approaches, while being more scalable, resilient, and trustworthy. We open our evaluation network to the community and hope that it can enable more accessible comparisons of generalist robot policies.

Project Site
ArXiV
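Aggregating double-blind pairwise preferences into a ranking is classically done with a Bradley-Terry model fit by minorization-maximization. Whether RoboArena uses exactly this estimator is our assumption, so treat this as a generic preference-aggregation sketch with invented data:

```python
import numpy as np

def bradley_terry(n_policies, comparisons, iters=200):
    """Fit Bradley-Terry skill scores from (winner, loser) pairs using
    simple minorization-maximization updates."""
    wins = np.zeros((n_policies, n_policies))
    for winner, loser in comparisons:
        wins[winner, loser] += 1
    p = np.ones(n_policies)
    for _ in range(iters):
        for i in range(n_policies):
            total_wins = wins[i].sum()
            # expected-games denominator from the current score estimates
            den = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                      for j in range(n_policies) if j != i)
            if den > 0:
                p[i] = total_wins / den
        p /= p.sum()                        # fix the scale ambiguity
    return p

# policy 0 usually beats 1; policy 1 usually beats 2; 0 and 2 never meet
comps = [(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 7 + [(2, 1)] * 3
scores = bradley_terry(3, comps)
print(np.argsort(-scores))                  # policy indices, best first
```

Note that a global ranking emerges even though policies 0 and 2 were never compared directly, which is what lets evaluators choose their own tasks and environments while still producing one leaderboard.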
