RoboPapers

Ep#52: Probe, Learn, Distill: Self-improving Vision-Language-Action Models

Dec 12, 2025
Wenli Xiao, a PhD student and robotics researcher, introduces her innovative Probe, Learn, Distill (PLD) method for enhancing vision-language-action models. She details how freezing a VLA's backbone and training lightweight residual actors can improve reliability in complex tasks. Wenli also discusses the use of hybrid rollouts for optimizing data collection and the significance of training on fewer tasks to generalize better on unseen challenges. Her insights on continual learning and practical workflows could reshape the future of robotics!
AI Snips
ANECDOTE

Eight Hours Beside The Robot

  • Wenli spent about eight hours sitting beside the YAM arm, watching it run RL during real-world training.
  • The process was painful but produced strong performance on the delicate GPU-insertion task.
INSIGHT

Residual Actors Fix VLA Errors

  • PLD trains lightweight residual actors on top of a frozen VLA to correct errors without touching the base model architecture.
  • This makes the method policy-agnostic and avoids retraining large VLA heads while achieving high success rates.
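The idea above can be sketched in a few lines: a frozen base policy produces an action, and a small, separately trained residual actor adds a correction on top of it. This is a minimal illustration, not the paper's implementation; the function names and placeholder policies are hypothetical, and the real VLA backbone and residual-actor networks are far larger.

```python
# Hypothetical sketch of residual correction on a frozen base policy.
# Only the residual actor would be trained (e.g. with RL); the base is never updated.

def frozen_vla(observation):
    """Stand-in for the frozen VLA backbone: returns a base action (placeholder)."""
    return [0.5 * x for x in observation]

def residual_actor(observation, base_action):
    """Stand-in for the lightweight residual actor: outputs a small correction (placeholder)."""
    return [0.1 * (o - a) for o, a in zip(observation, base_action)]

def pld_style_policy(observation):
    # The final action is base + residual correction, so the method
    # works with any base policy whose actions can be adjusted additively.
    base = frozen_vla(observation)
    delta = residual_actor(observation, base)
    return [b + d for b, d in zip(base, delta)]

action = pld_style_policy([1.0, 2.0])
```

Because only the additive correction is learned, the base model's weights and architecture are untouched, which is what makes the approach policy-agnostic.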
ADVICE

Collect Hybrid Rollouts For Better SFT

  • During data collection, run the base VLA for the early steps of each episode, then switch to the residual RL policy to capture both recovery and near-optimal behaviors.
  • This hybrid rollout yields diverse, deployment-aligned trajectories that improve SFT downstream.
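The hybrid rollout described above can be sketched as a single collection loop that switches policies partway through an episode. Everything here is a hypothetical stand-in: the placeholder policies, the `switch_step` parameter, and the toy environment are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of hybrid rollout collection for SFT data:
# the base VLA acts for the first `switch_step` steps, then the
# residual-corrected RL policy takes over for the rest of the episode.

def base_policy(obs):
    return list(obs)  # placeholder for the frozen base VLA

def rl_policy(obs):
    return [o + 0.1 for o in obs]  # placeholder for the residual RL policy

def collect_hybrid_rollout(env_step, init_obs, horizon, switch_step):
    """Roll out one episode, logging (obs, action) pairs for downstream SFT."""
    trajectory = []
    obs = init_obs
    for t in range(horizon):
        policy = base_policy if t < switch_step else rl_policy
        action = policy(obs)
        trajectory.append((obs, action))
        obs = env_step(obs, action)
    return trajectory
```

The early base-policy steps expose the states the deployed VLA actually visits (including its mistakes), while the later RL steps demonstrate corrections from those states, so the logged trajectories cover both recovery and optimal behavior.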