Generally Intelligent

Jack Parker-Holder, DeepMind: Open-endedness, evolving agents and environments, online adaptation, and offline learning

Dec 6, 2022
INSIGHT

Action Vectors Fail As General Behavior Metrics

  • Behavior-space measures built from raw actions can be misleading because many actions are redundant; domain-specific behavior encodings or payoff-based diversity are usually required.
  • Jack found that humanoid policies looked similar even though a determinantal point process (DPP) diversity measure scored them as diverse: the policies differed in arm motions, which did not change the task-relevant leg behavior.
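To make the redundancy problem concrete, here is a minimal sketch, assuming a DPP-style diversity score computed as the determinant of an RBF kernel over per-policy action vectors. The function name `dpp_diversity`, the kernel choice, and the split into "leg" and "arm" action dimensions are illustrative assumptions, not details taken from the episode.

```python
import numpy as np

def dpp_diversity(embeddings, length_scale=1.0):
    """Determinant of an RBF kernel matrix over policy embeddings.

    Values near 1 mean the embeddings are far apart; values near 0 mean
    at least two policies are (nearly) identical in this embedding.
    """
    x = np.asarray(embeddings, dtype=float)
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * length_scale ** 2))
    return float(np.linalg.det(kernel))

rng = np.random.default_rng(0)
n_policies, leg_dims, arm_dims = 3, 4, 4

# Hypothetical setup: every policy uses the same leg actions (same gait),
# but each policy moves its arms differently.
legs = np.tile(rng.normal(size=(1, leg_dims)), (n_policies, 1))
arms = rng.normal(size=(n_policies, arm_dims)) * 3.0

full_actions = np.concatenate([legs, arms], axis=1)

full_score = dpp_diversity(full_actions)  # raw action vectors
leg_score = dpp_diversity(legs)           # task-relevant dimensions only
```

Under these assumptions the raw-action score is high because the arm dimensions differ, while the leg-only score collapses toward zero because the task-relevant behavior is identical. A domain-specific encoding amounts to choosing which dimensions enter the kernel.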
INSIGHT

Measure Diversity By Who Solves Which Environment

  • Payoff or response diversity (which agent solves which environment) reveals meaningful behavioral differences and avoids rewarding superficial action variance.
  • Jack highlights MCC-like objectives where each agent uniquely solves some environments and each environment is solved by a single agent.
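The "who solves which environment" idea can be sketched with a binary payoff matrix. This is an illustrative check only, assuming a hypothetical `solved[agent, env]` matrix and made-up summary names; it is not the objective used in any specific paper.

```python
import numpy as np

def payoff_diversity(solved):
    """Summarize a binary payoff matrix with rows = agents, columns = environments."""
    solved = np.asarray(solved, dtype=bool)
    solvers_per_env = solved.sum(axis=0)  # how many agents solve each environment
    envs_per_agent = solved.sum(axis=1)   # how many environments each agent solves
    return {
        # Number of distinct response profiles across agents.
        "distinct_profiles": len({tuple(row) for row in solved.tolist()}),
        # Each environment is solved by exactly one agent.
        "one_solver_per_env": bool((solvers_per_env == 1).all()),
        # Each agent solves at least one environment.
        "every_agent_solves_something": bool((envs_per_agent >= 1).all()),
    }

# Hypothetical populations: specialists versus three copies of one behavior.
specialized = [[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1]]
redundant = [[1, 1, 0],
             [1, 1, 0],
             [1, 1, 0]]

specialized_summary = payoff_diversity(specialized)
redundant_summary = payoff_diversity(redundant)
```

The redundant population could still differ in raw actions, but its payoff profiles are identical, so a payoff-based measure would not count it as diverse.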
ADVICE

Adapt Exploration Between Reward And Disagreement

  • Actively collect data to improve world models by optimizing exploration objectives that trade reward vs. model disagreement, and adapt that tradeoff online.
  • Jack's
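One way to picture the reward versus disagreement tradeoff is the sketch below. It assumes an ensemble of world models whose predictions are scored with a weight `lam`, and it adapts `lam` with a simple bandit over candidate values. The scoring rule, the `LambdaBandit` class, and the feedback signal are all assumptions for illustration; the episode summary does not specify the mechanism.

```python
import numpy as np

def exploration_score(pred_rewards, lam):
    """Score candidates from ensemble predictions of shape (n_models, n_candidates).

    lam = 0 is purely reward-seeking; lam = 1 is purely disagreement-seeking.
    """
    pred_rewards = np.asarray(pred_rewards, dtype=float)
    expected = pred_rewards.mean(axis=0)     # ensemble mean reward
    disagreement = pred_rewards.std(axis=0)  # spread across ensemble members
    return (1.0 - lam) * expected + lam * disagreement

class LambdaBandit:
    """Epsilon-greedy bandit over candidate tradeoff weights (hypothetical helper)."""

    def __init__(self, lams, step=0.1):
        self.lams = list(lams)
        self.values = np.zeros(len(self.lams))
        self.step = step

    def select(self, rng, eps=0.1):
        # Explore a random weight with probability eps, else pick the best so far.
        if rng.random() < eps:
            return int(rng.integers(len(self.lams)))
        return int(np.argmax(self.values))

    def update(self, idx, feedback):
        # Move the running value estimate toward the observed feedback.
        self.values[idx] += self.step * (feedback - self.values[idx])

# Two candidates: the models agree on the first and disagree on the second.
preds = [[2.0, 0.0],
         [2.0, 2.0]]
greedy_choice = int(np.argmax(exploration_score(preds, lam=0.0)))
curious_choice = int(np.argmax(exploration_score(preds, lam=1.0)))

bandit = LambdaBandit(lams=[0.0, 1.0])
bandit.update(1, feedback=1.0)  # e.g. data gathered with lam = 1.0 helped the model
chosen = bandit.select(np.random.default_rng(0), eps=0.0)
```

In this sketch the feedback could be any online signal, such as the drop in model error after training on newly collected data; which signal to use is a design choice left open here.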