
Generally Intelligent Jack Parker-Holder, DeepMind: Open-endedness, evolving agents and environments, online adaptation, and offline learning
Dec 6, 2022 AI Snips
Action Vectors Fail As General Behavior Metrics
- Behavior-space measures built from raw actions can be misleading because many actions are redundant; domain-specific behavior encodings or payoff-based diversity are usually required.
- Jack found that humanoid policies looked similar despite scoring as diverse under a determinantal point process (DPP) measure, because differing arm motions didn't change the task-relevant leg behavior.
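
The failure mode above can be reproduced on toy data. The sketch below is an illustration only, not the setup Jack used: it assumes each policy is summarized by a hypothetical four-number action vector (two leg entries, two arm entries) and scores a population by the determinant of an RBF kernel matrix, which is one common way to turn a DPP into a diversity score. The names `rbf_kernel`, `dpp_diversity`, and the `policies` data are all made up for the example.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Similarity in (0, 1]; equals 1.0 when the two vectors are identical."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix: some rows are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def dpp_diversity(embeddings):
    """Determinant of the kernel matrix: near 1 if spread out, 0 if duplicated."""
    kernel = [[rbf_kernel(a, b) for b in embeddings] for a in embeddings]
    return determinant(kernel)


# Hypothetical action vectors: [leg_0, leg_1, arm_0, arm_1].
# Leg actions are identical across policies; only the arms differ.
policies = [
    [0.5, -0.5, 1.0, 0.0],
    [0.5, -0.5, -1.0, 2.0],
    [0.5, -0.5, 0.0, -2.0],
]

full_score = dpp_diversity(policies)                 # raw action vectors
leg_score = dpp_diversity([p[:2] for p in policies])  # task-relevant part only

print(f"diversity on raw actions: {full_score:.3f}")
print(f"diversity on leg actions: {leg_score:.3f}")
```

On this toy population the raw-action score is close to its maximum while the leg-only score is zero, which matches the observation that arm motion can inflate an action-based diversity measure without changing how the agent walks. Choosing which dimensions count as task-relevant is exactly the domain-specific encoding step the snip refers to.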
Measure Diversity By Who Solves Which Environment
- Payoff or response diversity (which agent solves which environment) reveals meaningful behavioral differences and avoids superficial action variance.
- Jack highlights objectives in the style of minimal criterion coevolution (MCC), where each agent uniquely solves some environments and each environment is solved by a single agent.
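
As a rough sketch of the idea above, payoff diversity can be read off a binary agent-by-environment matrix. The code assumes a hypothetical `payoff[agent][env]` table where 1 means the agent solves the environment. The helper names are invented for illustration, and the check is a literal reading of the condition described in the snip, not an implementation of MCC itself.

```python
def distinct_payoff_profiles(payoff):
    """Count agents that differ in which environments they solve."""
    return len({tuple(row) for row in payoff})


def has_unique_coverage(payoff):
    """True if every environment has exactly one solver and
    every agent is the sole solver of at least one environment."""
    n_agents = len(payoff)
    n_envs = len(payoff[0])
    owners = set()
    for env in range(n_envs):
        solvers = [agent for agent in range(n_agents) if payoff[agent][env]]
        if len(solvers) != 1:
            return False  # unsolved, or solved by more than one agent
        owners.add(solvers[0])
    return owners == set(range(n_agents))


# Hypothetical results: rows are agents, columns are environments.
specialists = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
# Three agents whose actions may differ but whose outcomes are identical.
lookalikes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

print(distinct_payoff_profiles(specialists), has_unique_coverage(specialists))
print(distinct_payoff_profiles(lookalikes), has_unique_coverage(lookalikes))
```

The second population collapses to a single payoff profile no matter how different the underlying action vectors are, which is why this kind of measure ignores superficial action variance.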
Adapt Exploration Between Reward And Disagreement
- Actively collect data to improve world models by optimizing exploration objectives that trade reward vs. model disagreement, and adapt that tradeoff online.
- Jack's
