Generally Intelligent

Jack Parker-Holder, DeepMind: Open-endedness, evolving agents and environments, online adaptation, and offline learning

Dec 6, 2022

Jack Parker-Holder

INSIGHT

Action Vectors Fail As General Behavior Metrics

Behavior-space diversity measures built from raw action vectors can mislead, because many action dimensions are redundant for the task; domain-specific behavior encodings or payoff-based diversity measures are usually needed instead.
Jack found that humanoid policies looked similar even when a DPP (determinantal point process) diversity objective scored them as diverse: the policies differed in arm motion, which did not change the task-relevant leg behavior.
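
The humanoid observation can be illustrated with a small sketch. It assumes a DPP-style score defined as the determinant of an RBF kernel matrix over per-policy embeddings; the function name dpp_diversity, the toy population, and the leg/arm split are hypothetical and are not taken from the episode.

```python
import numpy as np

def dpp_diversity(embeddings, length_scale=1.0):
    """Determinant of an RBF kernel matrix over policy embeddings.

    Values near 1 mean the embeddings are far apart (diverse);
    values near 0 mean at least some embeddings are nearly identical.
    """
    X = np.asarray(embeddings, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2.0 * length_scale ** 2))
    return float(np.linalg.det(K))

n_policies = 4

# Hypothetical population: leg actions are almost identical across policies...
base_leg = np.array([0.5, -0.2, 0.1, 0.3])
leg = base_leg + 0.01 * np.arange(n_policies)[:, None]

# ...while each policy swings a different arm joint.
arm = 2.0 * np.eye(n_policies)

# Raw action embedding = leg and arm dimensions concatenated.
raw_actions = np.hstack([leg, arm])

raw_score = dpp_diversity(raw_actions)  # dominated by the arm differences
leg_score = dpp_diversity(leg)          # task-relevant encoding only

print(f"raw-action diversity:   {raw_score:.4f}")
print(f"leg-only diversity:     {leg_score:.2e}")
```

In this toy setup the raw-action score is close to 1 while the leg-only score is close to 0, so the population counts as diverse only because of dimensions that do not affect the task.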

INSIGHT

Measure Diversity By Who Solves Which Environment

Payoff (or response) diversity, meaning which agent solves which environment, reveals meaningful behavioral differences and is not fooled by superficial variance in actions.
Jack highlights objectives in the style of MCC (Minimal Criterion Coevolution), where each agent uniquely solves some environments and each environment is solved by a single agent.
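
One way to make "who solves which environment" concrete is a boolean agent-by-environment solve matrix. The sketch below is an illustration only: the success threshold, the returns table, and the two summary statistics are assumptions, not the objective described in the episode.

```python
import numpy as np

def solve_matrix(returns, threshold):
    """Boolean matrix: entry [i, j] is True if agent i solves environment j."""
    return np.asarray(returns, dtype=float) >= threshold

def payoff_diversity(solved):
    """Summarize how uniquely agents and environments are matched."""
    solved = np.asarray(solved, dtype=bool)
    solvers_per_env = solved.sum(axis=0)
    # Environments solved by exactly one agent.
    single_solver_env = solvers_per_env == 1
    # Agents that are the sole solver of at least one environment.
    agent_has_niche = (solved & single_solver_env[None, :]).any(axis=1)
    return {
        "frac_envs_single_solver": float(single_solver_env.mean()),
        "frac_agents_with_niche": float(agent_has_niche.mean()),
    }

# Hypothetical returns: rows are agents, columns are environments.
returns = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 0.1, 0.8, 0.0],
    [0.2, 0.0, 0.1, 0.0],
]
solved = solve_matrix(returns, threshold=0.5)
stats = payoff_diversity(solved)
print(stats)
```

Both statistics reach 1.0 only when every environment has exactly one solver and every agent owns at least one environment, which is the pattern the snip describes. Note that two agents with very different action vectors but identical rows in the solve matrix add nothing under this measure.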

ADVICE

Adapt Exploration Between Reward And Disagreement

Actively collect data that improves the world model: optimize an exploration objective that trades off reward against model disagreement, and adapt that tradeoff online.
Jack's
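
A minimal sketch of such an objective follows. It assumes disagreement is measured as the variance of next-state predictions across an ensemble of world models, and that the tradeoff weight lam is picked by an epsilon-greedy bandit. Both choices, and every name below, are hypothetical; the episode summary does not specify the adaptation rule.

```python
import numpy as np

def ensemble_disagreement(predictions):
    """Variance across ensemble members, averaged over state dimensions.

    predictions has shape (n_models, n_candidates, state_dim).
    """
    return np.asarray(predictions, dtype=float).var(axis=0).mean(axis=-1)

def exploration_score(pred_reward, disagreement, lam):
    """Blend predicted reward with disagreement; lam=0 is pure exploitation."""
    return (1.0 - lam) * np.asarray(pred_reward) + lam * np.asarray(disagreement)

class TradeoffBandit:
    """Epsilon-greedy bandit over candidate lam values (assumed adaptation rule)."""

    def __init__(self, lams, eps=0.1, seed=0):
        self.lams = list(lams)
        self.eps = eps
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(len(self.lams))
        self.values = np.zeros(len(self.lams))

    def select(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.lams)))
        return int(np.argmax(self.values))

    def update(self, arm, feedback):
        # Running mean of the feedback observed for this lam.
        self.counts[arm] += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]

# Two candidate actions, three ensemble members, two state dimensions.
# Models agree on candidate 0 and disagree on candidate 1.
preds = np.array([
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
])
pred_reward = np.array([1.0, 0.2])
disagreement = ensemble_disagreement(preds)

greedy_choice = int(np.argmax(exploration_score(pred_reward, disagreement, lam=0.0)))
curious_choice = int(np.argmax(exploration_score(pred_reward, disagreement, lam=1.0)))

# Online adaptation: feedback could be, e.g., the drop in model error after
# training on the newly collected data (an assumption, not from the episode).
bandit = TradeoffBandit(lams=[0.0, 0.5, 1.0], eps=0.0)
bandit.update(arm=1, feedback=1.0)
next_arm = bandit.select()
```

With lam=0.0 the score prefers the high-reward candidate, and with lam=1.0 it prefers the candidate the ensemble disagrees on; the bandit then shifts toward whichever lam has produced the most useful data so far.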
