AXRP - the AI X-risk Research Podcast

29 - Science of Deep Learning with Vikrant Varma

Apr 25, 2024
Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.
INSIGHT

Limits of Linear Probes for Model Concepts

  • Linear probes may fail to capture complex or alien model concepts that lack a short natural-language decoding.
  • Many model concepts may be encoded nonlinearly, with no direct mapping to natural language tokens.
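To make the probing idea concrete, here is a minimal sketch of a linear probe: logistic regression trained on frozen activations. The activations are simulated (two Gaussian clusters separated along one "concept direction"), so this shows the easy case the insight contrasts against — a concept that *is* linearly decodable; all dimensions and data here are illustrative assumptions, not anything from the episode.

```python
import numpy as np

# Simulated hidden activations (assumption: in practice these would come
# from a real model's residual stream or layer outputs).
rng = np.random.default_rng(0)
d, n = 32, 500                      # activation dim, examples per class
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

acts = rng.normal(size=(2 * n, d))
labels = np.array([0] * n + [1] * n)
acts[labels == 1] += 2.0 * direction  # concept varies along one direction

# The "probe" is just logistic regression on the frozen activations,
# trained here with plain gradient descent.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
    grad = p - labels
    w -= lr * (acts.T @ grad) / len(labels)
    b -= lr * grad.mean()

acc = (((acts @ w + b) > 0) == labels).mean()
print(f"probe accuracy: {acc:.2f}")  # well above chance for this toy data
```

A nonlinearly encoded concept (e.g. labels depending on a product of two activation coordinates) would defeat this probe entirely, which is the failure mode the insight points at.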
INSIGHT

Grokking Reveals Training Dynamics

  • Grokking describes a sudden shift in neural networks from memorization to generalization after extended training.
  • Understanding grokking can illuminate training dynamics and inductive biases relevant to AI alignment.
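Grokking was first reported on small algorithmic tasks such as modular addition. The sketch below builds that kind of dataset; the modulus p = 97 and the 50% train split are illustrative choices, not the episode's exact setup.

```python
import itertools
import random

# Build the full table for (a + b) mod p -- the classic grokking task.
p = 97
data = [((a, b), (a + b) % p)
        for a, b in itertools.product(range(p), repeat=2)]

random.seed(0)
random.shuffle(data)
cut = len(data) // 2                 # illustrative 50% train fraction
train, test = data[:cut], data[cut:]

# A small network trained on `train` can first memorize it (near-perfect
# train accuracy, near-chance test accuracy) and only after much longer
# training "grok", suddenly jumping to high test accuracy.
print(len(train), len(test))
```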
INSIGHT

Circuit Model for Grokking Explains Behavior

  • Memorizing and generalizing circuits conceptually compete within a network, but overlap heavily in parameters.
  • Circuit interactions are complex; models may morph one circuit into another rather than scaling each independently.
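One way the "efficiency" framing is often illustrated: when many parameter settings implement the same function, weight decay pushes gradient descent toward the lowest-norm one. The toy below is an illustration of that selection pressure only, not the episode's actual model: two redundant parameters w1, w2 implement f(x) = (w1 + w2) * x, so any split with the right sum fits the data, but decay drifts the solution toward the equal (minimum-norm) split.

```python
import numpy as np

# Toy redundant parameterization: f(x) = (w1 + w2) * x.
# Every (w1, w2) with w1 + w2 near 3 fits the data equally well.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x

w1, w2 = 5.0, -2.0        # start at an inefficient (high-norm) solution
lr, wd = 0.05, 0.1        # learning rate and weight decay (assumptions)
for _ in range(2000):
    g = (((w1 + w2) * x - y) * x).mean()   # shared data gradient
    w1 -= lr * (g + wd * w1)
    w2 -= lr * (g + wd * w2)

# Weight decay shrinks the gap w1 - w2 toward zero while keeping the sum
# close to 3: among functionally similar solutions, the lower-norm
# ("more efficient") one wins.
print(round(w1, 2), round(w2, 2))
```

The same logic, scaled up, is one candidate story for grokking: the generalizing circuit produces the same training loss at lower parameter norm, so regularization slowly favors it over the memorizing circuit.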