80,000 Hours Podcast

#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

Aug 7, 2023
Jan Leike, Head of Alignment at OpenAI and leader of the Superalignment project, discusses the ambitious goal of making superintelligent AI safe within four years. He addresses the challenges of aligning AI with human values and the role of Reinforcement Learning from Human Feedback (RLHF). Leike expresses guarded optimism that solutions for safely steering AI can be found, emphasizing collaboration and innovative approaches to these hard problems. The conversation also covers recruitment efforts to build out the Superalignment team.
AI Snips
INSIGHT

Target Intelligence Level

  • OpenAI aims to align models roughly as smart as top human alignment researchers.
  • These models might excel in some areas but lag in others, like arithmetic.
INSIGHT

Avoiding Overfitting in Alignment

  • Training and validation methods for alignment must be kept separate to avoid overfitting.
  • This prevents models from learning to pass the tests without becoming genuinely aligned (see the sketch after this list).
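To make the train/validation separation concrete, here is a minimal Python sketch under the standard ML analogy of a held-out evaluation set. Everything in it is a hypothetical illustration: the task names, the `score` placeholder, and the 50/50 split are assumptions for the example, not details from the episode or OpenAI's actual evaluation setup.

```python
# Illustrative sketch: keep a held-out set of alignment evaluations that is
# never used while iterating on the alignment method, so overfitting to the
# tests can be detected. All names here are hypothetical.
import random

def split_eval_tasks(tasks, holdout_fraction=0.5, seed=0):
    """Partition evaluation tasks into a development set (used while iterating
    on the alignment method) and a held-out validation set (consulted only for
    the final check)."""
    rng = random.Random(seed)
    shuffled = tasks[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def score(model, task):
    # Hypothetical placeholder: in practice this would run the model on the
    # task and grade its behaviour (e.g. honesty or instruction-following).
    raise NotImplementedError

def evaluate(model, tasks):
    # Average score over a set of tasks.
    return sum(score(model, t) for t in tasks) / len(tasks)

# Usage sketch: tune the method against dev_tasks only, then compare scores on
# holdout_tasks. A large dev/holdout gap suggests the method has learned to
# pass the tests rather than producing genuinely aligned behaviour.
dev_tasks, holdout_tasks = split_eval_tasks(["task_a", "task_b", "task_c", "task_d"])
```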
INSIGHT

Four-Year Goal

  • OpenAI's four-year goal is ambitious but potentially achievable, driven by the rapid pace of AI development.
  • They aim to be prepared for alignment challenges before they become critical.