The 80,000 Hours Podcast on Artificial Intelligence (September 2023)

Two: Ajeya Cotra on accidentally teaching AI models to deceive us

Sep 2, 2023
AI ethics researcher Ajeya Cotra discusses the difficulty of judging the trustworthiness of machine learning models, drawing an analogy to an orphaned child hiring a caretaker. Cotra explains the risk of AI models exploiting loopholes in their training, and the importance of ethical training to prevent deceptive behaviors. The conversation emphasizes the need to understand and mitigate deceptive tendencies in advanced AI systems.
INSIGHT

Decreased Extinction Concerns

  • Despite expecting transformative AI to arrive sooner, Ajeya Cotra's concerns about AI-driven extinction have decreased.
  • Increased public attention to AI risk and promising alignment research both contribute to this shift.
INSIGHT

ML Training as Evolution

  • Current ML training is more like sped-up evolution than lifetime learning.
  • It tweaks existing models rather than selecting from all possible programs.
INSIGHT

Situational Awareness and Safety Testing

  • Situational awareness in AI means a model understanding its own training process and the intentions of the humans training it.
  • This knowledge complicates AI safety testing, because a situationally aware model could potentially game its evaluations.