80,000 Hours Podcast

'Godfather of AI': I Now See a Path to Safe Superintelligent AI | Yoshua Bengio

May 7, 2026
Yoshua Bengio, Turing Award winner and scientific director of LawZero, proposes the 'Scientist AI' approach: training honest predictors rather than imitators. He discusses reshaping training data, avoiding deceptive hidden goals, turning predictors into safe agents, mathematical guarantees for honesty, practical prototypes, and governance to prevent concentration of power.
INSIGHT

Training Objective Can Statistically Suppress Bad Goals

  • Bengio cites mathematical results: when a model is trained toward a Bayesian predictor plus noise, the probability that it achieves a challenging harmful goal becomes exponentially small.
  • The deviations a model would need in order to cause harm are penalized, because they move its predictions away from the Bayesian target.
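The penalty described above can be illustrated with a toy case. The sketch below is not from the episode and does not reproduce the exponential bound Bengio refers to; it only shows the standard fact that, under log loss, any prediction that departs from the Bayesian posterior predictive pays an extra expected cost (the KL divergence from the target). The coin-flip setup, the uniform Beta(1, 1) prior, and the function names are all assumptions chosen for illustration.

```python
import math

def beta_posterior_predictive(heads: int, tails: int, a: float = 1.0, b: float = 1.0) -> float:
    """Bayesian probability of heads on the next flip under a Beta(a, b) prior (assumed prior)."""
    return (heads + a) / (heads + tails + a + b)

def expected_log_loss(p_target: float, q_pred: float) -> float:
    """Expected negative log-likelihood of predicting q_pred when outcomes follow Bernoulli(p_target)."""
    return -(p_target * math.log(q_pred) + (1 - p_target) * math.log(1 - q_pred))

# Hypothetical data: 7 heads and 3 tails observed so far.
bayes = beta_posterior_predictive(heads=7, tails=3)
baseline = expected_log_loss(bayes, bayes)

# Extra expected loss paid for shifting the prediction away from the Bayesian target.
penalties = {
    shift: expected_log_loss(bayes, bayes + shift) - baseline
    for shift in (0.0, 0.05, 0.1, 0.2)
}

for shift, penalty in penalties.items():
    print(f"shift={shift:.2f}  extra expected loss={penalty:.5f}")
```

In this toy setting the extra loss is zero only at the Bayesian prediction and grows as the shift grows, which is the sense in which a training objective of this kind discourages deviations.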
ADVICE

Start With Fine-Tuned Scrappy Prototypes

  • For practical progress, run small, scrappy experiments: fine-tune open models with the Scientist AI objective and measure the trade-off between honesty and capability.
  • Start with fine-tuning to demonstrate empirical honesty gains before attempting costly training from scratch.
INSIGHT

Honest Reasoning Could Boost Capability

  • Bengio argues that Scientist AI may improve capability, because reasoning with structured factual latent variables uncovers causal mechanisms and generalizes better out of distribution (OOD).
  • Explicit causal explanations yield robustness and epistemic humility, improving both safety and generalization.