
BlueDot Narrated: Can We Scale Human Feedback for Complex AI Tasks?
Jan 4, 2025
The episode explores why simple human feedback breaks down on complex AI tasks. It outlines scalable oversight techniques such as task decomposition, recursive reward modelling, constitutional approaches, and debate frameworks, and also covers experiments on weak-to-strong generalisation and potential failure modes.
Human Feedback Breaks Down On Complex Tasks
- RLHF works well for many tasks but fails when humans can't reliably judge complex outputs at scale.
- Problems like deception and hallucinations arise because humans may be briefly fooled by plausible outputs that aren't actually correct.
Summaries Via Hierarchical Decomposition
- Task decomposition breaks hard tasks into subtasks that are easier for humans to evaluate accurately.
- Example: summarize each page, then each chapter, then the whole book so humans needn't reread the entire book to verify accuracy.
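The page-then-chapter-then-book example can be sketched as a small recursive pipeline. This is a minimal illustration, not the episode's implementation: the `summarize` function is a hypothetical stand-in for an LLM call, stubbed here as "return the first sentence" so the control flow runs on its own.

```python
def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM summariser: return the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

def summarize_book(book: list[list[str]]) -> str:
    """A book is a list of chapters; each chapter is a list of pages."""
    chapter_summaries = []
    for chapter in book:
        # Level 1: summarise each page independently -- a human can spot-check
        # any single page summary without reading the whole book.
        page_summaries = [summarize(page) for page in chapter]
        # Level 2: summarise the page summaries into a chapter summary.
        chapter_summaries.append(summarize(" ".join(page_summaries)))
    # Level 3: summarise the chapter summaries into a book summary.
    return summarize(" ".join(chapter_summaries))
```

The key property is that each level's inputs are short enough for a human to evaluate directly, so oversight of the final summary reduces to many small, checkable judgements.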
Iterated Amplification And Distillation
- Iterated amplification (IA) calls multiple copies of a model or uses chain-of-thought to amplify capability on subtasks.
- IDA adds distillation so the amplified result becomes a single-step training example for the base model.
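The amplify-then-distill loop can be sketched in a few lines. Everything here is a toy assumption: the "model" is a lookup table, `decompose` splits a composite task on semicolons, amplification calls the model once per subtask, and distillation stores the amplified answer as a one-step training example.

```python
def decompose(task: str) -> list[str]:
    """Hypothetical decomposition: split a composite task on ';'."""
    return [t.strip() for t in task.split(";")]

def amplify(model: dict[str, str], task: str) -> str:
    # Amplification: call the base model once per subtask,
    # then combine the sub-answers.
    sub_answers = [model.get(sub, "?") for sub in decompose(task)]
    return " ".join(sub_answers)

def ida_step(model: dict[str, str], task: str) -> None:
    # Distillation: the amplified result becomes a single-step
    # "training example", so next time the model answers directly.
    model[task] = amplify(model, task)

model = {"2+2": "4", "3+3": "6"}
ida_step(model, "2+2; 3+3")
# After distillation, the composite task is answered in one lookup.
```

Iterating this loop lets the distilled model handle ever-larger composites directly, which is the sense in which amplification plus distillation compounds capability.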
