
Constitutional AI: Harmlessness from AI Feedback
BlueDot Narrated
RLAIF: reinforcement learning from AI feedback
Method for generating model-based harmlessness labels, training preference models on them, and running the RL training pipeline.
Transcript
