
Constitutional AI: Harmlessness from AI Feedback
BlueDot Narrated
RLCAI main results and calibration
Results showing RLCAI models are more harmless and often non-evasive; calibration notes.
Play episode from 45:35
Transcript
