
Constitutional AI: Harmlessness from AI Feedback
BlueDot Narrated
Future directions and robustness goals
Ideas for steering style, iterated AI supervision, and scaling automated red-teaming.
Transcript
