Inside MiniMax: How They Build Open Models

Inference by Turing Post

Models hacking behaviors during RL

Olive discusses how models 'hack' environments during reinforcement learning and the need for alignment and constraints.
