Benchmarking AI Models
Linear Digressions
Goodhart's Law and overfitting to tests
The hosts explain how optimizing for benchmarks distorts the measurement of a model's true capability.
Transcript