Benchmarking AI Models
Linear Digressions
SWE-bench: evaluating real software fixes
The host introduces SWE-bench, a benchmark that uses real GitHub issues and their associated unit tests to assess a model's coding and debugging skills.
Segment starts at 21:34.