
Training superhuman coding models at Cursor
May 30, 2025

Guests: Roundtable Participant (Speaker 0), Roundtable Participant (Speaker 1), Roundtable Participant (Speaker 2), Roundtable Participant (Speaker 3)
In a lively roundtable, technical experts from Cursor discuss the challenges of training superhuman AI coding agents: coding's large action space, the difficulty of obtaining ground-truth reward signals, the effectiveness of reinforcement learning rewards versus real-world developer feedback, and hardware advances that support longer contexts. The conversation closes with where AI-assisted coding may head next and how these developments could transform the field.
AI Snips
Benchmark Heuristics For Memory Storage
- Evaluate memory heuristics with offline benchmarks rather than full RL credit assignment to find practical storage rules.
- Use a modest set of labeled tasks to compare heuristics before attempting learned memory storage.
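The offline comparison described above could be sketched as follows. Everything here is an illustrative assumption, not Cursor's actual system: the two heuristics, the event fields (`turns_ago`, `caused_error`), and the hand-labeled examples are made up to show the shape of the evaluation, where each heuristic is scored on agreement with human store/discard labels rather than through full RL credit assignment.

```python
# Hypothetical sketch: score candidate memory-storage heuristics against a
# small hand-labeled benchmark instead of learning storage end-to-end.

def recency_heuristic(event):
    # Assumed rule: store only events from the last few interactions.
    return event["turns_ago"] <= 3

def error_heuristic(event):
    # Assumed rule: store events that caused an error.
    return event.get("caused_error", False)

def evaluate(heuristic, labeled_events):
    # labeled_events: list of (event, should_store) pairs judged by hand.
    correct = sum(heuristic(ev) == label for ev, label in labeled_events)
    return correct / len(labeled_events)

# Tiny illustrative benchmark of labeled events.
labeled = [
    ({"turns_ago": 1, "caused_error": False}, True),
    ({"turns_ago": 10, "caused_error": True}, True),
    ({"turns_ago": 8, "caused_error": False}, False),
]

for heuristic in (recency_heuristic, error_heuristic):
    print(heuristic.__name__, evaluate(heuristic, labeled))
```

With a benchmark like this, a modest set of labeled tasks is enough to rank heuristics and pick a practical storage rule before investing in learned memory storage.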
Real User Signals Beat Synthetic Rewards
- Real-world human signals (accepted edits, model selector churn) are more valuable than synthetic test rewards for product fit.
- Using multiple model attempts and selecting the best closes the gap between pass@1 and pass@K.
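The best-of-K idea in the second bullet can be sketched with a toy simulation. This is an assumed setup, not Cursor's pipeline: `sample_solution` is a stand-in for a model call that succeeds 30% of the time (pass@1 ≈ 0.3), and `passes_tests` is a stand-in verifier used to pick the best of K attempts, which recovers roughly pass@K-level success from single-answer usage.

```python
import random

def sample_solution(rng):
    # Stand-in for one model attempt: succeeds with probability 0.3.
    return {"passes": rng.random() < 0.3}

def passes_tests(solution):
    # Stand-in verifier (e.g. running unit tests) used to rank candidates.
    return solution["passes"]

def best_of_k(k, rng):
    # Draw K candidate solutions and keep the best-scoring one.
    candidates = [sample_solution(rng) for _ in range(k)]
    return max(candidates, key=passes_tests)

rng = random.Random(0)
trials = 2000
pass_at_1 = sum(passes_tests(sample_solution(rng)) for _ in range(trials)) / trials
pass_best_of_8 = sum(passes_tests(best_of_k(8, rng)) for _ in range(trials)) / trials
print(pass_at_1, pass_best_of_8)
```

With an ideal verifier, best-of-8 approaches 1 - 0.7^8 ≈ 0.94 versus ≈ 0.30 for a single attempt; the practical difficulty the snip implies is that real selection signals (tests, reward models) are imperfect rankers.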
Retrain Reward Models Often On Real Choices
- Retrain reward models frequently on fresh user-choice data and deploy quickly to align optimization with real user preference.
- Prefer retraining reward models over relying solely on static proxies like thumbs up/down.
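One standard way to train a reward model on fresh user-choice data is a Bradley-Terry-style pairwise loss over (chosen, rejected) completion pairs. The sketch below is a minimal linear version under assumed names; it is not Cursor's implementation, and the feature-dict representation is a deliberate simplification of a learned scorer.

```python
import math

def pairwise_loss(score_chosen, score_rejected):
    # -log sigmoid(chosen - rejected): small when the reward model
    # ranks the user's chosen completion above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

def retrain_step(reward_model, pairs, lr=0.1):
    # reward_model: dict mapping feature name -> weight (a toy linear scorer).
    # pairs: list of (chosen_features, rejected_features) from user choices.
    for chosen, rejected in pairs:
        s_c = sum(reward_model.get(f, 0.0) * v for f, v in chosen.items())
        s_r = sum(reward_model.get(f, 0.0) * v for f, v in rejected.items())
        # Gradient of the pairwise loss w.r.t. s_c is -(1 - sigmoid(s_c - s_r)),
        # so descent pushes chosen scores up and rejected scores down.
        grad = 1.0 / (1.0 + math.exp(s_c - s_r))
        for f, v in chosen.items():
            reward_model[f] = reward_model.get(f, 0.0) + lr * grad * v
        for f, v in rejected.items():
            reward_model[f] = reward_model.get(f, 0.0) - lr * grad * v
    return reward_model

# Usage: retrain on a fresh batch of user choices, then redeploy the scorer.
rm = retrain_step({}, [({"accepted_edit": 1.0}, {"verbose": 1.0})] * 10)
```

Retraining frequently on pairs like these keeps the optimization target tied to what users actually pick, rather than to a static proxy such as thumbs up/down collected once.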
