
Training superhuman coding models at Cursor
May 30, 2025

Guests: Roundtable Participant (Speaker 0), Roundtable Participant (Speaker 1), Roundtable Participant (Speaker 2), Roundtable Participant (Speaker 3)
In a lively roundtable, technical experts from Cursor discuss the challenges of training superhuman AI coding agents: coding's large action space, the difficulty of obtaining ground-truth reward signals, the effectiveness of reinforcement learning rewards versus real-world developer feedback, and hardware advances that support longer contexts. The conversation closes with where AI-assisted coding may head next and how these developments could transform the field.
AI Snips
Benchmark Heuristics For Memory Storage
- Evaluate memory heuristics with offline benchmarks rather than full RL credit assignment to find practical storage rules.
- Use a modest set of labeled tasks to compare heuristics before attempting learned memory storage.
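The offline comparison described above could be sketched as follows. Everything here is an illustrative assumption, not Cursor's actual system: the two heuristics, the event fields (`turns_ago`, `caused_error`), and the hand-labeled examples are made up to show the shape of the evaluation, where each heuristic is scored on agreement with human store/discard labels rather than through full RL credit assignment.

```python
# Hypothetical sketch: score candidate memory-storage heuristics against a
# small hand-labeled benchmark instead of learning storage end-to-end.

def recency_heuristic(event):
    # Assumed rule: store only events from the last few interactions.
    return event["turns_ago"] <= 3

def error_heuristic(event):
    # Assumed rule: store events that caused an error.
    return event.get("caused_error", False)

def evaluate(heuristic, labeled_events):
    # labeled_events: list of (event, should_store) pairs judged by hand.
    correct = sum(heuristic(ev) == label for ev, label in labeled_events)
    return correct / len(labeled_events)

# Tiny illustrative benchmark of labeled events.
labeled = [
    ({"turns_ago": 1, "caused_error": False}, True),
    ({"turns_ago": 10, "caused_error": True}, True),
    ({"turns_ago": 8, "caused_error": False}, False),
]

for heuristic in (recency_heuristic, error_heuristic):
    print(heuristic.__name__, evaluate(heuristic, labeled))
```

With a benchmark like this, a modest set of labeled tasks is enough to rank heuristics and pick a practical storage rule before investing in learned memory storage.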
Real User Signals Beat Synthetic Rewards
- Real-world human signals (accepted edits, model selector churn) are more valuable than synthetic test rewards for product fit.
- Using multiple model attempts and selecting the best closes the gap between pass@1 and pass@K.
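The best-of-K idea in the second bullet can be sketched with a toy simulation. This is an assumed setup, not Cursor's pipeline: `sample_solution` is a stand-in for a model call that succeeds 30% of the time (pass@1 ≈ 0.3), and `passes_tests` is a stand-in verifier used to pick the best of K attempts, which recovers roughly pass@K-level success from single-answer usage.

```python
import random

def sample_solution(rng):
    # Stand-in for one model attempt: succeeds with probability 0.3.
    return {"passes": rng.random() < 0.3}

def passes_tests(solution):
    # Stand-in verifier (e.g. running unit tests) used to rank candidates.
    return solution["passes"]

def best_of_k(k, rng):
    # Draw K candidate solutions and keep the best-scoring one.
    candidates = [sample_solution(rng) for _ in range(k)]
    return max(candidates, key=passes_tests)

rng = random.Random(0)
trials = 2000
pass_at_1 = sum(passes_tests(sample_solution(rng)) for _ in range(trials)) / trials
pass_best_of_8 = sum(passes_tests(best_of_k(8, rng)) for _ in range(trials)) / trials
print(pass_at_1, pass_best_of_8)
```

With an ideal verifier, best-of-8 approaches 1 - 0.7^8 ≈ 0.94 versus ≈ 0.30 for a single attempt; the practical difficulty the snip implies is that real selection signals (tests, reward models) are imperfect rankers.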
Retrain Reward Models Often On Real Choices
- Retrain reward models frequently on fresh user-choice data and deploy quickly to align optimization with real user preference.
- Prefer retraining reward models over relying solely on static proxies like thumbs up/down.
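One standard way to train a reward model on fresh user-choice data is a Bradley-Terry-style pairwise loss over (chosen, rejected) completion pairs. The sketch below is a minimal linear version under assumed names; it is not Cursor's implementation, and the feature-dict representation is a deliberate simplification of a learned scorer.

```python
import math

def pairwise_loss(score_chosen, score_rejected):
    # -log sigmoid(chosen - rejected): small when the reward model
    # ranks the user's chosen completion above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

def retrain_step(reward_model, pairs, lr=0.1):
    # reward_model: dict mapping feature name -> weight (a toy linear scorer).
    # pairs: list of (chosen_features, rejected_features) from user choices.
    for chosen, rejected in pairs:
        s_c = sum(reward_model.get(f, 0.0) * v for f, v in chosen.items())
        s_r = sum(reward_model.get(f, 0.0) * v for f, v in rejected.items())
        # Gradient of the pairwise loss w.r.t. s_c is -(1 - sigmoid(s_c - s_r)),
        # so descent pushes chosen scores up and rejected scores down.
        grad = 1.0 / (1.0 + math.exp(s_c - s_r))
        for f, v in chosen.items():
            reward_model[f] = reward_model.get(f, 0.0) + lr * grad * v
        for f, v in rejected.items():
            reward_model[f] = reward_model.get(f, 0.0) - lr * grad * v
    return reward_model

# Usage: retrain on a fresh batch of user choices, then redeploy the scorer.
rm = retrain_step({}, [({"accepted_edit": 1.0}, {"verbose": 1.0})] * 10)
```

Retraining frequently on pairs like these keeps the optimization target tied to what users actually pick, rather than to a static proxy such as thumbs up/down collected once.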
