
The Infra Pod: Coding agents need infra to apply code changes! (Chat with Tejas from Morph)
Feb 9, 2026 Tejas Bhakta, CEO and co-founder of Morph, builds ultra-fast file-edit APIs and subagent infra for coding agents. He explains how Morph hits 10,000 tokens/sec with speculative decoding. The chat covers fast apply vs search-and-replace, subagent architecture and SDKs, code-specific semantic search, and a vision for autonomous software that updates itself.
AI Snips
Fast Apply Uses Lazy Edits And Speculation
- Fast Apply outputs lazy edits and uses a second model to merge changes, avoiding brittle search-and-replace formats.
- Speculative decoding plus using the original code as a prior yields large speedups for edits.
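The lazy-edit idea above can be sketched in a few lines. Everything here is a hypothetical illustration: the marker string, the file contents, and the naive string merge are invented for the example (Morph's fast apply uses a second model to perform the merge, not string matching), but it shows why lazy edits avoid the brittleness of exact search-and-replace blocks.

```python
# Hypothetical "lazy edit": the agent emits only the changed region,
# with a marker standing in for unchanged code; a merge step then
# reconstructs the full file. Real fast-apply systems do this merge
# with a model; this toy version anchors on the edited def line.

ORIGINAL = """def greet(name):
    print("Hello, " + name)

def farewell(name):
    print("Bye, " + name)
"""

LAZY_EDIT = """# ... existing code ...
def farewell(name):
    print(f"Goodbye, {name}!")
"""

MARKER = "# ... existing code ..."

def naive_merge(original: str, lazy_edit: str, marker: str = MARKER) -> str:
    """Splice the edited segment into the original file (toy version:
    the segment replaces everything from its anchor line onward)."""
    segments = [s.strip("\n") for s in lazy_edit.split(marker) if s.strip()]
    merged = original
    for seg in segments:
        anchor = seg.splitlines()[0]      # e.g. the "def farewell" line
        start = merged.index(anchor)
        merged = merged[:start] + seg + "\n"
    return merged

print(naive_merge(ORIGINAL, LAZY_EDIT))
```

Unlike a search-and-replace block, the agent never has to reproduce the old code exactly, which is where those formats tend to break.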
Speculative Decoding Powers Extreme Throughput
- Morph achieves ~10,000 tokens/sec by using the original code as a prior and speculative decoding tuned to this niche task.
- They build a task-specific inference engine and kernel tuning rather than using generic chat models.
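Why using the original code as a prior pays off can be shown with a toy model of speculative decoding. This is a character-level simulation with invented inputs, not Morph's engine: the "target model" is just the known edited text, whereas a real system verifies drafted tokens with one batched forward pass of the LLM. The point is that when most of the output repeats the input, long drafted runs are accepted per step.

```python
# Toy speculative decoding where the ORIGINAL file supplies the draft.
# Each step either accepts a whole window of drafted characters that
# match the target, or falls back to the matching prefix plus one
# corrected character (the "slow" path).

original = "def add(a, b):\n    return a + b\n"
edited   = "def add(a, b):\n    return a * b\n"   # the model's true output

def speculative_decode(draft: str, target: str, window: int = 8):
    """Return (decoded_text, decode_steps)."""
    out, steps = "", 0
    while len(out) < len(target):
        pos = len(out)
        chunk = draft[pos:pos + window]
        steps += 1
        if chunk and target.startswith(chunk, pos):
            out += chunk                      # whole drafted run accepted
        else:
            k = 0                             # accept matching prefix...
            while k < len(chunk) and target[pos + k] == chunk[k]:
                k += 1
            out += target[pos:pos + k + 1]    # ...plus one corrected char
    return out, steps

text, fast = speculative_decode(original, edited)
slow = len(edited)  # one step per character without speculation
print(fast, slow)
```

With only a one-character edit, the draft is accepted in a handful of steps instead of one step per character, which is the intuition behind the throughput numbers for edit-shaped workloads.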
Optimize For Wall-Clock Latency
- Prioritize wall-clock latency on vibe-coding platforms, because faster responses measurably increase conversion.
- Optimize for speed without sacrificing accuracy to boost user engagement and conversion rates.
