Deep Papers

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

Nov 24, 2025
Yongchao Chen, a final-year PhD student at Harvard and MIT, discusses his groundbreaking work on TUMIX (Tool-Use Mixture). He explains how a diverse ensemble of agents can significantly improve AI's accuracy by leveraging different tool-use strategies. Chen highlights the limitations of current models, which often struggle to decide when to use tools effectively. Through empirical tests, he shares remarkable results where TUMIX outperforms state-of-the-art methods, emphasizing the importance of agent diversity and collaborative refinement for enhancing AI performance.
AI Snips
ANECDOTE

Code Execution Works, but Models Stay Overconfident

  • Chen shows examples where Claude generates code and gets correct results, while its direct textual answers to the same questions are wrong.
  • This demonstrates that models can use tools correctly yet still answer overconfidently without invoking them.
INSIGHT

Parallel Diverse Agents With Iterative Refinement

  • TUMIX runs many pre-designed agents in parallel, each with different tool-use strategies, then iteratively shares and refines answers.
  • Round-by-round exchange raises group accuracy as agents converge on better solutions.
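The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy agents below (`make_toy_agent`) are hypothetical stand-ins for real tool-use strategies (text-only reasoning, code execution, search, etc.), which would each call an LLM with tools.

```python
from collections import Counter

def tumix_round(agents, question, shared_answers):
    """One round: every agent answers, seeing the group's previous answers."""
    return [agent(question, shared_answers) for agent in agents]

def tumix(agents, question, rounds=3):
    """TUMIX-style loop: parallel diverse agents, iterative answer sharing,
    then a majority vote over the final round."""
    answers = []  # round 1: no shared context yet
    for _ in range(rounds):
        answers = tumix_round(agents, question, answers)
    return Counter(answers).most_common(1)[0][0]

# Hypothetical toy agents for illustration: each makes its own guess in
# round 1, then defers to the group majority in later rounds.
def make_toy_agent(initial_guess):
    def agent(question, shared_answers):
        if not shared_answers:
            return initial_guess
        return Counter(shared_answers).most_common(1)[0][0]
    return agent

agents = [make_toy_agent(a) for a in ["42", "42", "41"]]
print(tumix(agents, "What is 6 * 7?"))  # toy ensemble converges on "42"
```

Even in this toy form, the structure mirrors the snip: diversity comes from the different initial strategies, and round-by-round sharing pulls the group toward the consensus answer.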
ADVICE

Use Diverse Agent Strategies, Not Repeated Sampling

  • Use a group of specialized agent strategies rather than identical repeated sampling to boost performance.
  • Share answers across agents for iterative refinement to increase correctness.