Interconnects

Arcee AI goes all-in on open models built in the U.S.

Jan 27, 2026
Lucas Atkins, CTO and pretraining/architecture lead at Arcee AI, talks about building Trinity Large and the six-month sprint to train a 400B-total, 13B-active-parameter MoE. He covers model scaling, Muon optimizer choices, stability and expert balancing, dataset and compute planning on B300 hardware, and the deployment, licensing, and business tradeoffs of open, U.S.-built models.
AI Snips
ADVICE

Align Runs With Cluster Availability

  • Optimize for compute availability: align model training timelines with when clusters actually become accessible, so runs are not stalled waiting on hardware.
  • Use smaller clusters for experiments and larger ones only when schedules and support are certain.
ADVICE

Prevent Expert Collapse Early

  • For MoEs, use auxiliary balancing and sequence-level losses to prevent expert collapse; adapt DeepSeek's auxiliary-loss-free balancing with tweaks (see the sketch after this list).
  • Monitor expert-balance metrics from early in training and intervene at signs of collapse, which can appear around the ~1T-token mark.
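A minimal sketch, assuming a PyTorch-style MoE router, of the two balancing ideas named above: a DeepSeek-style auxiliary-loss-free bias update applied to router scores, and a sequence-level auxiliary balance loss. The function names, tensor shapes, and bias step size are illustrative assumptions, not Arcee's implementation.

```python
# Sketch only: two MoE load-balancing ideas from the snip, not Arcee's actual code.
import torch

def update_router_bias(bias: torch.Tensor, expert_load: torch.Tensor,
                       step: float = 1e-3) -> torch.Tensor:
    """Auxiliary-loss-free balancing: nudge a non-gradient routing bias toward
    under-loaded experts and away from over-loaded ones.

    bias:        [num_experts] additive bias on router scores (updated outside autograd).
    expert_load: [num_experts] fraction of token-slots routed to each expert this step.
    """
    target = torch.full_like(expert_load, 1.0 / expert_load.numel())
    # Over-loaded experts (load > target) get a smaller bias; under-loaded get a larger one.
    return bias + step * torch.sign(target - expert_load)

def sequence_balance_loss(router_probs: torch.Tensor, top_k_mask: torch.Tensor) -> torch.Tensor:
    """Per-sequence balance penalty over expert load and mean router probability.

    router_probs: [batch, seq, num_experts] softmax router probabilities.
    top_k_mask:   [batch, seq, num_experts] 1.0 where the expert was selected.
    """
    num_experts = router_probs.shape[-1]
    load = top_k_mask.mean(dim=1)          # [batch, num_experts] fraction of tokens per expert
    importance = router_probs.mean(dim=1)  # [batch, num_experts] mean router prob per expert
    return (num_experts * load * importance).sum(dim=-1).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    batch, seq, experts, k = 2, 16, 8, 2
    logits = torch.randn(batch, seq, experts)
    probs = logits.softmax(dim=-1)
    topk = torch.zeros_like(probs).scatter_(-1, probs.topk(k, dim=-1).indices, 1.0)
    bias = update_router_bias(torch.zeros(experts), topk.mean(dim=(0, 1)))
    print("aux-free bias:", bias)
    print("sequence balance loss:", sequence_balance_loss(probs, topk).item())
```

The bias term only steers routing and carries no gradient, so it rebalances experts without distorting the main language-modeling loss; expert-load statistics like the ones above are the kind of metric worth watching from early in the run.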
INSIGHT

RL Scale Outweighs SFT Complexity

  • Post-training has become dominated by RL scale rather than SFT sophistication; they shifted complexity into RL pipelines.
  • Simpler SFT combined with heavy RL across many environments yields larger performance gains (see the sketch below).
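For flavor only: a toy sketch of what "shift complexity into RL pipelines" can look like structurally, collecting rollouts from a pool of environments, scoring them with per-environment reward functions, and keeping high-reward samples for a policy update. All names, the placeholder rewards, and the rejection-sampling-style update are assumptions; the episode does not describe Arcee's pipeline at this level of detail.

```python
# Toy multi-environment RL post-training loop; illustrative structure only.
from dataclasses import dataclass
from typing import Callable
import random

@dataclass
class Environment:
    name: str
    make_prompt: Callable[[], str]
    reward: Callable[[str, str], float]  # (prompt, completion) -> score

def generate(prompt: str) -> str:
    # Placeholder for the policy model's sampling call.
    return prompt + " -> answer"

def rl_round(envs: list[Environment], rollouts_per_env: int, keep_threshold: float):
    """Collect rollouts from every environment and keep the high-reward ones."""
    kept = []
    for env in envs:
        for _ in range(rollouts_per_env):
            prompt = env.make_prompt()
            completion = generate(prompt)
            score = env.reward(prompt, completion)
            if score >= keep_threshold:
                kept.append((env.name, prompt, completion, score))
    # In a real pipeline, `kept` would feed a policy-gradient or SFT-on-winners update.
    return kept

if __name__ == "__main__":
    random.seed(0)
    envs = [
        Environment("math", lambda: f"compute {random.randint(1, 9)}+{random.randint(1, 9)}",
                    lambda p, c: random.random()),
        Environment("code", lambda: "write a sort function",
                    lambda p, c: random.random()),
    ]
    print(len(rl_round(envs, rollouts_per_env=4, keep_threshold=0.5)), "rollouts kept")
```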