Interconnects

Interview: Ant Group's open model ambitions

Nov 12, 2025
Richard Bian, Product & Growth Lead at Ant Ling, shares insights on Ant Group’s vision for open language models. He discusses strategic openness, aiming to accelerate learning and avoid mistakes. Alongside algorithm engineer Chen Liang, they dissect the challenges of training stability and FP8 optimization. The conversation touches on the rapidly evolving Chinese AI ecosystem and the influence of DeepSeek on their mission. They explore balancing model sizes and the innovations behind their unique training methods, all while emphasizing a collaborative approach to AI.
INSIGHT

MoE Hyperparameters Follow Training FLOPs

  • Ant built a scaling-law framework in which non-embedding training FLOPs drive the experiments; MoE hyperparameters proved less sensitive than the FLOP budget itself.
  • They found the activation ratio is critical: lowering it consistently improves MoE performance.
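The activation ratio mentioned above is the fraction of a MoE model's parameters that fire per token. A minimal sketch (the function name and parameter split are illustrative, not Ant's actual framework):

```python
def moe_activation_ratio(n_experts, top_k, expert_params, shared_params):
    """Fraction of total parameters activated per token in a MoE layer.

    Each token is routed to top_k of n_experts experts; shared_params
    (attention, embeddings shared across experts, etc.) are always active.
    """
    total_params = shared_params + n_experts * expert_params
    active_params = shared_params + top_k * expert_params
    return active_params / total_params

# e.g. 64 experts, top-4 routing, 2M experts vs 2M shared params per layer
ratio = moe_activation_ratio(64, 4, expert_params=1e6, shared_params=2e6)
```

Lowering this ratio (fewer or smaller activated experts relative to total parameters) is what the snip reports as consistently helpful: total capacity grows while per-token compute stays fixed.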
ADVICE

Optimize FP8 By Fusing Quant Ops

  • When using FP8, profile quantize/dequantize hotspots and fuse operations to avoid losing MFU (model FLOPs utilization) to repeated precision conversions.
  • Fuse gating and quantization where possible, and batch expert operations to recover throughput.
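To see why repeated conversions hurt, here is a minimal NumPy sketch of a per-tensor FP8-style quantize/dequantize round trip. This is not Ant's kernel code: it approximates the FP8 E4M3 grid with uniform integer steps, which is enough to show where each cast adds a full pass over the tensor plus rounding error.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_fp8(x):
    """Per-tensor symmetric quantization onto a simulated FP8 E4M3 range.

    Crude approximation: rounds to uniform steps rather than the true
    FP8 exponent/mantissa spacing.
    """
    scale = FP8_E4M3_MAX / max(float(np.abs(x).max()), 1e-12)
    q = np.clip(np.round(x * scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q, scale):
    return q / scale

# Every quantize -> dequantize pair costs a full read/write of the tensor
# and adds rounding error; fusing the cast into the surrounding kernel
# (e.g. the gating computation or the expert GEMM) removes those extra
# memory passes, which is the throughput the snip talks about recovering.
x = np.linspace(-1.0, 1.0, 17)
q, s = quantize_fp8(x)
x_hat = dequantize(q, s)
```

In a real training stack these casts live inside fused GEMM kernels, so the tensor is only materialized in FP8 once instead of bouncing through higher precision between ops.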
ADVICE

Use QK-Norm Ahead Of Rotary For FP8

  • Add QK-norm before rotary embeddings when training at low precision to avoid underflow and amplified quantization error.
  • Monitor intermediate quantization error and gradients, not just loss, to spot numerical issues early.
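A minimal NumPy sketch of the ordering the snip recommends: normalizing queries/keys before the rotary embedding bounds their magnitude, so the low-precision attention matmul sees well-scaled inputs. The helper functions are illustrative (no learned gain, single head), not Ant's implementation.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm over the head dimension (no learned gain, for brevity)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def rotary(x, base=10000.0):
    """Apply rotary position embeddings to a [seq, dim] array (dim even)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq), freqs)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)) * 100.0  # large activations that would stress FP8

# QK-norm first: rotary is a pure rotation, so the bounded post-norm scale
# survives it, instead of rotating raw (possibly huge or tiny) activations.
q_stable = rotary(rms_norm(q))
```

Because the rotation preserves norms, every element of `q_stable` stays within `sqrt(head_dim)` of zero, comfortably inside FP8's dynamic range regardless of how extreme the raw activations were.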