Semi Doped

Meta's Inference Accelerator & Applied Optoelectronics (AAOI)

Mar 13, 2026
A technical deep dive into Meta's MTIA custom silicon, why chiplets let Meta iterate on new parts every six months, and how its inference fleet is shifting toward generative AI. A lively look at agentic AI changing engineering roles and how junior talent can upskill. An investigation of Applied Optoelectronics: vertically integrated laser manufacturing, Amazon's $4B warrant, capacity builds, and whether history might repeat for laser suppliers.
INSIGHT

Meta's Three Stage Ad Inference Pipeline

  • Meta's ad stack is a three-stage pipeline: Andromeda for retrieval, Lattice for ranking, and GEM as a large teacher model distilled into tiny student models for fast serving (see the sketch below).
  • Retrieval runs on multi-vendor hardware (NVIDIA, AMD, MTIA) and leans on huge embedding tables and HBM capacity more than raw FLOPS.
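A minimal sketch of the teacher-to-student distillation pattern the episode attributes to GEM. The model sizes, the `Ranker` class, the temperature, and the loss mix are illustrative assumptions, not Meta's actual configuration:

```python
# Toy logit distillation: a large "GEM-like" teacher supervises a tiny
# student that is cheap enough to serve at ad-inference latencies.
# All sizes and hyperparameters here are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Ranker(nn.Module):
    """Toy ranking model: scores an ad given a dense feature vector."""
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
    def forward(self, x):
        return self.net(x).squeeze(-1)

teacher = Ranker(in_dim=256, hidden=4096)  # large teacher (assumed size)
student = Ranker(in_dim=256, hidden=64)    # tiny student served online

def distill_step(x, labels, opt, T=2.0, alpha=0.5):
    """One step: match the teacher's soft scores while still fitting
    the observed click labels."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.mse_loss(s_logits / T, t_logits / T)              # soft targets
    ce = F.binary_cross_entropy_with_logits(s_logits, labels)  # hard labels
    loss = alpha * kd + (1 - alpha) * ce
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(32, 256)                    # fake ad/user feature batch
labels = torch.randint(0, 2, (32,)).float()
print(distill_step(x, labels, opt))
```

The point of the pattern is that the expensive teacher runs offline, while only the distilled student sits on the serving path.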
INSIGHT

MTIA Targets Embedding-Heavy Recommendation Workloads

  • MTIA is optimized for Meta's memory-capacity-bound recommendation workloads, favoring HBM capacity and scale-out bandwidth over peak FLOPS (a back-of-envelope roofline below shows why).
  • The MTIA300 is an 800 W accelerator with 200 GB/s scale-out links, matching embedding-table access patterns.
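A back-of-envelope roofline calculation makes the "memory-bound" claim concrete. The workload numbers (lookups per request, embedding width, bandwidth) are illustrative assumptions:

```python
# Why embedding-heavy recommendation inference is bandwidth-bound:
# pooling looked-up rows does ~1 FLOP per element, but every element
# must first be fetched from memory. Numbers below are assumed.
lookups_per_request = 1_000   # sparse feature lookups per request
embedding_dim = 128           # floats per embedding row
bytes_per_float = 4

bytes_moved = lookups_per_request * embedding_dim * bytes_per_float
flops = lookups_per_request * embedding_dim   # roughly one add per element

intensity = flops / bytes_moved               # FLOPs per byte
print(f"arithmetic intensity ~= {intensity:.2f} FLOP/byte")   # 0.25

hbm_bw = 1e12  # 1 TB/s of HBM bandwidth (assumed)
print(f"achievable ~= {intensity * hbm_bw / 1e12:.2f} TFLOP/s")
# ~0.25 TFLOP/s -- far below any accelerator's peak compute, so
# HBM bandwidth/capacity, not FLOPS, sets the ceiling.
```

At roughly 0.25 FLOP per byte, adding more compute buys nothing; adding memory bandwidth and capacity, which is the trade MTIA makes, buys throughput directly.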
INSIGHT

Meta Chooses HBM Throughput Over SRAM Latency Details

  • Meta deliberately emphasized HBM-based designs and omitted SRAM details, signaling a throughput- and capacity-first inference strategy (a capacity check below illustrates the trade).
  • Lyons and Vikram speculated that Meta may still acquire or integrate SRAM-focused startups if latency requirements tighten.
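A rough capacity check behind the HBM-over-SRAM choice. The table size, precision, and per-device capacities below are illustrative assumptions; real recommendation tables vary widely:

```python
# Embedding tables at recommendation scale vs. what fits where.
# All figures are assumed for illustration.
rows = 2_000_000_000   # sparse IDs across all features
dim = 128              # embedding width
bytes_per_elem = 2     # fp16

table_gb = rows * dim * bytes_per_elem / 1e9
print(f"embedding tables ~= {table_gb:,.0f} GB")   # ~512 GB

sram_per_die_gb = 0.2    # hundreds of MB is a realistic on-die SRAM budget
hbm_per_device_gb = 128  # plausible HBM stack capacity

print(f"devices needed with HBM:  {table_gb / hbm_per_device_gb:.0f}")   # ~4
print(f"devices needed with SRAM: {table_gb / sram_per_die_gb:.0f}")     # ~2,560
# SRAM wins on per-access latency, but at these capacities the sharding
# factor (and cross-device traffic) explodes -- hence a capacity-first
# HBM design, with 200 GB/s links to shard tables across a few devices.
```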