Semi Doped

Meta's Inference Accelerator & Applied Optoelectronics (AAOI)

Mar 13, 2026
A technical deep dive into Meta's MTIA custom silicon, why chiplets let Meta iterate on new parts every six months, and how its inference fleet is shifting toward generative AI. A lively look at agentic AI changing engineering roles and how junior talent can upskill. An investigation of Applied Optoelectronics: vertically integrated laser manufacturing, Amazon's $4B warrant, capacity builds, and whether history might repeat for laser suppliers.
INSIGHT

Meta's Three Stage Ad Inference Pipeline

  • Meta's ad stack is a three-stage pipeline: Andromeda for retrieval, Lattice for ranking, and GEM as a large teacher model distilled into tiny student models for fast serving (see the sketch below).
  • Retrieval runs on multi-vendor hardware (NVIDIA, AMD, MTIA) and leans on huge embedding tables and HBM capacity more than raw FLOPS.
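A minimal sketch of the teacher-to-student distillation pattern the episode attributes to GEM. The model sizes, the `Ranker` class, the temperature, and the loss mix are illustrative assumptions, not Meta's actual configuration:

```python
# Toy logit distillation: a large "GEM-like" teacher supervises a tiny
# student that is cheap enough to serve at ad-inference latencies.
# All sizes and hyperparameters here are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Ranker(nn.Module):
    """Toy ranking model: scores an ad given a dense feature vector."""
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
    def forward(self, x):
        return self.net(x).squeeze(-1)

teacher = Ranker(in_dim=256, hidden=4096)  # large teacher (assumed size)
student = Ranker(in_dim=256, hidden=64)    # tiny student served online

def distill_step(x, labels, opt, T=2.0, alpha=0.5):
    """One step: match the teacher's soft scores while still fitting
    the observed click labels."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.mse_loss(s_logits / T, t_logits / T)              # soft targets
    ce = F.binary_cross_entropy_with_logits(s_logits, labels)  # hard labels
    loss = alpha * kd + (1 - alpha) * ce
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(32, 256)                    # fake ad/user feature batch
labels = torch.randint(0, 2, (32,)).float()
print(distill_step(x, labels, opt))
```

The point of the pattern is that the expensive teacher runs offline, while only the distilled student sits on the serving path.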
INSIGHT

MTIA Targets Embedding-Heavy Recommendation Workloads

  • MTIA is optimized for Meta's memory-capacity-bound recommendation workloads, favoring HBM capacity and scale-out bandwidth over peak FLOPS (a back-of-envelope roofline below shows why).
  • The MTIA300 is an 800 W accelerator with 200 GB/s scale-out links, matching embedding-table access patterns.
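A back-of-envelope roofline calculation makes the "memory-bound" claim concrete. The workload numbers (lookups per request, embedding width, bandwidth) are illustrative assumptions:

```python
# Why embedding-heavy recommendation inference is bandwidth-bound:
# pooling looked-up rows does ~1 FLOP per element, but every element
# must first be fetched from memory. Numbers below are assumed.
lookups_per_request = 1_000   # sparse feature lookups per request
embedding_dim = 128           # floats per embedding row
bytes_per_float = 4

bytes_moved = lookups_per_request * embedding_dim * bytes_per_float
flops = lookups_per_request * embedding_dim   # roughly one add per element

intensity = flops / bytes_moved               # FLOPs per byte
print(f"arithmetic intensity ~= {intensity:.2f} FLOP/byte")   # 0.25

hbm_bw = 1e12  # 1 TB/s of HBM bandwidth (assumed)
print(f"achievable ~= {intensity * hbm_bw / 1e12:.2f} TFLOP/s")
# ~0.25 TFLOP/s -- far below any accelerator's peak compute, so
# HBM bandwidth/capacity, not FLOPS, sets the ceiling.
```

At roughly 0.25 FLOP per byte, adding more compute buys nothing; adding memory bandwidth and capacity, which is the trade MTIA makes, buys throughput directly.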
INSIGHT

Meta Chooses HBM Throughput Over SRAM Latency Details

  • Meta deliberately emphasized HBM-based designs and omitted SRAM details, signaling a throughput- and capacity-first inference strategy (a capacity check below illustrates the trade).
  • Lyons and Vikram speculated that Meta may still acquire or integrate SRAM-focused startups if latency requirements tighten.
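A rough capacity check behind the HBM-over-SRAM choice. The table size, precision, and per-device capacities below are illustrative assumptions; real recommendation tables vary widely:

```python
# Embedding tables at recommendation scale vs. what fits where.
# All figures are assumed for illustration.
rows = 2_000_000_000   # sparse IDs across all features
dim = 128              # embedding width
bytes_per_elem = 2     # fp16

table_gb = rows * dim * bytes_per_elem / 1e9
print(f"embedding tables ~= {table_gb:,.0f} GB")   # ~512 GB

sram_per_die_gb = 0.2    # hundreds of MB is a realistic on-die SRAM budget
hbm_per_device_gb = 128  # plausible HBM stack capacity

print(f"devices needed with HBM:  {table_gb / hbm_per_device_gb:.0f}")   # ~4
print(f"devices needed with SRAM: {table_gb / sram_per_die_gb:.0f}")     # ~2,560
# SRAM wins on per-access latency, but at these capacities the sharding
# factor (and cross-device traffic) explodes -- hence a capacity-first
# HBM design, with 200 GB/s links to shard tables across a few devices.
```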