
TechCrunch Startup News: Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way; plus, Flighty gives you real-time alerts about airport disruptions
Mar 25, 2026
Gimlet Labs has built software that runs AI workloads across many chip types at once, promising big speedups and major chip partnerships. Another product, Flighty, delivers real-time airport disruption alerts and translates technical advisories into plain-language updates. The conversation covers orchestration approaches, target customers, fundraising milestones, and new features like AI summaries and live airport boards.
Episode notes
Multi-Silicon Inference Cloud Dramatically Boosts Efficiency
- Gimlet Labs built a multi-silicon inference layer that lets workloads run simultaneously across CPUs, GPUs, high-memory systems, and specialized chips.
- The orchestrator slices agentic workloads and assigns inference, decode, and tool calls to the best hardware to boost efficiency 3x–10x.
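The episode doesn't describe Gimlet's internals, but the idea of assigning each stage of an agentic workload to the best hardware class can be sketched as a routing table. Everything below (`HARDWARE_PREFS`, `route`, the stage and hardware names) is a hypothetical illustration, not Gimlet's actual API.

```python
# Hypothetical sketch of stage-aware routing across heterogeneous hardware.
# Stage names and hardware classes are illustrative assumptions.

# Preferred hardware class per workload stage, best first.
HARDWARE_PREFS = {
    "prefill": ["gpu", "accelerator"],   # compute-bound: dense matmuls
    "decode": ["high_memory", "gpu"],    # memory-bandwidth-bound
    "tool_call": ["cpu"],                # I/O and control flow
}

def route(stage, available):
    """Pick the best available hardware class for a workload stage."""
    for hw in HARDWARE_PREFS.get(stage, ["cpu"]):
        if hw in available:
            return hw
    return "cpu"  # fall back to general-purpose compute

available = {"cpu", "gpu", "high_memory"}
plan = {s: route(s, available) for s in ("prefill", "decode", "tool_call")}
print(plan)  # {'prefill': 'gpu', 'decode': 'high_memory', 'tool_call': 'cpu'}
```

The efficiency claim rests on exactly this kind of matching: decode steps, for example, are bandwidth-bound and waste compute-heavy GPUs, so routing them to high-memory systems frees the GPUs for prefill.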
Slice Models Across Different Chip Architectures
- Gimlet claims it can split a model so different portions run on the most suitable architectures, extracting idle capacity across heterogeneous datacenter hardware.
- The company partners with NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix to support cross-chip execution.
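One plausible reading of "splitting a model across architectures" is layer-wise partitioning weighted by each chip's throughput. The sketch below is purely illustrative; the function name, the throughput figures, and the proportional-split policy are assumptions, not how Gimlet actually slices models.

```python
# Illustrative only: partition a model's layers into contiguous slices,
# one per chip type, weighted by each chip's relative throughput.

def slice_layers(n_layers, chip_throughput):
    """Assign contiguous layer ranges in proportion to chip throughput."""
    total = sum(chip_throughput.values())
    plan, start = {}, 0
    chips = list(chip_throughput.items())
    for i, (chip, tp) in enumerate(chips):
        if i == len(chips) - 1:
            end = n_layers  # last chip takes the remainder
        else:
            end = start + round(n_layers * tp / total)
        plan[chip] = (start, end)  # half-open layer range [start, end)
        start = end
    return plan

# 32-layer model split across three chip types (made-up relative speeds).
print(slice_layers(32, {"gpu": 3.0, "accelerator": 4.0, "cpu": 1.0}))
# {'gpu': (0, 12), 'accelerator': (12, 28), 'cpu': (28, 32)}
```

A real system would also have to weigh inter-chip transfer costs and memory limits, but the core extraction-of-idle-capacity idea is this: slow or specialized chips still take a proportionate share of layers instead of sitting unused.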
Datacenter Hardware Often Sits Mostly Idle
- Data centers may be massively underutilized; Gimlet estimates applications use existing hardware only 15–30% of the time, leaving hundreds of billions of dollars of hardware idle.
- By orchestrating across silicon types, Gimlet aims for ~10x better workload efficiency and less wasted spend.
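The 15–30% utilization figure can be sanity-checked with back-of-envelope arithmetic. The fleet value below is a made-up example, not a figure from the episode; only the utilization range comes from the source.

```python
# Back-of-envelope check of the utilization claim.
# fleet_value is a hypothetical illustration, NOT Gimlet's data.

fleet_value = 500e9  # assumed installed hardware value, dollars
util_low, util_high = 0.15, 0.30  # utilization range cited in the episode

# Idle capacity implied by the utilization range.
idle_low = fleet_value * (1 - util_high)   # 350e9 at 30% utilization
idle_high = fleet_value * (1 - util_low)   # 425e9 at 15% utilization
print(f"idle capacity: ${idle_low/1e9:.0f}B to ${idle_high/1e9:.0f}B")

# Raising 15% utilization toward full use bounds the gain from
# utilization alone at 1/0.15 ~= 6.7x; a ~10x target would also need
# per-task efficiency wins from better hardware matching.
print(f"max gain from utilization alone: {1/util_low:.1f}x")
```

On these assumed numbers, even the optimistic end of the range leaves most of the fleet's value idle, which is the gap the orchestration approach targets.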
