
TechCrunch Startup News: Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way; plus, Flighty gives you real-time alerts about airport disruptions
Mar 25, 2026
Gimlet Labs has built software that runs AI workloads across many chip types at once, promising big speedups and major chip partnerships. Another product, Flighty, delivers real-time airport disruption alerts and translates technical advisories into plain-language updates. The conversation covers orchestration approaches, target customers, fundraising milestones, and new features like AI summaries and live airport boards.
Episode notes
Multi-Silicon Inference Cloud Dramatically Boosts Efficiency
- Gimlet Labs built a multi-silicon inference layer that lets workloads run simultaneously across CPUs, GPUs, high-memory systems, and specialized chips.
- The orchestrator slices agentic workloads and assigns inference, decode, and tool calls to the best hardware to boost efficiency 3x–10x.
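The episode doesn't describe Gimlet's internals, but the idea of assigning each stage of an agentic workload to the best hardware class can be sketched as a routing table. Everything below (`HARDWARE_PREFS`, `route`, the stage and hardware names) is a hypothetical illustration, not Gimlet's actual API.

```python
# Hypothetical sketch of stage-aware routing across heterogeneous hardware.
# Stage names and hardware classes are illustrative assumptions.

# Preferred hardware class per workload stage, best first.
HARDWARE_PREFS = {
    "prefill": ["gpu", "accelerator"],   # compute-bound: dense matmuls
    "decode": ["high_memory", "gpu"],    # memory-bandwidth-bound
    "tool_call": ["cpu"],                # I/O and control flow
}

def route(stage, available):
    """Pick the best available hardware class for a workload stage."""
    for hw in HARDWARE_PREFS.get(stage, ["cpu"]):
        if hw in available:
            return hw
    return "cpu"  # fall back to general-purpose compute

available = {"cpu", "gpu", "high_memory"}
plan = {s: route(s, available) for s in ("prefill", "decode", "tool_call")}
print(plan)  # {'prefill': 'gpu', 'decode': 'high_memory', 'tool_call': 'cpu'}
```

The efficiency claim rests on exactly this kind of matching: decode steps, for example, are bandwidth-bound and waste compute-heavy GPUs, so routing them to high-memory systems frees the GPUs for prefill.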
Slice Models Across Different Chip Architectures
- Gimlet claims it can split a model so different portions run on the most suitable architectures, extracting idle capacity across heterogeneous datacenter hardware.
- The company partners with NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix to support cross-chip execution.
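One plausible reading of "splitting a model across architectures" is layer-wise partitioning weighted by each chip's throughput. The sketch below is purely illustrative; the function name, the throughput figures, and the proportional-split policy are assumptions, not how Gimlet actually slices models.

```python
# Illustrative only: partition a model's layers into contiguous slices,
# one per chip type, weighted by each chip's relative throughput.

def slice_layers(n_layers, chip_throughput):
    """Assign contiguous layer ranges in proportion to chip throughput."""
    total = sum(chip_throughput.values())
    plan, start = {}, 0
    chips = list(chip_throughput.items())
    for i, (chip, tp) in enumerate(chips):
        if i == len(chips) - 1:
            end = n_layers  # last chip takes the remainder
        else:
            end = start + round(n_layers * tp / total)
        plan[chip] = (start, end)  # half-open layer range [start, end)
        start = end
    return plan

# 32-layer model split across three chip types (made-up relative speeds).
print(slice_layers(32, {"gpu": 3.0, "accelerator": 4.0, "cpu": 1.0}))
# {'gpu': (0, 12), 'accelerator': (12, 28), 'cpu': (28, 32)}
```

A real system would also have to weigh inter-chip transfer costs and memory limits, but the core extraction-of-idle-capacity idea is this: slow or specialized chips still take a proportionate share of layers instead of sitting unused.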
Datacenter Hardware Often Sits Mostly Idle
- Data centers may be massively underutilized; Gimlet estimates applications use existing hardware only 15–30% of the time, leaving hundreds of billions of dollars of hardware idle.
- By orchestrating across silicon types, Gimlet aims for ~10x better workload efficiency and less wasted spend.
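The 15–30% utilization figure can be sanity-checked with back-of-envelope arithmetic. The fleet value below is a made-up example, not a figure from the episode; only the utilization range comes from the source.

```python
# Back-of-envelope check of the utilization claim.
# fleet_value is a hypothetical illustration, NOT Gimlet's data.

fleet_value = 500e9  # assumed installed hardware value, dollars
util_low, util_high = 0.15, 0.30  # utilization range cited in the episode

# Idle capacity implied by the utilization range.
idle_low = fleet_value * (1 - util_high)   # 350e9 at 30% utilization
idle_high = fleet_value * (1 - util_low)   # 425e9 at 15% utilization
print(f"idle capacity: ${idle_low/1e9:.0f}B to ${idle_high/1e9:.0f}B")

# Raising 15% utilization toward full use bounds the gain from
# utilization alone at 1/0.15 ~= 6.7x; a ~10x target would also need
# per-task efficiency wins from better hardware matching.
print(f"max gain from utilization alone: {1/util_low:.1f}x")
```

On these assumed numbers, even the optimistic end of the range leaves most of the fleet's value idle, which is the gap the orchestration approach targets.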
