
High Signal: Data Science | Career | AI Episode 35: Beyond Online Experimentation: Generative Software That Optimizes Itself
Mar 5, 2026

Martin Tingley, experimentation leader at Microsoft and former Netflix experimentation head, explains why humans are the bottleneck in testing. He outlines a five-level maturity framework moving from basic tests to AI that generates and refines product variants. Topics include parameter optimization, automated explore-exploit systems, generative AI closing the loop, and how experimentation informs strategy and org roles.
AI Snips
Level Two Is Hypothesis-Driven A-B Testing
- Level two is classic hypothesis-driven A-B testing where teams build one challenger and compare it to an incumbent (a minimal analysis sketch follows this snip).
- Example: Netflix's Top 10 row began as a hypothesis-driven experiment that later deployed globally after proving impact.
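At its core, the level-two pattern is a two-sample comparison between incumbent and challenger. Below is a minimal sketch in Python using hypothetical conversion counts and a standard two-proportion z-test; it illustrates the statistical shape of the comparison, not Netflix's actual analysis pipeline.

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference in conversion rates between
    an incumbent (A) and a challenger (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                            # two-sided p-value
    return p_b - p_a, z, p_value

# Hypothetical numbers: 10,000 members per arm, challenger shows a small lift.
lift, z, p = two_proportion_ztest(conv_a=1100, n_a=10_000, conv_b=1180, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p:.4f}")
```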
Parameterize The Product For Optimization
- Level three reframes experiments as optimization over parameterized decision spaces rather than single-variant tests.
- Martin urges encoding choice points (e.g., plan page layout) as options and using multivariate A-B testing to hill-climb (see the sketch after this snip).
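A toy sketch of the level-three idea: a hypothetical plan-page decision space whose choice points become parameters, with an epsilon-greedy explore-exploit loop allocating traffic across the resulting variants. The parameters, options, and reward function are all illustrative assumptions, not the system Martin describes.

```python
import itertools
import random

# Hypothetical parameterization: each choice point on the plan page is a
# parameter with a few options; the decision space is their cross product.
DECISION_SPACE = {
    "layout": ["grid", "list"],
    "cta_copy": ["Start now", "Try free"],
    "price_anchor": ["monthly", "annual"],
}
VARIANTS = [dict(zip(DECISION_SPACE, combo))
            for combo in itertools.product(*DECISION_SPACE.values())]

def epsilon_greedy(observe_reward, rounds=10_000, epsilon=0.1):
    """Allocate traffic across all variants, mostly exploiting the current
    best while reserving an epsilon share of traffic for exploration.
    observe_reward(variant) is assumed to return 1 (converted) or 0 (not)."""
    pulls = [0] * len(VARIANTS)
    wins = [0] * len(VARIANTS)
    for _ in range(rounds):
        if random.random() < epsilon or 0 in pulls:
            i = random.randrange(len(VARIANTS))                              # explore
        else:
            i = max(range(len(VARIANTS)), key=lambda j: wins[j] / pulls[j])  # exploit
        pulls[i] += 1
        wins[i] += observe_reward(VARIANTS[i])
    best = max(range(len(VARIANTS)), key=lambda j: wins[j] / max(pulls[j], 1))
    return VARIANTS[best], wins[best] / max(pulls[best], 1)

# Example usage with a simulated reward: one arbitrary combination is given a
# higher true conversion rate so the loop has something to find.
def simulated_reward(variant):
    base = 0.10
    bonus = 0.03 if variant["layout"] == "grid" and variant["cta_copy"] == "Try free" else 0.0
    return 1 if random.random() < base + bonus else 0

best_variant, est_rate = epsilon_greedy(simulated_reward)
print(best_variant, round(est_rate, 3))
```

Epsilon-greedy is just the simplest explore-exploit policy; the same parameterized setup works with Thompson sampling or other bandit allocators.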
Align Incentives To Measured Impact
- Change incentives to reward measured impact rather than shipping features, so teams optimize via experiments, not conviction.
- Martin cites Meta as a strong example of evaluating teams by measured customer impact to align incentives.
