DOP 341: AI Widened the Highway but Nobody Rebuilt the Bridge

Mar 11, 2026

Trevor Stuart, co-founder of Split.io and head of Feature Management & Experimentation at Harness, has deep experience with feature flags and experimentation. He describes how AI-written code has created a six-lane highway that feeds into a two-lane bridge of reviews and delivery. Teams are embedding AI configs into flags and A/B testing agents in production. The conversation gives special focus to culture, flag lifecycle, and running revenue-driving experiments.

ANECDOTE

Kiosk Experiments Generated Millions

A major fast food chain used experiments on kiosk UI wording and placement to generate several million dollars of incremental sales.
Trevor uses this to show that execs accept data when experiments clearly move revenue.

INSIGHT

Fear Of Failure Kills Experimentation

A core cultural barrier to experimentation is fear of admitting failure, which stops teams after one failed test.
Trevor says teams that quit after a first failed experiment lose the learning loop and never run the next 30 experiments.

ADVICE

Run Prompt Tests Inside Feature Flag Configs

Put prompts, token limits and model settings into feature flag configs so you can A/B test agents in production safely.
Trevor reports that teams route 5% of traffic to different prompt variants to compare hallucination rates and business impact.

#341: Nobody's arguing about whether you need feature flags in 2026. That debate ended years ago. But the code flowing through those flags? That's a different story. AI is writing more of it than ever, review times are climbing, and delivery throughput has actually declined. Trevor Stuart, co-founder of Split.io and now running Feature Management & Experimentation at Harness, calls it the six-lane highway ending in a two-lane bridge.

The bottleneck didn't disappear. It moved. Coding got faster, but everything downstream -- reviews, security scans, delivery pipelines -- stayed the same width. Co-host Viktor points out this is the same pattern from the early agile days: his team shipped every two weeks, but testing still took six months. Different era, same structural problem.

Feature flags are part of the fix, but not the way most people use them. Teams are now stuffing prompts, token limits, and temperature settings inside feature flag configurations and running A/B tests on AI agents in production. That's a long way from changing button colors on a marketing page, which is where experimentation started 15 years ago.
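To make the idea concrete, here is a minimal sketch of what "agent settings inside a flag config" could look like. It is not the Harness or Split SDK: the flag name, the `AgentConfig` fields, the prompt text, and the hash-based bucketing are all assumptions made for illustration. It only shows the general pattern of giving a small, stable slice of users a different prompt, token limit, and temperature.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical payload a flag variant might carry for an AI agent."""
    prompt: str
    max_tokens: int
    temperature: float


# Two illustrative variants; the values are made up for this sketch.
CONTROL = AgentConfig(
    prompt="You are a concise support assistant.",
    max_tokens=512,
    temperature=0.2,
)
TREATMENT = AgentConfig(
    prompt="You are a support assistant. Name the help article you used.",
    max_tokens=768,
    temperature=0.4,
)


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in the range 0..99.

    Hashing the flag name together with the user ID keeps the assignment
    consistent across requests and independent across flags.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def config_for(
    user_id: str,
    flag_name: str = "support-agent-prompt",  # hypothetical flag name
    treatment_percent: int = 5,
) -> AgentConfig:
    """Return the treatment config for roughly `treatment_percent` of users."""
    if bucket(user_id, flag_name) < treatment_percent:
        return TREATMENT
    return CONTROL
```

A caller would look up `config_for(user_id)` before invoking the model and log which variant was served, so that downstream metrics can be compared per variant. A real feature flag platform would handle targeting, rollout changes, and metric attribution itself; the hashing here just stands in for that.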

The culture problem is harder than the tooling problem. Trevor has watched teams run one experiment, see it fail, and quit experimenting entirely. The fear of admitting failure kills more experimentation programs than bad data ever will. Meanwhile, the companies getting real results -- a fast food chain generating millions from kiosk experiments, a global bank driving hundreds of millions in customer acquisition -- are the ones treating experimentation as a permanent operating model, not a one-off project.

The conversation also covers Trevor's path from co-founding Split to running it inside Harness post-acquisition. He stayed -- which doesn't happen as often as you'd think. Harness runs what he calls a 'startup within a startup' model, and he breaks down what that actually looks like from the inside, what was hardest to let go of, and why finding your 'why' matters more than any exit.

Trevor's contact information:

LinkedIn: https://www.linkedin.com/in/trevorbstuart/

YouTube channel:

https://youtube.com/devopsparadox

Review the podcast on Apple Podcasts:

https://www.devopsparadox.com/review-podcast/

Slack:

https://www.devopsparadox.com/slack/

Connect with us at:

https://www.devopsparadox.com/contact/