Building reliable and proactive agentic systems at scale: how Shopify’s reflexive AI culture was instrumental in their development of Sidekick w/ Andrew McNamara #258

May 12, 2026

Andrew McNamara, Director of Applied Machine Learning at Shopify, leads the team behind Shopify Sidekick and builds production-scale AI assistants. He talks about Shopify’s prototype-first culture and how interns shape AI adoption. The conversation covers Sidekick’s merchant-driven vision, building ground-truth evals, subagent specializations, proactive features like Sidekick Pulse, and reliability and latency strategies.

ANECDOTE

Merchant Survey Drove Sidekick's AI Co-founder Vision

Merchant surveys showed access to a business-savvy friend correlated strongly with store survival, motivating Sidekick as an AI co-founder.
Andrew recounts that merchants with mentorship-like access were far more likely to continue and succeed, inspiring Sidekick's vision.

ADVICE

Prefer Simple Orchestration Over Complex Multiagent Systems

Favor simple orchestration: use one model to orchestrate tool calls rather than complex multi-agent systems.
Andrew recommends building prototypes in parallel and comparing them empirically; in his experience, the less complex design wins in practice.
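As a rough illustration of the "one model orchestrates tool calls" idea, here is a minimal sketch. It is not Sidekick's implementation: the tool names, the `scripted_model` stand-in (which replaces a real LLM call), and the history format are all hypothetical and exist only so the loop can run on its own.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; names and return values are invented for illustration.
def get_sales(period: str) -> str:
    return f"sales for {period}: 42 orders"

def get_inventory(sku: str) -> str:
    return f"inventory for {sku}: 7 units"

TOOLS: Dict[str, Callable[[str], str]] = {
    "get_sales": get_sales,
    "get_inventory": get_inventory,
}

# A decision is (tool name, argument), or ("final", answer text).
Decision = Tuple[str, str]

def scripted_model(history: List[str]) -> Decision:
    """Stand-in for a real LLM call: picks the next step from the history."""
    tool_results = [h for h in history if h.startswith("tool:")]
    if len(tool_results) == 0:
        return ("get_sales", "last_week")
    if len(tool_results) == 1:
        return ("get_inventory", "SKU-1")
    # Enough information gathered: produce the final answer.
    return ("final", " | ".join(r[len("tool:"):] for r in tool_results))

def orchestrate(
    user_message: str,
    model: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Single loop: one model chooses tools until it returns a final answer."""
    history = [f"user:{user_message}"]
    for _ in range(max_steps):
        action, payload = model(history)
        if action == "final":
            return payload
        tool = tools.get(action)
        result = tool(payload) if tool else f"unknown tool {action}"
        history.append(f"tool:{result}")
    # Guard against a model that never decides to finish.
    return "stopped: step limit reached"

answer = orchestrate("How is my store doing?", scripted_model, TOOLS)
```

All control flow lives in one loop with one decision-maker, so adding a capability means registering another tool rather than coordinating another agent.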

ADVICE

Make Evals Central With A Growing Ground Truth Set

Build evolving ground truth sets (GTX) and an AI judge calibrated to them to evaluate conversational agents at scale.
Andrew says product specs become labelled conversation rubrics; the judge must match human experts to enable scalable evals and training.
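One way to check whether a judge "matches human experts" is to score the same ground-truth conversations with both and measure agreement. The sketch below is an assumption-laden example: the snip does not say which metric is used, so Cohen's kappa is our choice here, and the labels are made-up data.

```python
from collections import Counter
from typing import List

def agreement_rate(human: List[str], judge: List[str]) -> float:
    """Fraction of conversations where judge and human gave the same label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)

def cohens_kappa(human: List[str], judge: List[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    observed = agreement_rate(human, judge)
    n = len(human)
    h_counts, j_counts = Counter(human), Counter(judge)
    labels = set(human) | set(judge)
    expected = sum(h_counts[label] * j_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for four ground-truth conversations.
human_labels = ["good", "good", "bad", "bad"]
judge_labels = ["good", "good", "bad", "good"]

rate = agreement_rate(human_labels, judge_labels)
kappa = cohens_kappa(human_labels, judge_labels)
```

In a setup like this, the judge prompt would be revised until agreement with the expert labels is acceptable, and only then used to score conversations at scale.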

Andrew McNamara, Director of Applied Machine Learning @ Shopify, joins the ELC podcast to share insights on building agentic platforms at scale, like Sidekick, that keep reliability for their users at the forefront. Andrew describes Shopify’s building philosophy and what it means to cultivate a prototype-first culture while prioritizing hiring early-stage talent. We cover Sidekick’s development journey and how user feedback shaped its product vision, why evaluation and building ground truth sets are so important, and the benefit of user-driven use cases. Andrew also dissects how the team made product design decisions, such as building proactive agents and identifying subagent specializations.

ABOUT ANDREW MCNAMARA

Andrew McNamara is Director of Applied Machine Learning at Shopify, where he leads the team behind Shopify Sidekick, an AI co-founder that gives merchants access to the e-commerce expertise they need to run and grow their business. With 16 years of experience building AI assistants, he brings a rare combination of applied research depth and production-scale thinking to some of the hardest problems in AI: getting systems to work reliably for people who depend on them. Andrew's work pushes Shopify to measure AI quality by whether it achieves what the user set out to do, a core standard in building AI that merchants trust. Outside Shopify, he runs Setting North, a small Canadian maple syrup brand built on the same platform he helps make for everyone else.

Unblocked: The context engine your coding agents are missing.

Give your coding agents the context your best engineers have.

Your agents can read code, but they don’t know how your team works. Rules and MCPs give access to information but not understanding. That’s why you still have to tell them where to look and what to look for.

Unblocked gives your agents the history, conventions, and decisions behind your code so they generate mergeable output without the back and forth. It automatically surfaces the right context for every task, so agents stay on track without the set up tax or the correction loops.

getunblocked.com/elc

SHOW NOTES:

How Shopify utilizes reflexive AI & Andrew’s building philosophy (2:38)
Developing a prototype-first company culture (5:07)
Andrew’s reflections on building AI-enabled projects like Sidekick at scale (7:25)
Translating customer surveys into Sidekick’s product vision (9:34)
Key inflection points while scaling out Sidekick (11:23)
Strategies for evaluation / building a ground truth set (13:26)
Analyzing the good & bad within ground truth sets (15:27)
Shopify’s system openness model to drive user-discovered use cases (17:47)
How subagents fit into Sidekick’s model (19:55)
Prioritization conversations around subagent specializations (23:06)
Designing an agent with high-impact prompt optimization (27:22)
Considerations for building highly reliable systems (29:40)
Andrew’s perspective on latency (31:24)
Rapid fire questions (33:49)

LINKS AND RESOURCES

Cradle - a New York Times best-selling series from Will Wight following a character's growth as he goes from one of the weakest users of his world's magic to among the strongest. The series features an original magic system inspired by Chinese cultivation and martial arts novels, with a heavy emphasis on anime-style super-powered battles.

This episode wouldn’t have been possible without the help of our incredible production team:

Patrick Gallagher - Producer & Co-Host

Jerry Li - Co-Host

Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/

Dan Overheim - Audio Engineer, Dan’s also an avid 3D printer - https://www.bnd3d.com/

Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.