Want the prompt I used for this test? And my AI Prompt Library with 30+ outbound prompts?
Upgrade now in my newsletter here.
-
I tested seven AI models on the same account research prompt: 12 specific instructions, one target company (Replit), one buyer lens (TrackRec). This is my March 2026 benchmark.
The models: Perplexity Sonar, GPT 5.2 Thinking, Grok 4.2 Beta, Grok 4, Claude Opus 4.6, Claygent (Argon), and Gemini 3 Pro. I scored every model on six weighted criteria, tracked which instructions each model actually completed, classified why they missed what they missed, and manually verified every disputed claim.
Agenda:
- Why I expanded from 3 scoring criteria to 6 — and how adding Business Relevance changed the rankings
- What instruction completion reveals that scores alone don't (Perplexity: 10/12, Gemini: 1/12)
- The difference between hallucinations and false claims — and why it matters for automation at scale
- Why four models found September funding and stopped looking (the persistence failure pattern)
- The $400M funding round that may or may not be real — REPORTED vs VERIFIED as a new verification category
- Which model to use for high-value accounts vs volume enrichment in Clay
- Web app vs API vs Clay: why your results will be different and what I'm testing in the next benchmark
Referenced:
- TrackRec: https://www.trackrec.co
- Replit: https://replit.com
- Perplexity: https://www.perplexity.ai
- Clay: https://www.clay.com
- RepVue: https://www.repvue.com
- The account research prompt: Available for Outbound Kitchen paid members
-
Who am I? Elric Legloire, founder of Outbound Kitchen.
When you're ready
👨🍳 Want to work with me? Send me a DM
---
Connect with me
📌 Connect on LinkedIn
📹 Subscribe on YouTube
🐦 Connect on X
-
Chapters:
(0:00) - Why I keep benchmarking AI models
(1:45) - The test setup: TrackRec researching Replit
(3:00) - What changed from the last test (6 criteria, instruction tracking)
(3:30) - The new rankings
(4:05) - Perplexity: VP of SDR, podcast, RepVue miss
(5:00) - GPT 5.2: zero false claims, Glassdoor depth
(5:30) - The $400M funding round — is it real?
(7:00) - Grok 4.2: 56 seconds, best RepVue data
(8:00) - Bottom four models (quick summary)
(8:55) - Verification: hallucinations vs false claims
(10:05) - Which models I recommend
(10:45) - Web app vs Clay availability
(11:30) - What's next