Are AI agents ready for the workplace? A new benchmark raises doubts; plus, Ring is adding a new content verification feature

Jan 23, 2026

New research reveals that leading AI models struggle with complex white-collar tasks like consulting and law, with many failing to meet essential benchmarks. The top performers, including Gemini 3 Flash and GPT-5.2, achieve only around 24% accuracy. Meanwhile, Ring is introducing a new feature that will detect any edits made to shared videos, enhancing transparency for users. However, limitations like encryption issues highlight the challenges of this verification method.

INSIGHT

AI Struggles With Real Knowledge Work

Leading AI models fail most real white-collar tasks and often return wrong or no answers.
Multi-domain reasoning across Slack, Drive, and tools remains the biggest technical barrier.

INSIGHT

Narrow Benchmarks Reveal Automation Limits

The Apex Agents benchmark tests sustained, profession-specific tasks rather than broad general knowledge.
That narrower, harder evaluation better reflects whether jobs like law and banking can be automated.

ANECDOTE

A Law Question That Stumps Models

One law question asked whether brief EU-to-US log exports complied with Article 49; answering it required deep policy and legal analysis.
The correct answer required assessing company policy plus relevant EU privacy laws, showing real-world complexity.
