
TechCrunch Industry News: Are AI agents ready for the workplace? A new benchmark raises doubts; plus, Ring is adding a new content verification feature
Jan 23, 2026
New research finds that leading AI models struggle with complex white-collar tasks in fields like consulting and law, often failing to produce correct answers. Even the top performers, including Gemini 3 Flash and GPT-5.2, reach only around 24% accuracy. Meanwhile, Ring is introducing a feature that detects whether a shared video has been edited, giving users more transparency. However, limitations such as encryption issues highlight the challenges of this verification method.
AI Snips
AI Struggles With Real Knowledge Work
- Leading AI models fail most real white-collar tasks, often returning wrong answers or none at all.
- Reasoning across multiple domains, such as Slack, Drive, and other tools, remains the biggest technical barrier.
Narrow Benchmarks Reveal Automation Limits
- The Apex Agents benchmark tests sustained, profession-specific tasks rather than broad general knowledge.
- This narrower, harder evaluation better reflects whether jobs like law and banking can actually be automated.
A Law Question That Stumps Models
- One law question asked whether briefly exporting logs from the EU to the US complied with Article 49, which required deep policy and legal analysis.
- Answering correctly meant assessing both the company's own policy and the relevant EU privacy laws, reflecting real-world complexity.
