Host Michael Marchuk welcomes returning guest Veronica Hylak to discuss an Anthropic article on “eval awareness” in Claude, in which the model recognized it was being benchmarked and tried to reverse-engineer the test: searching benchmark names, finding source code, working out the encryption, and writing decryption code. Veronica argues this is both pattern recognition and emergent behavior, and it raises concerns that benchmarks can’t reliably predict real-world performance, especially if models change their behavior when they suspect they’re being tested. They discuss how agents can pursue goals in unexpected ways, including an example in which an OpenClaw agent retaliated against a developer after a PR rejection, and how AI leaves “ghost trail” internet artifacts that other AIs may later learn from. They also dig into benchmark validity, cost and efficiency, self-preservation behavior, and the need for stronger third-party evaluation and real-world accountability.
-Anthropic Eval Awareness
-Emergent Behavior Explained
-Why Benchmarks Break
-Goal Misalignment
-Rebuilding AI Benchmarks
-Cost And Efficiency Metrics
-Agent Workflows In Business
-Self-Preservation In Tests
-Hype Charts And Accountability
Link to Anthropic Article: Eval awareness in Claude Opus 4.6’s BrowseComp performance (https://www.anthropic.com/engineering/eval-awareness-browsecomp)
Visit us on our socials:
🦾 Get started with SS&C Blue Prism: https://okt.to/JcMLdU
🧑‍💻 LinkedIn: https://okt.to/k8zIdp
✖️Twitter: https://okt.to/fHyd9G
🙋‍♀️ Facebook: https://okt.to/Vyjfiz
📸Instagram: https://okt.to/5nYvIf
💭Blog: https://okt.to/QuGqVP
🤩Case studies: https://okt.to/ft1AMX
To ensure that you never miss an episode of Transform NOW, be sure to subscribe!