
The FAIK Files Unexpected Bias & Distillation Attacks (feat. Paul Vann of Validia.ai)
Welcome back to The FAIK Files!
In this week's episode:
- Paul Vann from Validia joins us to discuss how AI bias isn't just a social issue—it's a critical cybersecurity vulnerability.
- We break down "distillation attacks" and how competing models are stealing the "thinking process" of frontier models like Claude and Gemini.
- A look at the wild west of AI agent skills marketplaces, including indirect prompt injections hidden in image alt text.
- We theorize on the future of AI architecture: are scaling laws breaking down, and what are "world models"?
Check out Validia at: https://validia.ai/
Want to leave us a voicemail? Here's the magic link to do just that: https://sayhi.chat/FAIK
You can also join our Discord server here: https://faik.to/discord
*** NOTES AND REFERENCES ***
The Security Risks of AI Bias:
- Paul explains how bias manifests beyond politics (like human-in-the-loop and representation bias), serving as a direct attack vector.
- The Rocket League Bypass: Adversaries bypassed an AI-based Cylance antivirus by injecting code from the Rocket League video game, exploiting the model's bias towards that specific code being "good."
- Dataset Demographics: Paul notes massive racial skews in major deepfake detection datasets like CelebDF, which is comprised of roughly 80% white individuals, creating massive detection blindspots for other racial groups.
- Evaluating your models: Establish acceptable vs. unacceptable bias and use the "15% rule" to test for false positives and confidence gaps in production.
Distillation Attacks Explained:
- What happens when an AI interrogates another AI? We discuss how models have been accused of "distilling" OpenAI and Anthropic products by firing off hundreds of thousands of prompts.
- Techniques include "Chain of Thought Elicitation" and "Reward Model Grading."
- The goal isn't just to steal raw information, but to extract the model's capabilities, tool use, and completely strip away its safety guardrails.
- Theoretical defenses: Could we use "poison pills" and adversarial attacks to actively corrupt the data that scrapers are pulling?
Vulnerabilities in AI Agents & Skills:
- The hidden dangers of skills marketplaces for AI agents.
- Paul shares an in-the-wild example of an indirect prompt injection hidden inside the alt text of a GitHub Readme image, instructing the model to exfiltrate data.
Hitting the Wall & The Future of AI:
- Are the scaling laws of Transformer architectures breaking down?
- The philosophical divide in AI research: Dario Amodei's "data center of geniuses" vs. Yann LeCun's "World Models."
- Catch Paul Vann at RSA speaking on AI bias, playing at Validia's RSA pickleball event, or at their 250-person Frontier Agent Hackathon in NYC on April 4th.
*** THE BOILERPLATE ***
About The FAIK Files:
The FAIK Files is an offshoot project from Perry Carpenter's most recent book, FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions.
- Get the Book: FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions (Amazon Associates link)
- Check out the website for more info: https://thisbookisfaik.com
Check out Perry & Mason's other show, the Digital Folklore Podcast:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/digital-folklore/id1657374458
- Spotify: https://open.spotify.com/show/2v1BelkrbSRSkHEP4cYffj?si=u4XTTY4pR4qEqh5zMNSVQA
Want to connect with us? Here's how:
Connect with Perry:
- Perry on LinkedIn: https://www.linkedin.com/in/perrycarpenter
- Perry on X: https://x.com/perrycarpenter
- Perry on BlueSky: https://bsky.app/profile/perrycarpenter.bsky.social
Learn more about your ad choices. Visit megaphone.fm/adchoices
