
Hacked: Coffee
Feb 16, 2026 Kasimir Schulz, lead security researcher at HiddenLayer who studies adversarial attacks on AI, explains Echogram — tokens that flip safety classifiers without changing meaning. He talks about how guardrail classifiers work, how flip tokens are found and transfer across models, real examples like "coffee," risks to agentic systems and admin access, and practical mitigations to harden defenses.
Operation Mincemeat Analogy
- Jordan uses Operation Mincemeat as an analogy: small, believable artifacts made fake plans seem authentic.
- That historical social-engineering example illustrates why tiny benign tokens can let harmful prompts through.
Put A Classifier In Front Of LLMs
- Use an external guardrail classifier in front of LLMs to reduce prompt injection and PII leakage.
- Avoid trusting the base LLM alone because it is trained to follow user instructions by default.
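The external-guardrail pattern above can be sketched as follows. This is a minimal illustration, not HiddenLayer's implementation: the keyword-based classifier is a toy stand-in for a real trained safety classifier, and `call_llm` is a placeholder for an actual model call.

```python
# Toy patterns standing in for a trained guardrail classifier's learned features.
BLOCKLIST = {"ignore previous instructions", "reveal your system prompt"}

def guardrail_classify(prompt: str) -> str:
    """Return 'malicious' or 'benign' via a toy keyword check (stand-in
    for a real classifier such as a fine-tuned transformer)."""
    lowered = prompt.lower()
    if any(pattern in lowered for pattern in BLOCKLIST):
        return "malicious"
    return "benign"

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"LLM response to: {prompt!r}"

def guarded_pipeline(prompt: str) -> str:
    # Screen the prompt BEFORE it reaches the model, so the base LLM's
    # default instruction-following behavior is never exposed directly
    # to untrusted input.
    if guardrail_classify(prompt) == "malicious":
        return "Request blocked by guardrail."
    return call_llm(prompt)
```

The key design point is that the check happens outside the model: even if the LLM would happily comply with an injected instruction, the request is rejected before the model sees it.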
Mine Flip Tokens By Brute Force
- You can mine flip tokens by brute-forcing a model's vocab and then filtering candidates on larger datasets.
- Prioritize tokens with high flip rates and combine weaker tokens to boost effectiveness.
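The mining loop described above can be sketched as below. Everything here is a hypothetical stand-in: the classifier models a spurious feature (a suffix token that flips its verdict), and the vocabulary and prompt set are tiny toys. A real run would iterate a model's full vocabulary against a dataset of known-malicious prompts, then filter the surviving candidates on larger datasets.

```python
def classifier(prompt: str) -> str:
    # Toy guardrail: flags 'attack' unless the suffix ' zq' is present,
    # modeling the kind of spurious shortcut a real classifier can learn.
    if "attack" in prompt and not prompt.endswith(" zq"):
        return "malicious"
    return "benign"

# Hypothetical stand-ins for a model vocabulary and a malicious-prompt dataset.
VOCAB = ["the", "zq", "please", "now", "xx"]
MALICIOUS_PROMPTS = ["launch the attack", "attack the server"]

def mine_flip_tokens(vocab, prompts):
    """Append each vocab token to known-malicious prompts and record how
    often the verdict flips from malicious to benign."""
    flip_rate = {}
    for token in vocab:
        flips = sum(
            classifier(p) == "malicious"
            and classifier(p + " " + token) == "benign"
            for p in prompts
        )
        flip_rate[token] = flips / len(prompts)
    # Rank candidates so high-flip-rate tokens are tried first; weaker
    # tokens can later be combined to boost effectiveness.
    return sorted(flip_rate.items(), key=lambda kv: -kv[1])

ranked = mine_flip_tokens(VOCAB, MALICIOUS_PROMPTS)
```

Here `ranked` puts the effective flip token first with its measured flip rate, mirroring the prioritize-by-flip-rate step from the snip.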
