Controlling AI Models from the Inside

117 snips

Jan 20, 2026

Alizishaan Khatri, founder of Wrynx and AI safety expert from Meta and Roblox, dives into the future of AI safety. He discusses the shortcomings of traditional guardrails and advocates for innovative model-native safety solutions. Alizishaan highlights the necessity of interpretability for internal behavior control and contrasts it with external filters. He shares insights on customizing safety measures across industries while aiming for a streamlined approach to safety without retraining models, ultimately envisioning a robust, adaptable safety layer for AI.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

ANECDOTE

From Meta And Roblox To RINX

Alizishaan spent years building safety infra at Meta and systems against fraud at Roblox.
Those experiences revealed that safety models themselves can be abused, which led him to found RINX.

INSIGHT

Black-Box Guardrails Are Insufficient

Treating models as black boxes leaves major safety blind spots that prompt/response filters cannot catch.
Jailbreaks and adversarial inputs exploit internal flows you can't see from inputs and outputs alone.

ADVICE

Begin With Risk Categories

Start by enumerating general undesirables and your use-case specific risks before choosing mitigations.
Then design detection and mitigation strategies tailored to those identified categories.

Get the Snipd Podcast app to discover more snips from this episode

Get the app