Practical AI

Controlling AI Models from the Inside

117 snips
Jan 20, 2026
Alizishaan Khatri, founder of Wrynx and AI safety expert from Meta and Roblox, dives into the future of AI safety. He discusses the shortcomings of traditional guardrails and advocates for innovative model-native safety solutions. Alizishaan highlights the necessity of interpretability for internal behavior control and contrasts it with external filters. He shares insights on customizing safety measures across industries while aiming for a streamlined approach to safety without retraining models, ultimately envisioning a robust, adaptable safety layer for AI.
Ask episode
AI Snips
Chapters
Transcript
Episode notes
ANECDOTE

From Meta And Roblox To RINX

  • Alizishaan spent years building safety infra at Meta and systems against fraud at Roblox.
  • Those experiences revealed that safety models themselves can be abused, which led him to found RINX.
INSIGHT

Black-Box Guardrails Are Insufficient

  • Treating models as black boxes leaves major safety blind spots that prompt/response filters cannot catch.
  • Jailbreaks and adversarial inputs exploit internal flows you can't see from inputs and outputs alone.
ADVICE

Begin With Risk Categories

  • Start by enumerating general undesirables and your use-case specific risks before choosing mitigations.
  • Then design detection and mitigation strategies tailored to those identified categories.
Get the Snipd Podcast app to discover more snips from this episode
Get the app