
Practical AI: Controlling AI Models from the Inside
Jan 20, 2026
Alizishaan Khatri, founder of Wrynx and AI safety expert from Meta and Roblox, dives into the future of AI safety. He discusses the shortcomings of traditional guardrails and advocates for innovative model-native safety solutions. Alizishaan highlights the necessity of interpretability for controlling internal behavior and contrasts it with external filters. He shares insights on customizing safety measures across industries while aiming for a streamlined approach to safety without retraining models, ultimately envisioning a robust, adaptable safety layer for AI.
AI Snips
From Meta and Roblox to Wrynx
- Alizishaan spent years building safety infrastructure at Meta and anti-fraud systems at Roblox.
- Those experiences revealed that safety models themselves can be abused, which led him to found Wrynx.
Black-Box Guardrails Are Insufficient
- Treating models as black boxes leaves major safety blind spots that prompt/response filters cannot catch.
- Jailbreaks and adversarial inputs exploit internal model behavior that is invisible from inputs and outputs alone.
Begin With Risk Categories
- Start by enumerating general undesirables and your use-case-specific risks before choosing mitigations.
- Then design detection and mitigation strategies tailored to those identified categories.
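The "risk categories first" workflow above can be sketched in code. This is a hypothetical illustration, not anything from the episode: the category names, detection methods, and mitigation labels are all invented placeholders for whatever a real team would enumerate for its own use case.

```python
# Illustrative sketch: enumerate risk categories (general + use-case specific),
# then attach a detection and mitigation strategy to each one.
# All category names and strategies below are placeholder examples.

RISK_CATEGORIES = {
    # general undesirables, applicable to most deployments
    "violence": {"detect": "keyword + classifier", "mitigate": "refuse"},
    "pii_leak": {"detect": "regex + NER", "mitigate": "redact"},
    # use-case-specific risks (e.g. a platform serving minors)
    "grooming": {"detect": "conversation-level classifier", "mitigate": "escalate"},
}


def plan_mitigations(identified_risks):
    """Return the detection/mitigation plan for each identified risk category."""
    return {r: RISK_CATEGORIES[r] for r in identified_risks if r in RISK_CATEGORIES}


plan = plan_mitigations(["pii_leak", "grooming"])
```

The point of the structure is the ordering: the category enumeration comes first, and each mitigation is chosen per category rather than applied as one generic filter.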

