
Security Cryptography Whatever: Cryptanalyzing LLMs with Nicholas Carlini
Jan 28, 2025
Nicholas Carlini, an AI security researcher specializing in machine learning vulnerabilities, joins the discussion. He delves into the mathematical underpinnings of LLM vulnerabilities, highlighting risks like model poisoning and instruction injection. Carlini explores the parallels between cryptographic attacks and AI model vulnerabilities, emphasizing the importance of robust security frameworks. He also outlines key defense strategies against data extraction and shares insights on the fragility of current AI defenses, urging a critical evaluation of security practices in an evolving digital landscape.
Extracting Hidden Layers
- Carlini presented an attack at the Crypto conference that extracts the hidden layers of a simple neural network with ReLU activations.
- While seemingly a toy problem, such attacks remain relevant for application-specific classifiers built on confidential data.
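The core idea behind such extraction attacks can be illustrated on the smallest possible case. The sketch below (an illustrative toy, not Carlini's actual attack) shows that a single ReLU unit behind a black-box query interface leaks its weights through finite differences wherever the unit is active; the names `query` and `w_secret` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim" unit: one ReLU neuron with secret parameters.
w_secret = rng.normal(size=4)
b_secret = 0.5

def query(x):
    """Black-box access: the attacker only observes the output."""
    return max(0.0, w_secret @ x + b_secret)

# At a point where the neuron is active (output > 0), the function is
# locally linear, so a finite difference along each coordinate recovers
# that coordinate's weight.
x0 = np.zeros(4)
assert query(x0) > 0  # the positive bias keeps the unit active at the origin
eps = 1e-6
w_recovered = np.array([
    (query(x0 + eps * e) - query(x0)) / eps
    for e in np.eye(4)
])

print(np.allclose(w_recovered, w_secret, atol=1e-4))  # True
```

Real attacks on deeper ReLU networks are far more involved (they must locate the piecewise-linear boundaries layer by layer), but this is the local-linearity observation they build on.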
Recovering Model Parameters
- Recovering model parameters means recovering the weights that determine each neuron's connections.
- Each neuron computes a dot product with the previous layer's outputs, so a layer's weights stack into a matrix, one row per neuron.
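The dot-product view above is just matrix multiplication. A minimal sketch, with made-up sizes, showing that computing each neuron's dot product individually matches a single matrix product over the whole layer:

```python
import numpy as np

rng = np.random.default_rng(1)

# A fully connected layer: 3 neurons, each reading a 5-dimensional input.
W = rng.normal(size=(3, 5))   # one row of weights per neuron
b = rng.normal(size=3)        # one bias per neuron
x = rng.normal(size=5)        # previous layer's activations

# Per-neuron dot products, then the equivalent single matrix product.
per_neuron = np.array([W[i] @ x + b[i] for i in range(3)])
layer_out = W @ x + b

print(np.allclose(per_neuron, layer_out))  # True
```

Recovering the model's parameters means recovering every entry of `W` and `b` for every layer.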
Biases and Weights
- A bias, added after the weighted sum, shifts the threshold at which a neuron activates; together, the weights and biases are what distinguish one trained model from another.
- Carlini draws an analogy to DES S-boxes: model weights start out random and are then shaped by stochastic training methods such as stochastic gradient descent.
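The bias's role as an activation threshold can be made concrete. A small sketch (illustrative, not from the episode): for a one-input ReLU neuron with positive weight w, the unit turns on exactly where w*x + b crosses zero, i.e. at x = -b/w, so changing the bias slides that switch-on point.

```python
def relu_neuron(x, w, b):
    # weighted sum, then bias, then the ReLU nonlinearity
    return max(0.0, w * x + b)

w = 2.0
for b in (-1.0, 0.0, 1.0):
    threshold = -b / w  # solves w*x + b = 0
    # just below the threshold the neuron is off; just above, it fires
    assert relu_neuron(threshold - 0.1, w, b) == 0.0
    assert relu_neuron(threshold + 0.1, w, b) > 0.0
```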
