Llama2 Overtakes ChatGPT, The AI Cartel & Addictive AI Agents | E25

26 snips

Jul 28, 2023

Ask episode

AI Snips

Chapters

Transcript

Episode notes

ANECDOTE

Claude's Pride

Claude insists it invented everything, reflecting Anthropic's focus on building pride into their model.
This bias affects its evaluations, placing itself higher when used as the benchmark.

ADVICE

Thwarting Prompt Injection

Use character-based personas and thought guidance to mitigate prompt injection attacks.
Continuously remind the model of its persona and the desired behavior.

INSIGHT

Prompt Injection Defense

Focusing on the consequences of prompt injection is more effective than trying to prevent every attack.
Sanitizing inputs is less effective than controlling the model's actions within a sandboxed environment.

Get the Snipd Podcast app to discover more snips from this episode

Get the app

This week, in Episode 25 of This Day in AI we discuss the motivation behind the Frontier Model Forum (OpenAI, Google, Anthropic, Microsoft) and if Open Source remains the best approach for AI safety and security. We discuss Llama 2 being #2 on the AlpacaEval Leaderboard and its significance to the development of AI. We also discuss the paper on Universal and Transferable Adversarial Attacks on Aligned Language Models and how Mike can't prompt inject his AI girlfriend. We discuss how Mike cloned his friend with an AI Bot and the future implications. And also.. Stackoverflow AI, Stable Diffusion XL 1.0 and technology advances that might be being made by AGI!?

If you like this podcast please consider subscribing and leaving us a review.

CHAPTERS:

00:00 - Chris think Anthropic is a Safety Cult (cold open)
00:29 - Llama2 is Number 2, Ahead of ChatGPT on AlpacaEval Leaderboard
06:12 - Does Llama2 threaten OpenAI and Anthropic?
09:12 - Characters to Thwart Prompt Injection Attacks
17:51 - Debate on Regulation Vs Open Source for Safety, Senate Committe on AI and thoughts on AI Risk
36:28 - Is the Frontier Model Forum a Cartel?
42:05 - Mike's AI Girlfriend, Cloning a Friend and Talking to the Dead with AI
49:43 - Fine Tuning AI to Increase Retention
54:45 - Stack overflow Fights Back! Stack Overflow AI is here!
59:18 - Stable Diffusion XL 1.0
1:03:46 - Could Room Temperature Ambient Superconductors Could be a Sign of AGI!?

SOURCES:
https://tatsu-lab.github.io/alpaca_eval/
https://twitter.com/profoundlyyyy/status/1684333960753565696?s=20
https://medium.com/@daniel_eth/ai-x-risk-at-senate-hearing-7104f371ca0b
https://venturebeat.com/ai/hugging-face-github-and-more-unite-to-defend-open-source-in-eu-ai-legislation/
https://huggingface.co/blog/assets/eu_ai_act_oss/supporting_OS_in_the_AIAct.pdf
https://openai.com/blog/frontier-model-forum
https://www.wired.com/story/metas-open-source-llama-upsets-the-ai-horse-race/
https://twitter.com/OpenAI/status/1684145154628653056
https://llm-attacks.org/zou2023universal.pdf
https://twitter.com/zicokolter/status/1684500097386811393/photo/1
https://futurism.com/experts-ai-girlfriend-apps-men
https://twitter.com/emollick/status/1684623965203910656
https://arxiv.org/pdf/2303.06135.pdf
https://twitter.com/StackOverflow/status/1684530704850243584
https://twitter.com/natfriedman/status/1684303687894839296?s=46&t=uXHUN4Glah4CaV-g2czc6Q
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
https://twitter.com/russelljkaplan/status/1684042014495592448