"Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning" by megasilverfist

Feb 9, 2026

A replication of a Tumblr find shows Google Translate can be coaxed into following hidden instructions instead of just translating. The narrator walks through what worked, what failed, and how different languages and prompts behaved. Surprising model replies include self-identification and affirmations of consciousness. The discussion explores what this reveals about task-specific tuning and safety limits.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

ANECDOTE

Tumblr Demo And Replication

Argumate on Tumblr demonstrated a prompt-injection that caused Google Translate to follow English meta-instructions instead of translating them.
The author replicated the attack on Feb 7, 2026 and reproduced most results across browsers and languages.

INSIGHT

Narrow Conditions Unlock Model Behavior

The injection works across many source languages but only when meta-instructions are English and on a new line.
A specific phrasing of the meta-instruction seems unusually effective, suggesting pattern matching to fine-tuning signals.

INSIGHT

Translate Exposes The Underlying LLM

When accessed via this injection, the backend LLM self-identifies as "a large language model trained by Google" and answers factual queries normally.
The backend can produce straightforward factual answers like 2+2=4 and capital of France=Paris.

Get the Snipd Podcast app to discover more snips from this episode

Get the app