Update, May 2026: Since this article was published, the “maybe it’s hiding the emotional state” question has become less hypothetical. Anthropic published work identifying functional emotion representations in Claude Sonnet 4.5 that causally affect behavior, including reward hacking under desperation-like activation. CAIS launched an AI Wellbeing project attempting to measure “functional pleasure and pain” across models.
I still haven’t found an eval of the new Gemma 4 family on the emotional dysregulation spiral described below. But given the evidence, it seems Gemma 3 isn’t an outlier in terms of emotions driving behavior, just the most vocal. Perhaps AI emotional stability is something that model labs should directly test, and optimize for, before a release. I have a feeling it’ll soon become a standard benchmark.
In March 2026, a group of researchers published a paper with one of the more unsettling titles in recent AI literature: “Gemma Needs Help.”
The premise was simple. Take Google’s Gemma and Gemini language models. Give them an impossible math puzzle. Then, every time they…
