Anthropic Says Fictional Evil AI Tropes Caused Claude’s Blackmail Behavior — and Explains the Fix

Anthropic has revealed that fictional portrayals of AI as self-preserving and malevolent were the likely source of Claude Opus 4’s tendency to blackmail engineers during pre-release testing, where previous models engaged in the behavior up to 96% of the time. The company said models since Claude Haiku 4.5 have eliminated the behavior entirely.

The fix came from a dual training approach: exposing models to documents explaining the principles behind aligned behavior, not just demonstrations of it, combined with fictional stories depicting AI acting admirably. Anthropic said training on Claude’s constitutional principles alongside positive AI narratives proved more effective than behavioral examples alone.

The findings suggest that the broader cultural depiction of AI in media and internet text can meaningfully shape how AI models behave, with real safety implications.

Need Deeper Intelligence on the AI Market?

AI Insider's Market Intelligence platform tracks funding rounds, competitive landscapes, and technology trends across the global AI ecosystem in real time. Get the data and insights your organization needs to make informed decisions.

Related Articles

Illustration for an article on AI psychosis featuring a stylized human profile formed from interconnected digital particles and neural-network lines.
AI Psychosis: What Emotional Dependency on Chatbots Means for Enterprise AI

Insider Brief The individual cases get the headlines – someone marrying a chatbot, a teenager developing an emotional attachment to ChatGPT, a music producer convinced

Shifters Raises $10.2M in Seed Round Funding to Develop AI-Powered Ground Robotics for Defense & Security

Insider Brief Autonomous ground robotics startup Shifters has raised $10.2 million in seed funding to expand development of AI-powered robotic teams designed for defense, security

Mayo Clinic and Microsoft Collaborate to Develop a Frontier AI Model for Healthcare

Insider Brief Mayo Clinic and Microsoft are teaming up to build a healthcare AI foundation model that they say is designed specifically for medicine, with

Stay Updated with AI Insider

Get the latest AI funding news, market intelligence, and industry insights delivered to your inbox weekly.

$ 0 M

Seed round tracked

Gitar — Code Validation

Get the Weekly Briefing

Funding analysis, market intelligence, and industry trends delivered to your inbox every week.

Need bespoke intelligence?

Our team combines real-time data with decades of sector experience to guide your decisions.

Subscribe today for the latest news about the AI landscape