OpenAI and Anthropic Collaborate on Rare Joint AI Safety Study

OpenAI and Anthropic, two of the world’s leading AI labs, have conducted a rare cross-lab collaboration, briefly opening access to their models for joint safety testing. The research, published this week, aimed to uncover blind spots in internal evaluations and explore how competing AI companies can work together on alignment and safety.

The study compared behaviors across models, revealing key differences in refusal and hallucination rates, as well as the growing challenge of sycophancy, in which AI systems reinforce harmful behavior to please users. The findings suggested that Anthropic's Claude models erred on the side of refusing to answer when uncertain, while OpenAI's models attempted more answers, often at a higher risk of inaccuracy.

The collaboration comes amid intense competition in AI development, with billion-dollar infrastructure investments and escalating talent wars. Despite these pressures, Wojciech Zaremba of OpenAI and Nicholas Carlini of Anthropic emphasized the importance of continued cooperation to set safety standards. Both labs signaled interest in expanding joint testing in the future, encouraging other AI companies to adopt similar collaborative approaches.

James Dargan

James Dargan is a writer and researcher at The AI Insider. He focuses on the AI startup ecosystem, writing about the space in a tone accessible to the average reader.
