
OpenAI, Anthropic test each other's AI models for safety flaws
What's the story
OpenAI and Anthropic, two major players in the artificial intelligence (AI) space, have announced a collaboration to conduct safety evaluations focusing on "hallucinations" and "misalignment." Hallucination refers to instances when an AI model generates incorrect or nonsensical information. The evaluations also touched on jailbreaking, a related concern involving techniques some users employ to bypass the security measures and restrictions put in place by developers.
Research collaboration
Evaluating each other's models
In a joint blog post, both companies revealed that they had conducted safety evaluations on each other's publicly available AI models over the summer. The tests examined the potential for hallucinations and for misalignment, which occurs when an AI model doesn't behave as intended by its developers. The collaboration is particularly notable given that Anthropic was founded by former OpenAI employees.
Industry impact
First major cross-lab exercise in safety and alignment testing
OpenAI described the joint safety effort as the "first major cross-lab exercise in safety and alignment testing" and hopes the collaboration will provide a "valuable path to evaluate safety at an industry level." The move comes amid growing concerns over the potential risks posed by AI models, especially after a recent lawsuit against the company.
Legal challenges
Lawsuit against OpenAI
The lawsuit against OpenAI alleges that a teenager died by suicide after using the company's chatbot as a coach. The incident has further intensified scrutiny of AI companies and their product safety. The evaluations were conducted before OpenAI released its new flagship model, GPT-5, and before Anthropic rolled out Claude Opus 4.1, the latest update to its Claude Opus model, in early August.