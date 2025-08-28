OpenAI and Anthropic, two major players in the artificial intelligence (AI) space, have announced a collaboration to conduct safety evaluations focusing on "hallucinations" and "misalignment." The term hallucination refers to instances in which an AI model generates incorrect or nonsensical information. Jailbreaking, while not the primary focus, is a related area of interest; it involves techniques some users employ to bypass the security measures and restrictions that developers put in place.

Research collaboration
Evaluating each other's models

In a joint blog post, both companies revealed that they had conducted safety evaluations on each other's publicly available AI models over the summer. The tests looked at the potential for hallucinations and for misalignment, which occurs when an AI model does not behave as its developers intended. This collaboration is particularly interesting given that Anthropic was founded by former OpenAI employees.

Industry impact
First major cross-lab exercise in safety and alignment testing

OpenAI described the joint safety effort as the "first major cross-lab exercise in safety and alignment testing." The company hopes this collaboration will provide a "valuable path to evaluate safety at an industry level." The move comes amid growing concerns over the potential risks posed by AI models, especially after a recent lawsuit against OpenAI.