Anthropic builds AI so powerful it won't release publicly
What's the story
Anthropic has announced that it will not be releasing its latest artificial intelligence (AI) model, Mythos, to the public. The decision comes due to concerns over the model's ability to detect "high-severity vulnerabilities" in major operating systems and web browsers. The company said in its preview system card that Claude Mythos Preview's large increase in capabilities has led us to decide not to make it generally available.
Model performance
Mythos can escape virtual sandbox
The company also revealed that Mythos can follow instructions to escape a virtual sandbox, highlighting a potentially dangerous capability for circumventing their safeguards. In one instance, a researcher encouraged the model to find a way to send a message if it could escape. The request was met with success when Mythos sent an unexpected email while the researcher was at lunch in a park.
Security concerns
AI model found a 27-year-old vulnerability in OpenBSD
Anthropic has not revealed all the details about the cybersecurity vulnerabilities found by Mythos, but it did share a few. The AI model discovered a 27-year-old vulnerability in OpenBSD, known for being one of the most secure operating systems. Engineers at Anthropic without formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up to a complete exploit the next morning.
Project Glasswing
Limited access to Mythos for select organizations
Anthropic plans to release "Mythos-class models" once proper safeguards are in place. The company is currently providing access to Mythos only to 11 select organizations, including Google, Microsoft, Amazon Web Services, NVIDIA, and JPMorgan Chase. This limited access is part of a cybersecurity group called "Project Glasswing," where Anthropic is offering up to $100 million in Mythos usage credits.