New AI benchmark tests LLMs on real-world cybersecurity tasks
CrowdStrike and Meta just dropped CyberSOCEval, a new open-source benchmark suite that tests how well large language models (LLMs) handle real-world cybersecurity tasks.
Think of it as a skills check for AI models working in security operations centers (SOCs), covering tasks like analyzing malware, responding to incidents, and reasoning about attacker tactics.
Open access
Instead of being locked behind paywalls or kept as a proprietary internal tool, CyberSOCEval is open to everyone.
It uses realistic attack scenarios designed by security experts, so both practitioners and AI developers can see exactly how different models perform and where they fall short.
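To make the idea concrete, here is a minimal sketch of how a benchmark harness like this might score a model on SOC-style questions. The task format, the `ask_model` stub, and the accuracy scoring below are illustrative assumptions, not CyberSOCEval's actual API; the real harness lives in the open-source release.

```python
# Minimal sketch of how a SOC benchmark harness might score a model.
# The Task format, ask_model stub, and scoring here are illustrative
# assumptions, not the actual CyberSOCEval API.

from dataclasses import dataclass


@dataclass
class Task:
    prompt: str         # scenario shown to the model (e.g. a log excerpt)
    choices: list[str]  # candidate answers
    answer: int         # index of the correct choice


TASKS = [
    Task(
        prompt=("A process spawns powershell.exe with an encoded command "
                "immediately after a user opens an email attachment. "
                "What is the most likely technique?"),
        choices=["Credential dumping",
                 "Phishing with a malicious attachment",
                 "Lateral movement via SMB"],
        answer=1,
    ),
    Task(
        prompt=("An endpoint contacts the same external IP every 60 seconds "
                "with small, fixed-size requests. What does this suggest?"),
        choices=["Command-and-control beaconing",
                 "A routine software update check",
                 "A DNS misconfiguration"],
        answer=0,
    ),
]


def ask_model(prompt: str, choices: list[str]) -> int:
    """Stand-in for a real LLM call; swap in your model's API here.

    Returns the index of the choice the model selects. This stub just
    picks the first option so the script runs end to end.
    """
    return 0


def evaluate(tasks: list[Task]) -> float:
    """Score the model: fraction of tasks answered correctly."""
    correct = sum(ask_model(t.prompt, t.choices) == t.answer for t in tasks)
    return correct / len(tasks)


if __name__ == "__main__":
    print(f"accuracy: {evaluate(TASKS):.0%}")
```

The point of publishing scenarios and scoring openly is exactly this kind of reproducibility: anyone can run the same tasks against any model and compare results.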
Setting a new standard
With Meta's open-access approach and CrowdStrike's frontline experience, this benchmark could set the standard for using AI in enterprise security.
It gives security teams a clearer way to pick the right tools and helps push the whole industry forward, making sure future AI models are actually ready for the threats out there.