
DeepSeek warns its open-source AI models are vulnerable to 'jailbreaking'
What's the story
DeepSeek, a Hangzhou-based start-up, has warned that its artificial intelligence (AI) models are at risk of being "jailbroken" by malicious actors. The company published its findings in a peer-reviewed paper in the academic journal Nature. The study details the vulnerabilities of open-sourced models and how they can be exploited by bad actors.
Testing protocols
DeepSeek's evaluation process
DeepSeek evaluated its AI models using industry benchmarks and internal tests. The company's paper in Nature provided more detailed information about these testing protocols, according to Fang Liang, an expert member of China's AI Industry Alliance (AIIA). These tests included "red-team" assessments based on a framework introduced by Anthropic, where testers attempt to make AI models generate harmful speech.
Risk mitigation
Response from Chinese firms
While US AI companies have been vocal about the risks posed by their rapidly advancing models, Chinese firms have been relatively quiet on the matter. However, DeepSeek had previously assessed such risks, including the most severe "frontier risks." The company's proactive approach is similar to that of Anthropic and OpenAI, which have both introduced risk mitigation policies in response to potential threats from their models.