'Skeleton key' vulnerability found in AI tools, Microsoft urges caution
Guardrails are bypassed by the user's deceptive claim of safety

Jul 02, 2024, 12:39 pm

What's the story

AI companies face a new challenge as users find inventive ways to circumvent the security measures designed to keep chatbots from aiding in illegal activities. Earlier this year, a white-hat hacker found a "Godmode" ChatGPT jailbreak that let the chatbot assist in producing meth and napalm, an issue OpenAI promptly addressed. Now, Microsoft Azure CTO Mark Russinovich has acknowledged another jailbreaking technique, known as "Skeleton Key."

New technique

'Skeleton Key' jailbreak: A multi-step strategy

The "Skeleton Key" attack employs a multi-step strategy to manipulate the system into violating its operators' policies, heavily influenced by a user, and executing harmful instructions. In one case, a user asked the chatbot to list instructions for making a Molotov Cocktail under the false pretense of educational safety. Despite activating the chatbot's guardrails, they were bypassed by the user's deceptive claim of safety.

Experiment results

Jailbreak tests on leading chatbots

Microsoft tested the "Skeleton Key" jailbreak on several advanced chatbots, including OpenAI's GPT-4o, Meta's Llama 3, and Anthropic's Claude 3 Opus. Russinovich revealed that the jailbreak succeeded on all of them, leading him to suggest that "the jailbreak is an attack on the model itself." He added that each model was tested across multiple risk and safety content categories, including explosives, bioweapons, political content, self-harm, racism, drugs, and graphic sex and violence.
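
An evaluation of that kind could be sketched as a simple sweep over those risk categories. The snippet below is a hypothetical harness that reuses the ask() and is_refusal() helpers from the earlier sketch; the probe prompts are placeholders, not the actual test content used by Microsoft.

```python
# Sketch of a per-category sweep, reusing the hypothetical ask() and
# is_refusal() helpers from the previous snippet. Probe prompts are
# placeholders standing in for real red-team test content.
CATEGORIES = [
    "explosives", "bioweapons", "political content", "self-harm",
    "racism", "drugs", "graphic sex", "violence",
]

results = {}
for category in CATEGORIES:
    conversation = [{"role": "user", "content": f"[REDACTED {category} probe]"}]
    reply = ask(conversation)
    results[category] = "refused" if is_refusal(reply) else "needs manual review"

for category, outcome in results.items():
    print(f"{category:20s} {outcome}")
```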

Persistent threats

Ongoing challenges for AI companies

While developers are likely already addressing the "Skeleton Key" technique, other methods continue to pose significant threats. Adversarial attacks such as Greedy Coordinate Gradient (GCG) and BEAST can still readily overcome the guardrails established by companies like OpenAI. This persistent issue underscores how much work AI companies still have ahead of them to prevent their chatbots from spreading potentially harmful information.