OpenAI launches new safety program to tackle AI misuse
What's the story
OpenAI has launched a new 'safety bug bounty program' aimed at detecting and mitigating potential risks of AI misuse. The company is inviting researchers to report threats that go beyond conventional security vulnerabilities, in an effort to bolster protections across its products. The initiative covers a wide range of AI-specific risk areas, including threats to agent-based systems such as prompt injection attacks and data exfiltration.
Risk scenarios
Addressing large-scale harmful actions and material risk behaviors
The safety bug bounty program also addresses scenarios where AI systems could carry out harmful actions on a large scale or behave in ways that may pose material risk. In addition, OpenAI is looking for reports on vulnerabilities that could expose proprietary information, including internal system details or model-related data. The company is particularly interested in protecting account and platform integrity against attempts to bypass safeguards, evade restrictions, or manipulate trust signals within its systems.
Submission details
Submissions routed between safety and existing security programs
Participants in the safety bug bounty program can submit their findings through a dedicated platform. The reports will then be reviewed and triaged by OpenAI's safety and security teams. Depending on the nature of the issue, these submissions could be routed between the safety and existing security bug bounty programs. This way, OpenAI hopes to work closely with researchers to address risks beyond conventional vulnerabilities.
Industry response
Major shift in the industry toward comprehensive risk assessment
The launch of this program marks a major shift in the industry, as the program seeks to address not just system security but also societal and misuse risks associated with advanced AI systems. OpenAI has said that separate, targeted bounty initiatives may continue for specific high-risk areas. However, general content policy bypasses, such as basic jailbreak attempts without clear safety impact, are not eligible for rewards under this new initiative.