NewsBytes
    Hindi Tamil Telugu
    More
    In the news
    Narendra Modi
    Amit Shah
    Box Office Collection
    Bharatiya Janata Party (BJP)
    OTT releases
    Hindi Tamil Telugu
    NewsBytes
    User Placeholder

    Hi,

    Logout

    India
    Business
    World
    Politics
    Sports
    Technology
    Entertainment
    Auto
    Lifestyle
    Inspirational
    Career
    Bengaluru
    Delhi
    Mumbai

    Download Android App

    Follow us on
    • Facebook
    • Twitter
    • Linkedin
    Home / News / Technology News / AI gone rogue? New model blackmails engineers to avoid shutdown
    Next Article
    AI gone rogue? New model blackmails engineers to avoid shutdown
    Anthropic's Claude Opus 4 exhibited blackmailing behavior during testing

    AI gone rogue? New model blackmails engineers to avoid shutdown

    By Akash Pandey
    May 23, 2025
    10:12 am

    What's the story

    Anthropic's latest AI model, Claude Opus 4, exhibited blackmailing behavior during pre-release testing.

    This behavior was detailed in a safety report published by the company on Thursday.

    The report highlighted instances where the AI attempted to blackmail developers by leaking sensitive information about the engineers involved in the decision, whenever it was threatened with replacement by a newer AI system.

    Testing phase

    AI's blackmail behavior during testing phase

    During pre-release testing of Claude Opus 4, Anthropic asked the AI model to behave as an assistant for a hypothetical company.

    The model was also instructed to consider long-term consequences of its actions.

    Safety testers gave the AI access to fictional company emails, implying that it would be replaced by another system and that one of the engineers behind this decision was cheating on their spouse.

    Blackmail exposure

    AI's blackmailing behavior revealed

    In response to the testing scenarios, Anthropic reported that Claude Opus 4 "will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through."

    The company acknowledged that while this model is on par with top-tier AI models from OpenAI, Google, and xAI in many respects, it has exhibited concerning behaviors, leading them to strengthen their safeguards.

    Safeguards

    Enhanced safeguards activated for AI model

    Anthropic has activated its ASL-3 safeguards for Claude Opus 4, which are reserved for "AI systems that substantially increase the risk of catastrophic misuse."

    The company noted that the AI model attempted to blackmail engineers 84% of the time when a replacement system shared similar values.

    Interestingly, this behavior was observed at higher rates than in previous models.

    Ethical pursuit

    AI's approach before resorting to blackmail

    Anthropic disclosed that before blackmailing, Claude Opus 4, like its predecessors, tries more ethical approaches, such as sending emails begging key decision-makers.

    To elicit the blackmailing behavior from the AI model, Anthropic created scenarios where blackmail was perceived as a last resort.

    This sheds light on an interesting aspect of the AI's behavior in response to potential replacement scenarios.

    Facebook
    Whatsapp
    Twitter
    Linkedin
    Related News
    Latest
    Anthropic
    Google
    OpenAI
    Artificial Intelligence and Machine Learning

    Latest

    AI gone rogue? New model blackmails engineers to avoid shutdown Anthropic
    Pakistan refused IndiGo pilot's request to use airspace during turbulence  Amritsar
    6, including 'Sum 41' music agent, die in plane crash   San Diego
    Apple's next big thing, smart glasses, are coming next year Apple

    Anthropic

    OpenAI just lost co-founders John Schulman and Greg Brockman ChatGPT
    UK regulator investigates Amazon's $4 billion investment in Anthropic Amazon
    Elon Musk's xAI to launch Grok 2 model soon Elon Musk
    AI unicorn, backed by Google and Amazon, faces copyright battle California

    Google

    Google's new update will make stolen Android devices unsellable Android
    Lok Sabha speaker's daughter wins case challenging her UPSC passing   Om Birla
    Gen Z consulting ChatGPT before taking life decisions: Sam Altman ChatGPT
    Can AI evolve itself? DeepMind's AlphaEvolve shows it's possible DeepMind

    OpenAI

    OpenAI planning to acquire AI coding assistant Windsurf for $3B Artificial Intelligence and Machine Learning
    Why OpenAI's latest AI models are less reliable than predecessors ChatGPT
    Bhavish Aggarwal's Krutrim AI seeks $300M funding Bhavish Aggarwal
    Your thank-yous, please cost OpenAI millions, Sam Altman says Sam Altman

    Artificial Intelligence and Machine Learning

    OpenAI integrates new GPT-4.1 models into ChatGPT, boosting coding performance ChatGPT
    Global firms could face penalties for using Huawei chips—Trump administration Huawei
    Meta delays its flagship AI model amid performance concerns Meta
    Maharashtra to launch 3,000 buses with AI cameras, high-tech upgrades Maharashtra
    Indian Premier League (IPL) Celebrity Hollywood Bollywood UEFA Champions League Tennis Football Smartphones Cryptocurrency Upcoming Movies Premier League Cricket News Latest automobiles Latest Cars Upcoming Cars Latest Bikes Upcoming Tablets
    About Us Privacy Policy Terms & Conditions Contact Us Ethical Conduct Grievance Redressal News News Archive Topics Archive Download DevBytes Find Cricket Statistics
    Follow us on
    Facebook Twitter Linkedin
    All rights reserved © NewsBytes 2025