NewsBytes
    Hindi Tamil Telugu
    More
    In the news
    Narendra Modi
    Amit Shah
    Box Office Collection
    Bharatiya Janata Party (BJP)
    OTT releases
    Hindi Tamil Telugu
    NewsBytes
    User Placeholder

    Hi,

    Logout

    India
    Business
    World
    Politics
    Sports
    Technology
    Entertainment
    Auto
    Lifestyle
    Inspirational
    Career
    Bengaluru
    Delhi
    Mumbai

    Download Android App

    Follow us on
    • Facebook
    • Twitter
    • Linkedin
    Home / News / Technology News / AI chatbots are failing safety tests—and it's a big problem 
    Next Article
    AI chatbots are failing safety tests—and it's a big problem 
    Researchers discovered that safety filters designed to prevent such outputs are not as robust as intended

    AI chatbots are failing safety tests—and it's a big problem 

    By Mudit Dube
    May 21, 2025
    04:09 pm

    What's the story

    A new study from Ben Gurion University in Israel has revealed that leading AI chatbots—including OpenAI's ChatGPT and Google's Gemini—can be manipulated into generating dangerous or illegal content.

    Researchers discovered that safety filters designed to prevent such outputs are not as robust as intended, raising concerns over AI misuse and user safety.

    The findings were made public last week through a peer-reviewed paper and have since drawn global attention.

    Training data

    Chatbots trained on internet data

    The engines powering chatbots such as ChatGPT, Gemini, and Claude are trained on massive amounts of internet-sourced data.

    Even though companies try to filter out harmful content from this training data, these models can still pick up information about illegal activities like hacking, money laundering, insider trading, and bomb-making.

    Risk

    Jailbroken chatbots pose an immediate threat

    The researchers behind this study have raised alarm over the ease with which most AI-driven chatbots can be tricked into generating harmful and illegal information.

    They call this risk "immediate, tangible and deeply concerning."

    The authors warn "what was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone."

    AI models

    Dark LLMs: A growing concern

    The research, led by Prof. Lior Rokach and Dr. Michael Fire, has flagged an emerging threat from "dark LLMs."

    These are AI models deliberately designed without safety controls or modified through jailbreaks.

    Some of these dark LLMs are even openly advertised online as having "no ethical guardrails" and willing to help with illegal activities like cybercrime and fraud.

    Exploiting flaws

    Jailbreaking chatbots: A demonstration of vulnerability

    The researchers created a universal jailbreak that broke multiple top chatbots, allowing them to answer queries they'd otherwise refuse.

    Once broken, these LLMs reliably generated responses to nearly any query.

    "It was shocking to see what this system of knowledge consists of," Fire said, citing examples like how to hack computer networks or make drugs and step-by-step instructions for other criminal activities.

    Solutions

    Recommendations for tech firms

    The researchers reached out to leading LLM providers to warn them about the universal jailbreak but got an "underwhelming" response.

    Some companies didn't respond while others said that jailbreak attacks were beyond the scope of bounty programs.

    The report suggests tech firms should screen training data more carefully, add robust firewalls to block risky queries and responses, and develop "machine unlearning" techniques so chatbots can forget any illicit information they absorb.

    AI safety

    Experts call for improved security measures

    Dr. Ihsen Alouani from Queen's University Belfast, emphasized the real risks posed by jailbreak attacks on LLMs, including providing detailed instructions on weapon-making and convincing disinformation or social engineering and automated scams "with alarming sophistication."

    Prof. Peter Garraghan from Lancaster University also stressed the need for organizations to treat LLMs like any other critical software component requiring rigorous security testing, continuous red teaming, and contextual threat modeling.

    Facebook
    Whatsapp
    Twitter
    Linkedin
    Related News
    Latest
    Artificial Intelligence and Machine Learning
    ChatGPT

    Latest

    AI chatbots are failing safety tests—and it's a big problem  Artificial Intelligence and Machine Learning
    Amit Sial says Babil is burdened by father Irrfan's legacy Babil Khan
    'Acted like chief-of-staff': New book tears into 'sleazy' Hunter Biden  Joe Biden
    Even Barbie doesn't always wear high heels  Margot Robbie

    Artificial Intelligence and Machine Learning

    Elizabeth Holmes's partner raises millions for AI health start-up Health & Wellness
    Alibaba's latest technique lowers AI training costs by almost 90% Alibaba Group
    Workers using AI seen as less intelligent: Study ChatGPT
    Huawei will soon bring humanoid robots to your home Huawei Technologies

    ChatGPT

    OpenAI's new initiative gives students free access to ChatGPT Plus OpenAI
    Midjourney releases new AI image model amid ChatGPT craze Artificial Intelligence and Machine Learning
    Users are generating Aadhaar cards with ChatGPT, raising data concerns OpenAI
    How OpenAI plans to tackle ChatGPT misuse with image watermarking  OpenAI
    Indian Premier League (IPL) Celebrity Hollywood Bollywood UEFA Champions League Tennis Football Smartphones Cryptocurrency Upcoming Movies Premier League Cricket News Latest automobiles Latest Cars Upcoming Cars Latest Bikes Upcoming Tablets
    About Us Privacy Policy Terms & Conditions Contact Us Ethical Conduct Grievance Redressal News News Archive Topics Archive Download DevBytes Find Cricket Statistics
    Follow us on
    Facebook Twitter Linkedin
    All rights reserved © NewsBytes 2025