This new AI tool tracks copyrighted material in AI-generated content

By Akash Pandey

Oct 18, 2025

05:51 pm

What's the story

Vermillio, a US tech platform, has developed a tool that can track the use of copyrighted work by generative artificial intelligence (AI) systems. The company claims its technology can estimate what percentage of an AI-generated image draws on pre-existing material. This comes amid growing concerns among creative professionals and industries over the unauthorized use of their work to build AI models.

Copyright concerns over AI's reliance on existing works

How much AI tools like OpenAI's ChatGPT and Google's Gemini rely on existing art to create new works has become a major concern.

The concern is sharpest when the source material originates from organizations like the BBC, which raises questions of potential copyright infringement.

Creative professionals from various fields are demanding compensation for their work used in these models without permission.

Tech breakthrough

Vermillio's tech creates 'neural fingerprint' of copyrighted works

Vermillio's tech creates a "neural fingerprint" for different copyrighted works and then prompts AI systems to generate similar content.

The results are then compared with these fingerprints to gauge how much the AI has drawn from existing material.

In tests conducted by The Guardian, Google's Veo3 tool produced a Doctor Who video that matched 80% of Vermillio's fingerprint for the show.
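The comparison step described above can be illustrated with a rough sketch. Vermillio's actual method is not detailed here, so everything in this example is an assumption made for illustration: the fingerprint is modeled as a list of reference feature vectors, the generated content as another list of vectors, and a reference counts as "matched" when some generated vector has a cosine similarity at or above a hypothetical threshold of 0.9. The function names and the toy numbers are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_percentage(fingerprint, generated, threshold=0.9):
    """Percentage of fingerprint vectors that have a close match in the generated content.

    Both the vector representation and the threshold are assumptions,
    not Vermillio's published method.
    """
    if not fingerprint:
        return 0.0
    matched = sum(
        1
        for ref in fingerprint
        if any(cosine_similarity(ref, g) >= threshold for g in generated)
    )
    return 100.0 * matched / len(fingerprint)


# Toy data: five hypothetical reference features for a protected work
fingerprint = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
# Four hypothetical features extracted from an AI-generated video
generated = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.05],
    [1.0, 0.9, 0.0],
    [0.1, 0.0, 1.0],
]

score = match_percentage(fingerprint, generated)
print(f"{score:.0f}% of the fingerprint matched")
```

With this toy data, four of the five reference vectors find a close match, so the script reports 80%. A real system would presumably extract its features with a trained neural network rather than use hand-written vectors.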

AI performance

Google's Veo3 and OpenAI's Sora heavily rely on existing material

A second video, taken from YouTube and stamped with the watermark of OpenAI's Sora tool, scored even higher, matching 87% of Vermillio's fingerprint.

This suggests that both Google's and OpenAI's generative AI models rely heavily on existing copyrighted material to create their outputs.

Other examples tested by Vermillio also showed strong matches with popular franchises like Jurassic Park and Frozen.

Training data

Generative AI models trained on vast amounts of data

Generative AI models, which power tools like OpenAI's ChatGPT and Google's Gemini, are trained on massive amounts of data from the open web.

This includes everything from Wikipedia and YouTube to newspaper articles and online book archives.

However, this practice has raised copyright concerns, as some companies have been accused of using copyrighted works without permission to train their chatbots.

Legal battles

Legal challenges and settlements in the AI space

AI companies have faced legal challenges over their practices.

Anthropic, a leading AI firm, agreed to pay $1.5 billion to settle a class-action lawsuit by authors who alleged that the company used pirated copies of their works to train its chatbot.

Meanwhile, Google-owned YouTube has said its terms and conditions allow the platform to use creators' work for making AI models.