This Google-funded start-up allegedly stole YouTube videos for AI training
Runway is also accused of using pirated content to train its AI

Jul 28, 2024
12:00 pm

What's the story

Runway, an AI start-up backed by Google, is facing accusations of using pirated content and unauthorized YouTube videos to train its Gen-3 Alpha video generation tool. The allegations emerge from a leaked internal document obtained by 404 Media, reportedly shared by an ex-employee of Runway. The document outlines plans to categorize and tag content from over 3,900 YouTube channels including major media giants like Disney and Netflix, as well as popular creators like Casey Neistat and Marques Brownlee (MKBHD).

Training methods

Gen-3 Alpha draws attention amid controversy

The Gen-3 Alpha video generation tool, developed by Runway, gained significant attention last month for its ability to generate nearly photorealistic clips. The company stated that the tool was "trained jointly on videos and images," but did not disclose the data source. Runway has not confirmed the authenticity of the leaked spreadsheet, and it previously claimed to use "curated, internal datasets" for training. However, 404 Media managed to create convincing videos of well-known YouTube personalities using this tool.

Data collection

Alleged use of proxies and massive web crawler

Runway reportedly covered its tracks by using proxies to avoid being blocked by YouTube. "The channels in that spreadsheet were a company-wide effort to find good quality videos to build the model with," an unnamed former employee told 404 Media. "This was then used as input to a massive web crawler which downloaded all the videos from all those channels, using proxies to avoid getting blocked by Google," the employee added.

Copyright issues

Intellectual property concerns in AI training

This isn't the first instance of an AI company facing scrutiny for using copyrighted material without the necessary licenses. Earlier this year, OpenAI CTO Mira Murati admitted in a Wall Street Journal interview that she was unsure whether training data for the company's upcoming Sora video generator included videos from Instagram, YouTube, or Facebook. The New York Times later reported that OpenAI had bypassed corporate policies to evade copyright laws, using tools to transcribe YouTube videos for training its AI chatbots.

Legal battle

YouTube CEO warns against violation of platform's terms

YouTube CEO Neal Mohan has cautioned AI companies that using YouTube videos to train AI models would constitute a "clear violation" of the platform's terms of use. The issue of intellectual property infringement remains a significant hurdle in the development of generative AI, particularly with models capable of generating entire videos. Runway, valued at $1.5 billion, raised $141 million in funding last year from investors including YouTube owner Google, NVIDIA, and Salesforce.