Adobe sued over alleged use of copyrighted books to train AI models
What's the story
Adobe, a leading software company, is facing a proposed class-action lawsuit for allegedly using pirated books to train its artificial intelligence (AI) models. The lawsuit was filed by Elizabeth Lyon, an author from Oregon. She claims that Adobe used her work, along with that of many other authors, without permission to train its SlimLM models.
Model details
SlimLM: Adobe's AI model under scrutiny
Adobe's SlimLM is a series of small language models optimized for document assistance tasks on mobile devices. The company says SlimLM was pre-trained on SlimPajama-627B, an open-source dataset released by Cerebras in June 2023. Lyon alleges that her works were included in this pre-training dataset without her consent.
Dataset controversy
Allegations of dataset manipulation in AI training
Lyon's lawsuit alleges that her writing was included in a processed subset of a manipulated dataset that Adobe used for its program. The suit claims, "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)." It further states, "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members."
Legal issues
Books3 and RedPajama: Controversial datasets in AI training
Books3, a massive collection of 191,000 books used to train generative AI systems, has been at the center of legal disputes in the tech community. RedPajama has also been implicated in several lawsuits. In September, Apple was sued for allegedly using copyrighted material from this dataset to train its Apple Intelligence model. A similar lawsuit against Salesforce claimed the company had used RedPajama for training purposes.
Legal landscape
AI training lawsuits: A growing trend in the tech industry
The alleged use of pirated materials in AI training datasets has led to a surge in lawsuits against tech companies. In September, Anthropic agreed to pay $1.5 billion to settle claims from authors who accused it of using their work without permission to train its chatbot, Claude. The settlement was seen as a potential landmark moment in the ongoing copyright disputes over AI training data.