The new lawsuit accusing Salesforce of using stolen works to train its xGen large language models is another example of legal action aimed at holding tech vendors responsible for how they train their AI models.
Novelists Molly Tanzer and Jennifer Gilmore filed a class action complaint on Oct. 15 in U.S. District Court in San Francisco, accusing Salesforce of copyright infringement and alleging that the CRM and CX giant used thousands of books to train its xGen series of LLMs.
The authors say in the suit that Salesforce “unlawfully downloaded, stored, copied and used the datasets to develop” the models.
Fair Use and Pirated Works
Salesforce is not the first vendor to be accused of taking content from pirated copyrighted books. Last month, generative AI vendor Anthropic agreed to a $1.5 billion settlement after a judge ruled that the AI model maker had used millions of books included in several large pirated datasets to train its AI models.
“[Salesforce’s lawsuit] seems to be a very similar situation to the Anthropic situation,” said Michael Bennett, associate vice chancellor for data science and AI strategy at the University of Illinois Chicago.
He added that in the Anthropic case, the judge ruled that training the models on legally acquired works constituted fair use, while works not acquired legally do not fall under fair use protection. Fair use is the doctrine in copyright law that stipulates that limited use of copyrighted material is permissible to allow free expression.
“The method of acquisition of copyrighted protected works that are used to train a model, that’s really where the question sits right now,” Bennett said.
However, it is likely that the Salesforce case will settle rather than go to trial, similar to the settlement of the Anthropic lawsuit, said Kashyap Kompella, founder and analyst at RPA2AI.
“The Anthropic settlement suggests that negotiated resolution may be the pragmatic route forward for AI companies,” Kompella added. “It signals that copyright owners have leverage and that training data provenance is both a commercial and legal issue.”
Doubts in the Enterprise
The lawsuit against Salesforce is not only about fair use. It could also harm Salesforce by leading its customers to question whether they can trust its models and the datasets used to train them, Kompella continued.
“Enterprise clients need assurance that their AI vendor data sources are licensed, auditable, and defensible,” Kompella said. “Enterprise clients should satisfy themselves about the provenance and traceability of training data and understand the indemnity clauses that the AI vendors provide.”
Some vendors offer indemnity clauses to customers, pledging to compensate them if they are found to have used copyrighted content illegally.
Lawsuits like these could be another barrier to wider AI adoption.

