Canadian writers file proposed class action against Google over AI scraping

Filing adds to growing list of lawsuits against AI tools

By Ahmad Hathout

A trio of Canadian writers is proposing a class proceeding against Google in Federal Court for allegedly using unlicensed copyrighted works to train and develop its generative AI tools.

Instead of obtaining training data licensed by the plaintiffs and the class, Google knowingly crawls, scrapes and makes copies of the scanned works uploaded to pirate websites, which host versions without the technological protection measures (TPMs) used to protect them, the May 1 statement of claim alleges. The models learn the narrative arcs and structured texts in these books that are not available in news articles and social media posts, allowing the machines to generate content that “displaces” the class’s work, the complaint continues.

“The creators of the pirate libraries did not license or obtain consent for the copyrighted work contained in the libraries, including the Plaintiffs’ Works,” the complaint says. “At all material times, Google was fully aware that copyright subsisted in the works they were copying, that there was substantial value in these copyrights, and that Google’s conduct was to the detriment of authors and a breach of copyright. At no time did Google seek licensing or consent from the class.”

The plaintiffs – fiction writers Catherine McKenzie, Kate Hilton and Ryan North, who also writes non-fiction – claim the harm to the class includes loss of sales and licensing revenue, a diminished value of their work, and a dilution of the market for that work via the “generation of similar content cheaply by AI models.”

The plaintiffs are seeking a declaration that the search engine giant is liable for infringing the copyright in their works and is engaged in the circumvention of the TPMs. They are asking for a permanent injunction barring the company from using the works to train the AI tools and from making Google products based on large language models (LLMs) available anywhere in Canada. They are also seeking a range of damages, including profits made by Google on the alleged infringements and $20,000 per work – the maximum for commercial infringement under copyright law.

“The class has been stripped of their rights under the Copyright Act, including the chance to licence their works or their choice to not contribute to the development of the LLMs,” the complaint reads.

“Absent this choice, the class has effectively been forced to participate in the development of a product that undermines the livelihood of Canadian authors, decreases demand for their products, and diminishes the value of their written human work,” it adds. “The LLMs are undermining the profitability of writing as a profession, while taking authors’ works without consent or compensation.”

The plaintiffs, who seek to represent residents who own copyrighted work that was used to train Google LLMs, said they found their works on pirate websites, including OceanofPDF. The Atlantic magazine has a tool that allows users to search for works that have been used to train LLMs.

Google did not respond to a request for comment.

The proposed class action is the latest legal challenge initiated by creatives and writers against technology companies allegedly scraping their work to train AI models without consent. Major media organizations in Canada filed a lawsuit in November alleging OpenAI, the company behind ChatGPT, has been using their work to train its LLMs and reproduce it for commercial purposes.

In the United States, news and magazine publishers – including Fox, The Atlantic, Forbes, The Guardian, The LA Times, Vox, and even Torstar – filed a copyright infringement lawsuit against Canada’s Cohere, alleging the company’s models spit out infringing summaries of their work.

Germany’s music collective society sued OpenAI for using song lyrics without a licence when training its models. The Munich Regional Court found OpenAI liable for copyright infringement.

In the United Kingdom, Getty Images sued Stability AI for allegedly using its work without consent. The High Court of England and Wales found that there was no copyright infringement because the model “does not store or reproduce any Copyrighted Works (and has never done so).”

Toronto-based law firm Osler said in an article that the divergence of opinion between the two courts could mean “international consensus on whether AI models infringe copyright is likely years away.” Cartt requested comment for this story from one of the article’s authors but did not hear back from the firm.

Canadian Heritage has been aware of these lawsuits. A December memorandum to Culture Minister Marc Miller, obtained by Cartt, noted a “trend away from the use of copyright material by Artificial Intelligence training data to the issue of outputs.

“Recent developments indicate that companies could face exposure when they can produce outputs that too closely mirror protected expression,” the memo said.

The issue of AI outputs has, indeed, attracted Parliament’s attention. The Standing Committee on Canadian Heritage recommended copyright protections for news content scooped up by bots. Specifically, it said the federal government should adopt a “clear opt-in consent requirement for the use of copyrighted works in the training of artificial intelligence systems, ensuring that creators’ works may not be used for text and data mining [TDM] or model development without their prior authorization.”

Before the release of the committee’s April 14 report, Innovation, Science and Economic Development (ISED) inquired about what an opt-out mechanism would look like for Google’s AI Overviews, a tool that creates summaries from a range of web sources to answer user queries.

The issue, claim news makers and researchers, is that these tools are reducing the need for users to go to the webpages of the original sources, threatening their businesses.