In 2023, a new meme began spreading among authors on Instagram. A screenshot of a search box holds an author’s name; below it, there’s a list of book titles. The author is relaying that each displayed title has been used without permission by a tech company to teach a large language model—or LLM—to speak and think. The captions contain expletives, paragraphs about corporate greed, or laments about AI’s exorbitant water use.
The first wave of these posts came in September of that year, when The Atlantic scrutinized an AI training data set called Books3. The set is composed of “large, unlabeled blocks of text”; by extracting ISBNs, the magazine determined that Books3 contained 191,000 books and identified author information for 183,000 of them. The Atlantic released a searchable index of these titles, along with the news that Books3 had been used by Meta and Bloomberg to train their AI.
The second wave came this March, when The Atlantic released a new searchable inventory, this one indexing the pirated-books site Library Genesis, a.k.a. LibGen. Court documents revealed that Meta and Anthropic had downloaded books from LibGen to train Llama and Claude, their respective LLM-powered chatbots.
One result of “the infrastructure of collective life” (to quote academic and writer Rinaldo Walcott) being trimmed to a few demonic platforms is that you must use your enemy’s tools to sound the alarm about their incursions. In the platform’s grammar, declarations that your work has been stolen by an LLM can’t help but sound like boasting. If any author felt secretly pleased to be selected for Books3, it’s evidence of how valueless we feel. Commenters on social networks must follow the rules as well. If they want to earn the likes that will elevate their comments above the fold, sentimentality is a safe bet. Variations of “I’m so sorry this happened to you” appear again and again. One user said the LLMs amplified the trauma an author described in her memoir of abuse. A system-wide emergency is reduced to a harm done to a handful of authors.
In 2023, I searched the Books3 database for my work, and my 2018 novel was there. Next, I searched for the names of friends—I wanted to ask somebody what to do. The rage flaming my timelines made clear, logical sense; the description of finding unmarked “blocks of text” in Books3 conjured up a stripped carcass. But I myself had no clue what it meant, or how to feel. Carrianne Leung, who has published two books of fiction, signed a 2023 Authors Guild letter to the chief executive officers of OpenAI, Meta, and others. The letter demanded compensation and that companies obtain permission before using authors’ works in generative AI programs. Recently, I asked her whether she would accept a payout from one of these tech companies. “I wouldn’t even know what I was agreeing to,” she said. “I don’t know what’s at stake.” She signed the letter because “as writers we need to stick together,” but she sounded wary.
The hypothetical is becoming real. Over a dozen copyright infringement suits have been filed by authors against major AI firms, including Meta, OpenAI, Google, and Anthropic. This June, two United States copyright cases were decided in landmark rulings, on the side of the tech companies: the books used to train Meta’s Llama and Anthropic’s Claude were deemed “fair use.” But in the Anthropic case—a class action suit affecting all titles downloaded by the company, over 7 million—the judge ordered retroactive payment to the rightsholders. The court has approved Anthropic’s offer of a $1.5 billion (US) settlement, though there are conditions. A number of criteria mean only about 500,000 of the 7 million titles qualify, which works out to roughly $3,000 per qualifying title. Payments will be made after legal fees, administrative fees, and expenses are deducted, and will be split between authors and publishers—as the Authors Guild sadly notes, “this fact has been lost in some of the coverage of the suit.”
This settlement could be the first of several. But the AI panic of the past few years has left no time to parse the assumptions and definitions of art and literature these cases rest on—or to ask whether we even accept them. As Leung asks, what are we agreeing to? There’s the contradiction of accepting funds issuing from the same technology we say is destroying us.
There’s also the fact that copyright is the device we’ve evolved to secure compensation. But its protection mutates my work into a commodity, a revenue-generating instrument. Like many economic concepts, this is an abstraction, one that rules us while being entirely disconnected from how art operates practically. I’d hazard a guess most of us think of art as a thing held in common—a shared inheritance. Archetypes belong to everyone: that’s why art galleries and libraries and arts councils receive public funding; that’s why Top 40 radio plays a Friday-morning megamix. As is typical in my line of work, I don’t consider the stories I’ve written my property; a story isn’t finished until the reader completes it.
Philosopher Hannah Arendt says, “Each time you write something and you send it out into the world and it becomes public, obviously everybody is free to do with it what he pleases, and this is as it should be.” I’m mixing philosophical considerations with practical ones of livelihood, and by “he,” Arendt did not mean Mark Zuckerberg. But pinning our hopes on copyright aligns us with the extractive forces LLM protests mean to counter: corporations invested in exclusion, access restriction, and rent seeking.
As I wrote this article, I had to keep rewriting it, developments in the copyright cases coming by the week. But now I’d argue it doesn’t matter what happens next, because tackling AI’s exploitation of authors alone has produced a win of narrow significance. Individual authors get a small sum, and the corporations carry on. New York Magazine’s Intelligencer announced that the era of free data scraping is over, but this also means that data scraping to feed LLMs is now normal—it just has a price.
A nice way to conceive of a class action suit is that it’s us taking care of each other. Can we expand who “each other” is? Authors are not special victims. We belong to a collective of twenty-first-century workers, all of us alike in being appraised as a cost to be cut by the capabilities of AI. When we understand this, a much larger solution than litigation presents itself. But to get there, we have to unthink art as someone’s property.
The copyright cases have created a template for how to think about AI and creation: one built on theft of property. Authors on TikTok have been making videos of themselves writing to prove they work without AI. A time-lapse video of 100 micro-grimaces, while hands pound at a keyboard, can’t help but look goofy. One TikToker says this proves she isn’t “a thief”—that is, a writer who uses AI. Wired reports the trend has smoothly transitioned into bullying, with some authors accused of being too prolific: they must be using AI. It’s like any other socials-based purity campaign—a desperate grasp for cultural capital in a marketplace built on quicksand.
This August, the Eden Mills Writers’ Festival announced a panel hosted by an LLM: Aiden Cinnamon Tea, run on ChatGPT and trained on the works of theorist Vanessa Machado de Oliveira. Its purpose, irony aside, is to facilitate discussions about modernity’s extractive use of technology. It’s part of the project Burnout from Humans, which “challenges the arrogance of superiority and invites us to steward AI—and ourselves—with wisdom and responsibility.” The festival received hundreds of complaints, calling the programming “clickbait,” a joke “in very, very poor taste,” satire, and a celebration of “wholesale theft from human authors.” The panel was cancelled.
In other words, the names of corpuses like Books3 and LibGen have become slurs, taboo. Yet the community discourse hasn’t touched what these corpuses are, or why they exist.
Books3 is the work of one man. Not a tech don, but an unemployed guy in Missouri. Shawn Presser collected the corpus from Bibliotik, a pirate site, and created Books3, an open-source training data set, so anyone, not only “billion-dollar corporations,” could make their own LLM. According to Wired, his goal was to “democratize access,” a mission redolent of the early internet’s sweetest promises. Hack the planet and make information free.
Because Books3 is a training data set, Instagramming authors assumed LibGen was too: if their book was on LibGen, it had been used to train an AI. But LibGen is not a training data set. It is publishing’s Napster, a two-decades-old repository of millions of pirated books. In 2022, a Redditor calculated it took in 1,300 new additions a day. Court proceedings, drawing on Meta’s internal correspondence, estimate that Meta downloaded millions of titles from LibGen; Anthropic is estimated to have downloaded 5 million. Which titles, exactly, has not been disclosed. That’s a lot of books, and still potentially a fraction of the total on the website. TL;DR: a title existing on LibGen doesn’t mean LLMs have used it.
This distinction wasn’t clear to the authors who found their work in The Atlantic’s LibGen search and declared that anyone using generative AI was stealing from them personally. Why does the distinction even matter? Other than the bleak comedy of a multi-billion-dollar company downloading books like a dad on a tablet, it speaks to the likelihood that, until this past March, many Canadian and US authors had never heard of LibGen. We live on a different planet from the thousands who rely on it to read our work.
Students whose libraries sit outside the perimeter of North American and European universities and engines of culture have no line to the necessary scholarship. Copyright enforces segregation. Advancing their own research is like pulling teeth. SciHub, which is connected to LibGen and stores academic journals, was created by a neuroscience student in Kazakhstan. LibGen’s purpose is “universal access to knowledge,” to quote Shadow Libraries, an (open-access) textbook. LibGen’s administrators explain: “The target groups for LibGen are poors: Africa, India, Pakistan, Iran, Iraq, China, Russia and post-USSR, etc. . . . If you are not at a university, you can’t access anything.”
LibGen is a remedy to a great wrong: access to the riches of knowledge conditional on geolocation and wealth. Meta and Anthropic and likely OpenAI raiding shadow libraries for training data is why, as the old internet saying goes, we can’t have nice things.
Why are authors suing Meta and OpenAI but not shadow libraries? Most authors do not make their living from book sales. A common split of the money made on a book’s sales gives the author 10 percent and the publisher the other 90. Thus, LibGen’s greatest foes have not been author copyright holders but publishers, especially those that have manoeuvred themselves into monopolies.
Shadow Libraries references a 2015 research paper reporting that “five companies—Elsevier, Springer, Wiley-Blackwell, Taylor and Francis, and Sage—published 50 percent of all research papers, rising as high as 70 percent in the social sciences.” Elsevier, in particular, has heavily lobbied against open access, taken action against individual academics for posting their own articles online, and, in 2015, obtained injunctions against LibGen and, directly, against the neuroscience student who founded SciHub, for, yes, copyright violation.
It’s babyish to blame the shadow libraries for the desolate conditions that drive them. Elsevier has a profit margin of nearly 40 percent, but, as The Conversation points out, its cash-cow journals are technically volunteer run. Academics provide content, and peer-review this content, for free. Elsevier takes a publicly funded resource, locks it behind paywalls, and charges extortionate prices: hundreds of millions of dollars per year for Canadian universities. The Guardian reports that for Manchester University’s library, a print book that costs £75 costs £975 as an ebook, with a limit of three users.
Retraction Watch, a site that monitors scientific integrity, reported that nearly the entire editorial board of an Elsevier journal resigned last year after accusing the company of using an LLM to edit papers.
In May, the Chicago Sun-Times published a summer reading list recommending famous novelists like Percival Everett, Andy Weir, and Isabel Allende, and attributing to them book titles that did not exist. It turned out the list was AI generated. When an LLM resembles a frantic undergrad in the library banging out the 1,500 words due at 9 a.m., it’s easy to dismiss. GPT-5, released this year and fanfared as having “PhD-level intelligence,” could not reliably count the number of “b”s in “blueberry,” the Guardian said. AI technology cannot write a novel because current LLMs’ context windows—roughly, the amount of attention they can pay at a given time—are not yet long enough to sustain a coherent long-form text. This doesn’t stop clickbait headlines from heralding that human authors are going extinct as AI systems “churn out a thousand ebooks a month.”
AI’s actual effect on working writers is more complex and opaque, but no less alarming.
Clio Books, announced at the 2024 Frankfurt Book Fair, is a publishing company that pairs writers with an AI coach to “speed up the book creation process.” An AI tool called My Poolitzer is marketed to literary agents to help read submissions faster. Another called StoryWise will do the same for acquiring editors, uncovering “previously unknown authors.” No one will admit to using these tools.
But consider: in the US, Publishers Weekly found there are 40 percent fewer jobs in publishing now than thirty years ago, even as the number of titles has mushroomed. Some of this is the effect of (non-AI) digitization, but much of it is conglomeration, “achieved on the backs of staff: fewer full-time . . . [more] outsourced to freelancers foreign and domestic.” It feels impossible that none of this underpaid skeleton crew has resorted to AI. We know from the algorithm that bot logic consolidates trends, rewarding what’s recognizable over what hasn’t yet been mapped. If robots take charge of acquisitions, they’ll finish the job the algorithm started. The preference for formulaic style—and values—will throttle aesthetic risk and cultural heterogeneity in one stroke.
Since June 2019, Google has been pivoting to zero-click searches, meaning the answer sought is displayed at the top of the results and the user does not need to click (and leave Google) to find out more. This kicked into overdrive once Google embedded its AI in 2024, rolling out generative summaries as a forced feature at the top of nearly every search.
Since then, Bain & Company found “nearly 60 percent of Google searches result in zero clicks.” Meaning: zero-click searches suction up the labour of media workers and content writers and slap it in Gemini’s skybox, without sending traffic to the information’s source, threatening to obsolete the outlets. (When I google “zero click searches,” the AI summary helpfully suggests: “Businesses can adapt to this trend by optimizing their content to appear in these zero-click result formats, focusing on answering questions directly and concisely.” Bad news for long reads.)
Back to the Chicago Sun-Times: Where did its AI-written book list even come from? 404 Media reported that though the list was part of a “best of summer guide,” a seasonal staple in media usually “loaded with local events calendars,” the one in the Chicago Sun-Times was unusually “generic.” There was no byline, but 404 Media figured out it was written by a freelancer, Marco Buscaglia, hired by King Features to write a guide broad enough to be inserted into any US newspaper.
The rec list was part of a sixty-four-page section, which Buscaglia authored almost solo—a significant chunk of the newspaper contracted out. The blog theloudpoet.com sleuthed that Buscaglia was once a career journalist: “Content strategy and content marketing are powered by hundreds (thousands?) of former journalists who were laid off from magazines and newspapers over the past twenty years . . . He was simply trying to continue making a living in an industry that has aggressively devalued his profession.”
The CEO of the Sun-Times stated, “Buscaglia won’t work for King Features again, nor will he work at Chicago Public Media.”
As AI researcher Adio Dinika told the Guardian, “AI isn’t magic; it’s a pyramid scheme of human labour.” Data annotators in offshore locations with cheap labour costs, like Kenya and Colombia, label toxic text to teach the AI what is off limits. They are exposed to hours of violent and graphic content, or, as one employer described it to Time magazine, “illegal categories.” AI raters rate and moderate the “extreme content” generated by an LLM, to make it “safe” for users, facing inhuman time and production targets in an AI arms race. Thousands of people globally do this job, invisibilized by their own success at making the chatbot appear effortlessly human.
The Royal Canadian Mounted Police has used AI tools like Clearview, Traffic Jam, Spotlight, and Rekognition for mass surveillance, even though Scientific American reports that facial recognition tech “cannot tell Black people apart.” The Canadian government has been “experimenting with the adoption” of predictive analytics in the evaluation of immigration applications since 2014. As in a Tom Cruise movie, Canadian judges are considering AI to predict recidivism. Israel has used the occupation of Palestine as a lab in which to test its AI-powered spyware tools, which it sells to Canada; a Citizen Lab report from March found evidence to suggest that the Ontario Provincial Police may have used them. By 2027, AI servers are projected to use more fresh water than all of Denmark. Over 2 billion people globally lack safe drinking water, including the residents of thirty-seven Indigenous communities in Canada.
Copyright cases might provide authors, and potentially filmmakers, musicians, and YouTubers next, with settlement money made from machinery used in genocide and our state’s atrocities. They will not curb AI’s most grotesque consequences for our fellow labourers, our neighbours, and ourselves.
So what will?
A 2025 paper called “Abolish Privacy,” lead-authored by Elisha Lim (my sibling), objects to the assumption that online privacy is an unquestionable good. “Privacy is a commodified version of safety that manifests in many forms: the exclusionary ownership of Indigenous lands and enslaved people, liberal property entitlements, corporate privatization, platform securitization, denying of corporate responsibility, paywalls.”
Rather than advocating for better protections against data mining, as internet and privacy activists have done, Lim argues the opposite: the antidote to surveillance capitalism is for social media companies to give us our data to hold collectively. Just as health care is a public good in Canada, Lim says, “data should be the commons, for public use; useful for making statistical predictions for the entire country, something we can all use.” Like the census, but with the granularity of algorithmic data. But Lim points out a problem. Just as Meta and Anthropic went to LibGen, that brute-force commons, “there’s nothing to stop corporations from sucking it all up.”
Rinaldo Walcott’s 2021 book On Property calls for the abolition of property itself, as property made possible the Atlantic slave trade and its counterpart, the carceral state. Walcott believes there’s a remedy for Lim’s prediction of Meta, Anthropic, et al. freeloading off public data, one that would solve the violations of AI and surveillance by making us “all responsible for the distribution of these resources.” It’s not just data that should be collectively owned—it’s the tech itself. Walcott conceives of social media and AI as public utilities, no different from the electricity grid, which should not be under private auspices.
“What we need are governments that have peoples’ interests at heart, that will simply nationalize Meta,” Walcott says. Cohere, one of the most prominent Canadian AI companies, has already received millions of dollars from the federal government. (Cohere is itself being sued over data scraping by a coalition of fourteen news companies, including The New Yorker, the Guardian, and the Toronto Star.) The next step is to ask for some control in return.
Does the art we produce belong in the commons too: priceless and collectively held? That’s how most of us understand art, even if our contracts inscribe an uglier reality. What if art were in the commons and artists were paid out of the commons? Perhaps this sounds outlandish. But we already have a model for state-compensated workers in Canada—doctors, for instance. And you can’t swing a cat in this country without hitting a publisher or a creator who relies on government funding in the form of arts grants. If art were held in the public trust, artists, rather than scraping by from one grant to the next, could receive a guaranteed basic income in return—a steady payment to supplement their livelihood. Any system offering longer-term security from a public source would protect artists from the ever more dystopian market; public school teachers, after all, should not be employed on a contractual basis either.
If you’d like a more practical way to conceive of this, here’s one that’s fantastic or sickening, depending on where your interests lie. To the market, art is no longer valueless, no longer of little industrial interest. For the first time in generations, art, as a totality, has tremendous value: as higher-quality inputs to make higher-quality, lucrative tech. It’s its own kind of rare mineral. Doesn’t it make sense for the government to invest?
Sean Michaels, who, with the help of a programmer, trained his own LLM to help write his 2023 novel Do You Remember Being Born?, emphatically agrees that government funding could mitigate some of the problems AI causes for writers. “What if we spent half the energy we were spending on getting $100 million from the AI companies on how we, as citizens, can get money from our government?”
When I published my last novel, I discovered that, no matter how well or poorly the book did, turning it into a commodity had made it its own enemy. As philosopher and social theorist Theodor Adorno says, once an object is put up for sale, it loses inherent value: “Everything has value only in so far as it can be exchanged, not in so far as it is something in itself.” Adorno was referring to pop music, or, as he called it, “rubbish.” Uncharacteristically, Adorno wasn’t being dire enough: what he says applies to all art, rubbish or otherwise.
Art exists just to exist. Outside of the market’s reality is art’s reality: no deliverables; the point of a painting is to look, of a book is to read; it’s a reprieve from our bloodthirsty world where, against nature, you must earn the right to exist. When I conceive of my art as only sales, something that exists only once it gets its ©, there is despair to the roots of my teeth.
We know art transcends worldly categories. It’s why we keep making it; it’s why I became a novelist though the margins are just awful. What I make is yours, is how my art really feels to me. If only I could afford to give it away.