Paying AIs to Read My Books
Some authors have it backwards. They believe that AI companies should pay them for training AIs on their books. But I predict in a very short while, authors will be paying AI companies to ensure that their books are included in the education and training of AIs. The authors (and their publishers) will pay in order to have influence on the answers and services the AIs provide. If your work is not known and appreciated by the AIs, it will be essentially unknown.
Recently, the AI firm Anthropic agreed to pay book authors a collective $1.5 billion as a penalty for making an illegal copy of their books. Anthropic had been sued by some authors for using a shadow library of 500,000 books that contained digital versions of their books, all collected by renegade librarians with the dream of making all books available to all people. Anthropic had downloaded a copy of this outlaw library in anticipation of using it to train their LLMs, but according to court documents, they did not end up using those books for training the AI models they released. Even if Anthropic did not use this particular library, they used something similar, and so have the makers of all the other commercial frontier LLMs.
However, the judge penalized them for making an unauthorized copy of the copyrighted books, whether or not they used them, and the authors of the copied books were awarded $3,000 per book in the library.
The court administrators in this case, called Bartz et al. v. Anthropic, have released a searchable list of the affected books on a dedicated website. Anyone can search the database to see if a particular book or author is included in this pirate library, and of course, whether they are due compensation. My experience with class action suits like this is that very rarely does the award money ever reach people on the street. Most of it is consumed by the lawyers on all sides. I notice that in this case, only half of the amount paid per book is destined to actually go to the author. The other 50% goes to the publishers. Maybe. And if it is a textbook, good luck getting anything.
I am an author, so I checked the Anthropic case list. I found four of my five books published in New York included in this library. I feel honored to be included in a group of books that can train AIs that I now use every day. I feel flattered that my ideas might be able to reach millions of people through the chain of thought of LLMs. I can imagine some authors feeling disappointed that their work was not included in this library.
However, Anthropic claims it did not use this particular library to train its AIs. They may have used other libraries, and those libraries may or may not have been “legal” in the sense of having been paid for. The legality of using digitized books for anything is still in dispute. For example, Google digitizes books for search purposes, but only shows small snippets of each book as the result. Can they use the same digital copy they have already made for training AI purposes? The ruling in Bartz v. Anthropic was that, yes, using a copy of a book for training AI is fair use, if the copy was obtained in a fair way. Anthropic was penalized not for training AI on books, but for having in its possession a copy of books it had not paid for.
This is just the first test case of what promises to be many more, as it is clear that copyright law is not adequate to cover this new use of text. Protecting copies of text – which is what copyright provisions do – is not really pertinent to learning and training. AIs don’t need to keep a copy; they just have to read it once. Copies are immaterial. We probably need other types of rights and licenses for intellectual property, such as a Right of Reference, or something like that. But the rights issue is only a distraction from the main event, which is the rise of a new audience: the AIs.
Slowly, we’ll accumulate some best practices with regard to what is used to train and school AIs. The curation of the material used to educate the AI agents giving us answers will become a major factor in deciding whether we use and rely on them. There will be a minority of customers who want AIs trained on material that aligns with their political bent. Devout conservatives might want a conservatively trained AI; it will give answers to controversial questions in the manner they like. Devout liberals will want one trained with a liberal education. The majority of people won’t care; they just want the “best” answer or the most reliable service. We do know that AIs reflect what they were trained on, and that they can be “fine-tuned” with human intervention to produce answers and services that please their users. There is a lot of research into reinforcing their behavior and steering their thinking.
Half a million books sounds like a lot of books to learn from, but there are millions and millions of books in the world that the AIs have not yet read because their copyright status is unclear or inconvenient, or they are written in lesser-used languages. AI training is nowhere near done. Shaping this corpus of possible influences will become a science and an art in itself. Someday AIs will really have read all that humans have written. Having only 500,000 books forming your knowledge base will soon be seen as quaint, but it also suggests how impactful it can be to be included in that small selection, and that makes inclusion a prime reason why authors will want AIs to be trained on their works now.
The young and the earliest adopters of AI have it set to always-on mode; more and more of their intangible life goes through the AI, and no further. As the AI models become more and more reliable, the young are accepting the conclusions of the AI. I find something similar in my own life. I long ago stopped questioning a calculator, then stopped questioning Google, and now find that most answers from current AIs are pretty reliable. The AIs are becoming the arbiters of truth.
AI agents are used not just to give answers but to find things, to understand things, to suggest things. If the AIs do not know about something, it is as if it does not exist. It will become very hard for authors who opt out of AI training to make a dent. There are authors and creators today who do not have any digital presence at all; you cannot find them online; their work is not listed anywhere. They are a rare minority. As Tim O’Reilly likes to say, the challenge today for most creators is not piracy (illegal copies) but obscurity. I will add: the challenge for creators in the future will not be imitation (AI copies) but obscurity.
If AIs become the arbiters of truth, and if what they trained on matters, then I want my ideas and creative work to be paramount in what they see. I would very much like my books to be the textbooks for AI. What author would not? I would. I want my influence to extend to the billions of people coming to the AIs every day, and I might even be willing to pay for that, or at least do what I can to facilitate the ingestion of my work into the AI minds.

Another way to think of this is that in this emerging landscape, the audience for books – especially non-fiction books – has shifted away from people towards AI. If you are writing a book today, you want to keep in mind that you are primarily writing it for AIs. They are the ones who are going to read it the most carefully. They are going to read every page word by word, and all the footnotes, and all the endnotes, and the bibliography, and the afterword. They will also read all your other books and listen to all your podcasts. You are unlikely to have any human reader read it as thoroughly as the AIs will. After absorbing it, the AIs will do that magical thing of incorporating your text into all the other text they have read, of situating it, of placing it among all the other knowledge of the world – in a way no human reader can do.
Part of the success of being incorporated by AIs is how well the material is presented for them. If a book can be more easily parsed by an AI, its influence will be greater. Therefore many books will be written and formatted with an eye on this new main audience. Writing for AIs will become a skill like any other, and something you can get better at. Authors could actively seek to optimize their work for AI ingestion, perhaps even collaborating with AI companies to ensure their content is properly understood and integrated. The concept of “AI-friendly” writing, with clear structures, explicit arguments, and well-defined concepts, will gain prominence, and of course it will be assisted by AI.
Every book, song, play, and movie we create is added to our culture. Libraries are special among human inventions. They tend to get better the older they get. They accumulate wisdom and knowledge. The internet is similar in this way, in that it keeps accumulating material and has never crashed or had to restart since it began. AIs are very likely similar to these exotropic systems, accumulating endlessly without interruption. We don’t know for sure, but they are liable to keep growing for decades if not longer. At the moment their growth seems open-ended. What they learn today, they will probably continue to know, and their impact today will have compounding influence in the decades to come. Influencing AIs is among the highest-leverage activities available to any human being today, and the earlier you start, the more potent.
The value of an author’s work will not be measured just by how well it sells among humans, but by how deeply it has been embedded within the foundational knowledge of these intelligent memory-based systems. That potency will be what is boasted about. That will be an author’s legacy.