This week I received two "author" notices from the Google Book Settlement office, one in the mailbox and one by email. The alerts were sent to book authors as a result of a dispute between Google and the Authors Guild (I am not a member) over whether Google had the right to scan out-of-print books and post snippets of them online. The tussle was settled out of court, and the agreement is very complex. As far as I can tell, Google caved in to unfair reductions of fair use (bad for the commons), but it gained some good things for Google -- and for the reading public. In short, it was a compromise. Should the settlement be approved, it will mean that the vast library of out-of-print books that Google has been busy digitizing all along will finally reach the web. Hooray!
While I was registering my own books per the notices in the mail, I noticed that this agreement acknowledges the coming One Universal Book -- the "book" that consists of all book texts, hyperlinked to each other. The clerks call it the "Research Corpus." This would be the aggregate copy of all book texts, which Google wants to use for "research" purposes. Note that they are not offering this mega-copy to everyone, just to "qualified users." The existence and use of this aggregate copy of all books was the basis of the suit in the first place (the Authors Guild objected to Google having a copy of books without their permission), so Google won a little in now being able to use it for research. In the fullness of time, the universal book should be open to all.
For now, according to the FAQ for the Google Book Settlement:
The Research Corpus will be made available to "Qualified Users" solely for engaging in specific types of research, including:
* Computational analysis of the digitized images to either improve the image or extracting textual or structural information from the image;
* Extracting information to understand or develop relationships among or within Books;
* Linguistic analysis, to better understand language, linguistic use, semantics and syntax as they evolve over time and across genres of Books;
* Automated translation (without actually producing translations of Books for display purposes); and
* Developing new indexing and search techniques.
The research corpus, or the GoogleBook, will edge us toward the universal library. I sketched out one vision of that library in "Scan This Book":
"When books are digitized, reading becomes a community activity. Bookmarks can be shared with fellow readers. Marginalia can be broadcast. Bibliographies swapped. You might get an alert that your friend Carl has annotated a favorite book of yours. A moment later, his links are yours. In a curious way, the universal library becomes one very, very, very large single text: the world's only book.
Once a book has been integrated into the new expanded library by means of this linking, its text will no longer be separate from the text in other books. For instance, today a serious nonfiction book will usually have a bibliography and some kind of footnotes. When books are deeply linked, you'll be able to click on the title in any bibliography or any footnote and find the actual book referred to in the footnote. The books referenced in that book's bibliography will themselves be available, and so you can hop through the library in the same way we hop through Web links, traveling from footnote to footnote to footnote until you reach the bottom of things.
So what happens when all the books in the world become a single liquid fabric of interconnected words and ideas? Four things: First, works on the margins of popularity will find a small audience larger than the near-zero audience they usually have now. Far out in the "long tail" of the distribution curve — that extended place of low-to-no sales where most of the books in the world live — digital interlinking will lift the readership of almost any title, no matter how esoteric. Second, the universal library will deepen our grasp of history, as every original document in the course of civilization is scanned and cross-linked. Third, the universal library of all books will cultivate a new sense of authority. If you can truly incorporate all texts — past and present, multilingual — on a particular subject, then you can have a clearer sense of what we as a civilization, a species, do know and don't know. The white spaces of our collective ignorance are highlighted, while the golden peaks of our knowledge are drawn with completeness. This degree of authority is only rarely achieved in scholarship today, but it will become routine.
Finally, the full, complete universal library of all works becomes more than just a better Ask Jeeves. Search on the Web becomes a new infrastructure for entirely new functions and services. Right now, if you mash up Google Maps and Monster.com, you get maps of where jobs are located by salary. In the same way, it is easy to see that in the great library, everything that has ever been written about, for example, Trafalgar Square in London could be present on that spot via a screen. In the same way, every object, event or location on earth would "know" everything that has ever been written about it in any book, in any language, at any time. From this deep structuring of knowledge comes a new culture of interaction and participation."