Hard books are on their way to extinction.
Biologists maintain a concept called a “type specimen.” Every species of living organism has many individuals of noticeable variety. There are millions of Robins in America, for instance, and each of them expresses the Robin-ness found in the type of bird we have named Turdus migratorius. But if we need to scientifically describe another bird as being “like a Robin” or maybe “just a Robin,” which of those millions of Robins should we compare it to?
Biologists solve this problem by arbitrarily designating one found individual to be representative and archetypical of the entire species. It is the archetype, or the “type specimen,” of that form. There is nothing special about that chosen specimen; in fact that’s the whole idea: it should be typical. But once chosen this average specimen becomes the canonical example that is used to compare other forms. Every species in botany and zoology has a physical type specimen preserved in a museum somewhere.
Books and other media creations are now getting their type specimen archive. The same guy who has been backing up the internet (yes, the entire web!), and is racing Google to scan all books into digital files, has recently become concerned about the lack of a physical archive for all these digitized books. That guy is Brewster Kahle, the founder of the Internet Archive. Brewster noticed that Google, Amazon and other companies scanning books would cut non-rare books open to scan them, or toss them out after scanning. He felt this destruction was dangerous for the culture.
We are in a special moment that will not last beyond the end of this century: Paper books are plentiful. They are cheap and everywhere, from airports to drug stores to libraries to bookstores to the shelves of millions of homes. There has never been a better time to be a lover of paper books. But very rapidly the production of paper books will essentially cease, the collections in homes will dwindle, and even local libraries will not be funded to house books, particularly popular titles. Rare books will collect in a few rare book libraries, and for the most part archives of common paper books will become uncommon. It seems hard to believe now, but within a few generations, seeing an actual paper book will be as rare for most people as seeing an actual lion.
Brewster decided that he should keep a copy of every book the Archive scans so that somewhere in the world there would be at least one physical copy to represent the millions of digital copies. That safeguarded random book would become the type specimen of that work. If anyone ever wondered whether the digital book’s text had become corrupted or altered, they could refer back to the physical type that was archived somewhere safe.
But where? The immediate answer is: in cardboard boxes, stacked five high on a pallet wrapped in plastic, stored 40,000 strong in a shipping container, inside a metal warehouse on a dead-end industrial street near the railroad tracks in Richmond, California. In this nondescript and “nothing valuable here” building, Brewster hopes to house 10 million books, about the contents of a world-class university library. The containers are stacked two high and are plumbed to remain at 30% humidity. Together with their triple waterproofing (plastic, steel container, steel roof), they will remain dry even in short periods of neglect.
But he is archiving more than just the paper books. Even digital versions are physical in some way. So the Internet Archive is also storing in these interior shipping containers the tapes of the previous versions of digital scans, and the hard disks of today’s scans, leaving room for the physical form of whatever media platform is next. There will be a next, Brewster says: “When they were making microfilm of books, they thought they would never have to rescan them. When they were being scanned at 300 dpi, they thought they would never have to scan them again. We know someday these books will be rescanned. They will be waiting here in boxes.”
The big idea that EVERY digital form ultimately rests in a physical form is a deep truth that needs to be understood more widely. From Brewster’s summary of the project:
As the Internet Archive has digitized collections and placed them on our computer disks, we have found that the digital versions have more and more in common with physical versions. The computer hard disks, while holding digital data, are still physical objects. As such we archive them as they retire after their 3-5 year lifetime. Similarly, we also archive microfilm, which was a previous generation’s access format. So hard drives are just another physical format that stores information. This connection showed us that physical archiving is still an important function in a digital era.
The books are not meant to be retrieved one by one, but as a collection, by the pallet full, say. But they are stored with the idea that they will be needed eventually. The specs of this multilayered system:
Books are cataloged, and have acid free paper inserts with information about the book and its location. Boxes store approximately 40 books with labeling on the outside. Pallets hold 24 boxes each. Modified 40′ shipping containers are used as secure and individually controllable environments of 50 or 60 degrees Fahrenheit and 30% relative humidity. Buildings contain shipping containers and environmental systems. Non-profit organizations own and protect the property and its contents.
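To get a feel for the scale these specs imply, here is a rough back-of-the-envelope sketch. The per-box and per-pallet figures come from the specs above, and the 40,000-books-per-container figure comes from the earlier description of the warehouse; all of them are approximate. The function name `storage_needed` is made up for this illustration, and each level is simply divided out independently rather than modeling how pallets actually fit inside a container.

```python
# Rough capacity arithmetic for the archive's storage hierarchy.
# All constants are the approximate figures quoted in the text.
BOOKS_PER_BOX = 40             # "approximately 40 books" per box
BOXES_PER_PALLET = 24          # "Pallets hold 24 boxes each"
BOOKS_PER_CONTAINER = 40_000   # "stored 40,000 strong in a shipping container"


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, so a partly filled unit still counts."""
    return -(-a // b)


def storage_needed(total_books: int) -> dict:
    """Hypothetical helper: units needed at each level for a given book count."""
    books_per_pallet = BOOKS_PER_BOX * BOXES_PER_PALLET  # 960 books
    return {
        "boxes": ceil_div(total_books, BOOKS_PER_BOX),
        "pallets": ceil_div(total_books, books_per_pallet),
        "containers": ceil_div(total_books, BOOKS_PER_CONTAINER),
    }


if __name__ == "__main__":
    # The 10 million book goal mentioned earlier in the text.
    print(storage_needed(10_000_000))
```

By this estimate, the 10 million book goal works out to roughly 250,000 boxes, a little over 10,000 pallets, and about 250 shipping containers.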
This past Sunday this long-term archive for paper books was opened to visitors. The current capacity is about half a million books. Many of the books were bought for almost nothing on the used book market, and others were collections of books donated by book lovers. The Archive is looking for more collections to scan and store. It costs about ten cents per page to track, catalog and scan a book. One advantage of owning the books they scan is that it gives them a small edge in claiming the right of fair use for the digital copy they make. They try to scan only books they own.
A prudent society keeps at least one specimen of all it makes, forever. It still amazes me that after 20 years the only publicly available backup of the internet is the privately funded Internet Archive. The only broad archive of television and radio broadcasts is the same organization. They are now backing up the backups of books. Someday we’ll realize the precocious wisdom of it all and Brewster Kahle will be seen as a hero.