A matter of scale
I keep citing the massive growth Books in Print has seen over the last 14 years. As staggering as that is, it’s also worth noting that many, many books never get listed in Books in Print at all: Kindle originals, for example, or self-published books. Or documentation that’s written for a specific purpose but becomes useful to many people. That sort of thing almost never gets an ISBN.
And those figures cover only the US. The question isn’t whether worldwide publication rates have grown exponentially; it’s merely by what factor. We could be looking at nearly a billion books.
At some point in the near future, most of these billion books will be available digitally.
This is a data problem like none we’ve ever encountered before. So much content (more than currently exists on the web) will be added to the web, gradually and then suddenly. An ebook, as Hugh McGuire repeatedly reminds us, is a website. It’s a website in a box (or, rather, a zip file), and it will eventually leave its box and be available, in whole or in parts, on the web. This isn’t something that will happen 20 years from now; it has actually been happening for years, just behind paywalls. Udini, a new consumer-facing product, is essentially a search engine sitting on top of all of ProQuest’s journal databases, and independent researchers pay by the article. The only thing stopping ProQuest from folding in its book databases (such as UMI) is the work of integration. That work is not insignificant, but the world is moving in that direction. Safari Books is a great example of what’s possible.
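To make the “website in a box” point concrete, here is a minimal Python sketch (an illustration, not a spec-complete EPUB). It packs a hypothetical XHTML chapter into a zip the way the EPUB container format does, then unboxes it again; what comes out is ordinary web markup. The file names, the chapter text, and the helpers `build_toy_epub` and `web_pages` are made up for the example, and the package (OPF) file and navigation document that a real EPUB also needs are left out.

```python
import io
import zipfile

# Hypothetical container file. It points at a package file (content.opf)
# that this toy example deliberately omits for brevity.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Hypothetical chapter: plain XHTML, i.e. a web page.
CHAPTER = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>It was a dark and stormy night.</p></body>
</html>"""


def build_toy_epub() -> bytes:
    """Pack a few web files into a zip, EPUB-style (the 'box')."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The EPUB container format expects 'mimetype' first and uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def web_pages(epub_bytes: bytes) -> dict[str, str]:
    """'Unbox' the ebook: return every (X)HTML page, keyed by its path."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return {
            name: zf.read(name).decode("utf-8")
            for name in zf.namelist()
            if name.endswith((".xhtml", ".html"))
        }


pages = web_pages(build_toy_epub())
print(list(pages))  # ['OEBPS/chapter1.xhtml']
```

Once the pages are out of the zip, nothing distinguishes them from any other site a crawler might index, which is the sense in which an ebook “leaves its box.”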
When search engines (Google, Bing, Yahoo) treat books like websites, crawling and indexing them, metadata will assume an even more critical role than it already does. Even more than metadata, the structure of that metadata – how it’s communicated, what gets communicated, how it’s organized – will be critical.
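What might structured book metadata look like to a crawler? One possibility, assumed here rather than drawn from any publisher’s practice, is schema.org vocabulary serialized as JSON-LD. The Python sketch below describes a hypothetical book and each of its chapters as separately addressable parts, and treats the ISBN as optional, since so many books never get one. The function name `book_jsonld`, the titles, and the URLs are illustrative only.

```python
import json
from typing import Optional


def book_jsonld(title: str, author: str, chapters: list[tuple[str, str]],
                isbn: Optional[str] = None) -> str:
    """Describe a book, and each of its parts, as schema.org JSON-LD."""
    record = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "bookFormat": "https://schema.org/EBook",
        # Each chapter gets its own entry and URL, so parts can be indexed
        # and found independently of the whole book.
        "hasPart": [
            {"@type": "Chapter", "position": i, "name": name, "url": url}
            for i, (name, url) in enumerate(chapters, start=1)
        ],
    }
    # Self-published books, Kindle originals, and documentation often lack
    # an ISBN, so the identifier is added only when one exists.
    if isbn is not None:
        record["isbn"] = isbn
    return json.dumps(record, indent=2)


print(book_jsonld(
    "A Hypothetical Field Manual",
    "Jane Doe",
    [("Getting started", "https://example.com/manual/ch1"),
     ("Troubleshooting", "https://example.com/manual/ch2")],
))
```

The design choice worth noticing is that the record describes parts as well as the whole: it answers “what is in here, and where?” rather than only “what is this thing?”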
We think of books as discrete products that can be borrowed/lent, bought/sold. Increasingly, we’re going to have to move beyond that kind of thinking, beyond the question “What is a book?” (The answer, by the way, is, “Who cares?”)
So our current taxonomies, which treat books as things, will be overwhelmed by the un-thing-ness of books on the web. Developing a structure for new taxonomies, along with vocabularies for them, is going to be painstaking. And possibly tedious!
But we will have to do it. What we use now will soon strain at the seams.