I mentioned a while ago that in 1998, there were roughly 900,000 active titles listed in Books in Print. And today there are 32 million.
I keep coming back to this massive explosion of books because it’s a great benchmark for the information industry at large. If we’re talking merely about books, that exponential growth over 14 years is amazing enough – but now apply it to other industries. Medicine. Technology (bio- and other). Entertainment. Government (large and small). The information available on the open web in 2012 (as opposed to 1998) is far more detailed, plentiful, and even redundant. And that’s just the open web. Information behind firewalls, paywalls, and other walls is even more detailed and plentiful (and redundant).
So whatever problems we were having in organizing that information in 1998 – we’re still having them.
It’s not just books, is the thing. The lessons we’ve learned in the last 14 years of book publishing/selling can be applied readily to the web as a whole. And here’s what we can teach newcomers to this world of exploding information:
- The “browse vs search” debate will never die. Do users prefer a uniform, controlled vocabulary, or keyword searching that may or may not be a crapshoot? I go back and forth on this issue – with controlled vocabularies, you always have to ask “controlled by whom?”, but on the other hand, a keyword search often turns up a mishmash of results that you really have to sift through.
- The “fix your metadata” issue has also not died. In fact, it’s more important than it was in 1998, because there’s just so much more data to wade through and having clean metadata – pointers and categories and consistent fields – is pretty much the only way anybody’s going to find anything.
- It’s a lot of work to get the metadata right. A lot of boring work, and that’s the only way it’s going to get done. Hours upon repetitive hours of mapping, linking, joining – with frequent palliative applications of chocolate, cheese and strong tea.
- Most people take search for granted, and they take search results on faith without questioning them very hard. They will never know how much work goes into making sure they get what they are looking for. Convincing them of the importance of the painstaking, boring work is quixotic at best. Don’t convince. Just do. Map, rank, link, query, clean. And then do it again.
- The persistence (and continued refinement) of ONIX and MARC indicates that if the Semantic Web/Linked Data didn’t exist, we would have to invent it. It’s common sense – the constant exchange of large amounts of data begs for structure. Without structure, things get lost.
- When there is structure, there is deliberate corruption, because people always try to game the system. But the corruption doesn’t mean the structure shouldn’t get built. Understand that everybody has an agenda and move on.
- Standards take time. We don’t have time. This is an eternal tension. Just keep swimming.