Book publishing. And everything else.


A thought on identifiers and books

In mid-March of 2006, NISO convened a roundtable of experts and thought leaders in digital resources, at the National Library of Medicine in Bethesda, Maryland. The goal of this meeting was to establish some consensus around the use of identifiers for text, video, music, and other media in the digital realm. In breakout discussions, three characteristics of an identifier were ultimately defined: granularity, semantic opacity, and persistence.

The granularity of an identifier refers to precisely what it identifies. An ISBN, for example, identifies a stand-alone, trade-able publication (a book or a chapter). It does not identify an illustration, a diagram, a bibliography. The publication is the extent of the ISBN’s granularity. Other identifiers (such as the DOI) can identify components of publications.

Semantic opacity refers to the degree to which the identifier is a “dumb number” – a string of digits that carries no intelligence. The ISBN is only partly a dumb number – it begins with 978 or 979, which indicates that the thing being identified is in the book supply chain; it then has a registration group (a language or regional grouping) followed by a publisher prefix. The string following the publisher prefix is semantically opaque, and the ISBN ends in a check digit that validates the number.
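To make the check digit concrete, here is a minimal sketch of the standard ISBN-13 calculation: the first twelve digits are weighted alternately by 1 and 3, and the check digit is whatever brings the total up to a multiple of 10. The function names are mine, chosen for illustration, and the sketch checks only the arithmetic, not whether the prefix or publisher range has actually been assigned.

```python
def isbn13_check_digit(first_twelve: str) -> int:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    if len(first_twelve) != 12 or not (first_twelve.isascii() and first_twelve.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(
        int(ch) * (1 if i % 2 == 0 else 3)
        for i, ch in enumerate(first_twelve)
    )
    # The check digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string is 13 digits with a correct check digit.

    Hyphens and spaces are ignored. This validates the arithmetic only;
    it does not confirm that the ISBN was ever issued.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))
    print(is_valid_isbn13("978-0-306-40615-8"))
```

Note what the check digit does and does not do: it catches a mistyped or transposed digit, but it says nothing about the book itself.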

Persistence refers to how long the relationship between the identifier and the object will last. Identifiers on shipping containers, for example, do not need to be persistent after the container has been unloaded and its contents dispersed. Identifiers on books need to be persistent for a much longer period of time, as information about a book can be created long after the book itself has gone out of distribution.

Essentially, all an identifier does is say, “This thing is not that thing.” It doesn’t say what the thing is, or offer any insight about any of the thing’s characteristics. An identifier expresses uniqueness. And that’s all it expresses.

Search is not complicated; it’s just hard

If my years in and around relational databases have taught me anything, it’s that a search can only retrieve what the search engine is looking for.

Marked-up repositories (such as the entirety of what Google indexes and provides search results on), while very different from relational databases in many important ways, are not so different here.

Search engines search for what they have been told (in advance) will be there. They don’t search for what’s missing. (A feature we could use, quite frankly, but I wouldn’t begin to know how to architect it.) When the web was small, search engines just searched the contents of the web pages that were out there, and built algorithms to help with relevancy and context (Mercury the planet vs. mercury the element, for example). As the web grew, it became clear that the authors of web pages had to tell search engines what was in those pages – because searching the raw data was proving to be too time-consuming, and was generating inconsistent results (not good if you are trying to sell advertising alongside those results). Keywords for websites would help bolster those relevancy and context algorithms, adding weight to certain content – and if keywords weren’t used, those pages would find themselves at the end of a very long list of pages. Web developers used to do this by using the HTML “meta” tag.
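For anyone who hasn’t looked at one, here is a rough sketch of what a keywords “meta” tag looks like and how a crawler might read it. It uses Python’s standard html.parser module; the class name, function name, and sample page are invented for illustration, and no real search engine is claimed to work this way.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(
                term.strip() for term in content.split(",") if term.strip()
            )


def extract_keywords(html: str) -> list:
    """Return the keywords a page declares about itself (hypothetical helper)."""
    parser = KeywordMetaParser()
    parser.feed(html)
    return parser.keywords


if __name__ == "__main__":
    page = (
        '<html><head><title>Growing onions</title>'
        '<meta name="keywords" content="onions, farming, soil">'
        '</head><body>All about onions.</body></html>'
    )
    print(extract_keywords(page))
```

The parser simply reports whatever the page says about itself. Nothing in it checks the keywords against the body text.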

There are two problems with this approach.

The first problem is that developers sometimes lie. These developers insert “meta” tags with keywords that have nothing to do with the topic of the web page, in the hopes of attracting traffic. This is why search engines have, for some time, ignored the “meta” tags. They are polluted with bad information.

The second problem is that there are billions of web pages. Any structured way for web developers to tell search engines what’s in their websites has got to scale to a nearly infinite degree. This means that it has to be easy, and it has to be easy to retroactively apply to already-existing web pages.

But if it’s too easy, then there is a risk of data pollution.

It’s a tense race, basically, between volume and honesty.

And it’s only going to get more tense as more websites are created, and more print resources are digitized and opened for search (this includes books). As many billions of web pages as there are now, there will be exponentially billions more. Possibly a googolplex.

Most of which are authored by people with good intentions who want to get their information found and contextualized; many of which are authored by people who really really want their web pages to show up in the top 3 or 4 for a given search term; some of which are authored by people of dubious character and even more dubious motivation, who assign tags like “Britney” to websites about onion farming.

The way the web reflects the noise and bumptiousness of human nature never ceases to amaze me. Whatever structure we invent to organize our communications (and humans are ridiculous communicators), it will be sabotaged. But that structure is, nevertheless, what we have. Without it, we are even worse off.

Searching for Emery Koltay

I was fortunate to be in the UK for the FutureBook 2012 conference, followed by an International DOI Foundation meeting in Oxford. While having dinner with Stella Griffiths, the Executive Director of ISBN International, and Beat Barblan, the Director of Identifier Services for Bowker (and my boss), we talked a little bit about the early days of EDI and commerce-oriented book numbering systems.

Stella brought up Emery Koltay, whom neither Beat nor I had heard of. But apparently he and David Whitaker (presumably one of the sons of “J. Whitaker & Sons,” publisher of British Books in Print as well as Whitaker’s Almanack) developed what became the ISBN. J. Whitaker & Sons eventually merged with several other companies to form BookData, and was ultimately acquired by Nielsen. Emery Koltay…worked at Bowker and eventually headed up the ISBN Agency in the US.

Which – well, apparently Emery Koltay had had enough adventures in his lifetime that settling down to a career of what amounts to arithmancy and ancient runes was a welcome relief. This is his obituary (originally sent by Stella, and which I later found online):

Emery I. Koltay of Eastchester, NY passed away on August 23, 2012, after a long illness. He was born in the Transylvania region of Romania December 22, 1921. During WWII he escaped from several Hungarian work camps and survived the war in Budapest hiding under an assumed name. After the war he returned to Romania where he completed his education and started a family. In 1958 he was arrested by the Securitate, secret police, and spent four years in a communist prison for aiding the escape of Jews from the regime. In 1963 he emigrated with the family to the U.S. where he established himself as editor and publisher of reference books. He also took a lead in working with the Library of Congress, developing and implementing the international book numbering system for U.S. publishing. Klara, his wife of sixty one years, died in 2007. He is survived by two children, four grandchildren and three great grandchildren.

Yeah, my jaw dropped too. Six years after getting out of work camps, hiding, oppression, and communist prison, he introduced ISBNs into the US book supply chain.

Apparently he continued working at Bowker (even after “retirement”) until 1996. I wish I had known him.

Today’s #ISBNhour transcript

Can be found here, thanks to Porter Anderson!


Rhizomes and Disruption

A post by Brian O’Leary this morning led me to think again of Deleuze and his rhizomes. Brian quotes Greil Marcus on Elvis:

He knew what he was doing. If he redefined what it was to be American, it was because he meant to. He wanted change. He wanted to confuse, to disrupt, to tear it up.

Last spring, in our frenzied landscaping, Bernardo and I broke up and re-planted a bunch of hostas and lilies of the valley. The roots of both of these plants are rhizomes – if you cut off a piece of the root, a new plant will grow from that piece. Rhizomes are cool because you can disrupt a plot and create a new one – molding the plant’s growth according to the changing demands of your garden.

The effects of Books in Browsers 2012 are still with me. Peter Brantley, the organizer of the conference, describes our collective epiphany this way in Publishers Weekly:

What we witnessed, to cite John Maxwell from Simon Fraser University, was a transcendence of contemporary publishing. BiB speakers were not trying to repair or modernize publishing. Rather, they were designing new solutions for a world in which story-telling takes advantage of networked tools for sharing insights and art. Such solutions may well lead many existing publishers into new and exciting places; on the other hand, they may not. BiB did not speak to it: indeed, nothing at BiB served to “obsolete” or replace publishing. But it is clear that we are on the threshold of an explosion of new services, spreading across many niches of story-telling that never before were beneficiaries of Internet technologies.

As Maxwell notes, we are watching a divesture of literature from the act of publishing as we have conceived it for the last 150 years. It was that insight of BiB that chilled everyone in the Sanctuary to the bone that Friday morning. How we publish – how we tell stories – is increasingly liberated from the formerly necessary contributions of companies like Random House and Penguin, Hachette and Simon & Schuster. Simply put, these firms are no longer necessary for the creation of literature. They may present significant advantages in marketing, production, and for years yet, in the distribution of print. Yet, as authors gather the spirits within themselves to create, they will increasingly draw up a panoply of online tools and services that could not care a whit for all that made publishing possible in the past.

In Deleuzian terms, those of us who were so moved at this conference are diverting the flow of publishing, creating something altogether new. Taking the rhizomes of what we know – storytelling, expression, documentation, encoding, promulgation – and breaking them up, replanting them in a different way. Changing the landscape one root system at a time.

And with intent. Deliberately, intentionally, tearing it up. In my BiB presentation, I talked about evolution. But what I felt at BiB was in fact a disruption. And it left me, and a lot of other people, all shook up.


We’re Not Finished

On the heels of the Apple iPad Mini announcement, I’m thinking about cathedrals.

Cathedrals, the lore goes, are never finished. This is not directly attributable to anyone in particular, and a web search brings up very little on this topic, but it’s a notion that many Catholics are reared on: as we strive towards perfection, towards completion, so do our houses of worship. We are never perfected, never completed; neither are our cathedrals.

There will always be another gadget. We invent things, we build things, and we do this at an ever-increasing pace. The nature of publishing now is more change than stability. There is no “final product” – only approaches to finality.

This is quite difficult for folks who think of books (and book publishing) as having permanence. Who look at something like Google’s execution on book metadata (or Apple’s execution on music metadata) and see a final product, not something in flux. The joy of digitization is that things can change. We can perpetually improve them. Nothing is final.

Becoming comfortable with this can be difficult. And each new gadget, each new format, each new iteration, seems an end in itself – one that is insufficient and lacking. Yes, we spend money on the insufficient – the iPad Mini is $329, which is kind of a lot of money for something that isn’t perfect (and which may be replaced by a newer model in 6 months).

But many consumers – and readers! – do understand that we are all on this road together, consumers and developers. And that it is a road toward progress. We’re never going to get there – wherever “there” even IS – but we’re going to approach transcendence from time to time (with linked data, with open data, with tags and identifiers and retina displays).

And even though cathedrals are never finished, they are sure incredibly beautiful.

