Book publishing. And everything else.

Archive for the month “May, 2012”

Let the Identifier Identify; Let the Metadata Describe

I have a sign above my desk:

Yesterday, I mentioned that in 1998, there were 900,000 books in print. In 2012 – fourteen years later – there are over 32 million.

This is a massive disruption. Not only are the established publishing houses churning out more titles year after year, but lots of new companies are starting up and authors are self-publishing. There are a lot of new entrants into the market. And things that veterans have long taken for granted – the ISBN, for example – are being called into question by these newcomers.

It constantly amazes me that this number can wreak so much havoc, but for the last several years I’ve managed to devote an hour a week on Twitter, virtually every week (except for a several-month hiatus due to work), to troubleshooting and mythbusting around the ISBN. Yes…#ISBNhour has been going on for years. I can’t believe it myself. But somewhere, to someone, the principles of this standard (which has been in existence since the 1970s!) are always still news.

And that’s just a single identifier! Granted, it’s the most basic, most fundamental number in our business – without this number, there generally isn’t much in the way of sales – but there are others! But the ISBN looms so large – and teaching people how to use it is so critical – that the other identifiers get short shrift.

I’ll hype those other numbers in a different post. But what I’ve seen over the years is that people tend to confuse identifiers with metadata – and this is the primary trouble around the ISBN. And I want to untangle that – in a way that everyone can refer back to.

George Wright III, who runs a company called PiPS, is a member of BISG. And I’ll never forget one meeting where we were discussing ISBNs and ebooks, and the possibility of appending a suffix onto the ISBN to distinctly identify ebooks (yes, we already thought of that). George promptly interjected – in his inimitable gravelly voice – “Let the identifier identify; let the metadata describe.”

In other words, the job of an identifier is to distinguish one thing from the next. “This thing is not that thing.” That is all an identifier does. It’s really so simple it’s almost unbelievable. But think about your social security number. It just tells the government you are not any one of the other 300 million people in the country. Or your driver’s license number – which tells the state that you are 158 256 789 and not 159 233 467. That’s all it tells anyone. The rest – your name, your address, your date of birth – is metadata. But none of that metadata is embedded in the driver’s license number. It’s just a number.

But sometimes, with identifiers, we ascribe meaning to them; we interpret them. Area codes are a good example – because these, too, are in the midst of great disruption now. In 1999, there were so many phone numbers in Manhattan that a new area code needed to be established. The borough was in an uproar – 212 was “prestigious” and meant the real Manhattan (and hence the real New York), but 646 was an upstart.

Now there are so many area codes all across the country that we’re constantly looking them up to find out where people are calling from. And even that doesn’t tell us everything – I have many acquaintances with cell phones from one part of the country but who have moved elsewhere and taken their phone numbers with them. (I myself have had the same cell phone number since 1998.) So our phone numbers don’t necessarily have anything to do with location anymore. Phone numbers are rapidly becoming dumb numbers – a string of digits that carries no intrinsic meaning. But in your contacts file – a database, in other words – that phone number is unique. It’s distinct. It allows you to build a record around it – where you can put the metadata about the person: name, email address, etc. The identifier identifies – it sets one thing apart from another; the metadata describes.

Now let’s go back to ISBNs. Those of us who’ve been in the business for decades have come to see ISBNs as “smart” numbers. There’s a prefix – 978 or 979 – which designates the product as being in the book supply chain. There’s another prefix – of varying length – that designates the publisher. There’s the identifier of the book itself, which is supposed to be a dumb number. And there’s a check digit, which is the result of a formula that ensures that the entire number is valid.
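For anyone who wants to see that check-digit formula in action, here is a minimal Python sketch for 13-digit ISBNs. The first twelve digits are weighted alternately by 1 and 3, summed, and the check digit is whatever brings the total up to a multiple of 10. The function names are made up for illustration; this is a sketch, not anybody's production validator.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... from left to right.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    # The check digit tops the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a 13-digit ISBN, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))
```

Notice what the formula does not do: it says nothing about the publisher, the format, or the book. It only catches typos.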

Here’s where we get in trouble: the publisher prefix. That bit, which comes after the 978 or 979, ultimately comes to be regarded as sort of a vanity license plate for a publisher. Just as desirable Manhattan phone numbers began with 212, so desirable ISBN prefixes began with 0385 (Doubleday).

But what happens when Random House buys Doubleday and eventually puts it out of business? What happens to all of Doubleday’s books – do they all now get Random House ISBNs? What happens to the backlog of unassigned ISBNs at Doubleday – do they evaporate?


Doubleday’s books – so long as they remain in print – continue with their existing ISBNs. And Doubleday’s outstanding ISBN pool – the numbers that haven’t already been assigned to books – gets merged with Random House’s. So, in essence, Random House has several publisher prefixes. You can’t tell one from another. And the more companies that Random House buys, the more prefixes it has available to use. If it sells off a division, those ISBNs become property of the purchasing publisher. In an age of 32 million ISBNs, and over half a million prefixes, the ISBN can no longer “mean” anything, any more than an area code does.

Which brings us to…the eISBN.

Just as a publisher prefix cannot “mean” anything anymore, the ISBN is not meant to describe the format of a book. Again, that’s the job of the metadata. The ISBN identifies any trade-able product in the book supply chain. The ISBN only says “this thing is not that thing”. The metadata describes what it is, what format it comes in, how long it is, how much it costs, and everything else.

Calendars are sold in the book supply chain. Calendars get ISBNs. They don’t get cISBNs.

There is no such thing as an eISBN. Ebooks get ISBNs. And those ISBNs mean nothing in themselves, except that this ebook is not that ebook. The metadata – which includes the format – describes what kind of book it is. Attempting to divine meaning from the ISBN as it applies to ebooks is only marginally more reliable than divining your future from the lines of your palm.

There are vendors who ask publishers for eISBNs. Don’t be confused. There is no such thing. They are asking for the ISBNs of your ebooks. (And those vendors should know better, and we are talking.) There are periodicals that publish reviews with eISBNs. Again, there is no such thing. They are publishing the ISBNs of the ebooks. (And these periodicals should know better, and we are talking.)

When Books in Print registers information about ebooks, it doesn’t discriminate. An ISBN is an ISBN is an ISBN – whether it belongs to an ebook, a print book, or a calendar.
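To make the division of labor concrete, here is a rough Python sketch of a product record keyed by its identifier. Everything in it is invented for illustration: the keys are placeholders rather than real ISBNs, and the field names (product_form, price_usd) are assumptions, not anyone's actual schema. The point is only that the format sits in the record, never in the key.

```python
from dataclasses import dataclass


@dataclass
class ProductRecord:
    title: str
    product_form: str  # the format is metadata, not part of the identifier
    price_usd: float


# Placeholder keys, not real ISBNs; all records are invented examples.
catalog = {
    "ISBN-PLACEHOLDER-0001": ProductRecord("An Invented Novel", "hardcover", 26.00),
    "ISBN-PLACEHOLDER-0002": ProductRecord("An Invented Novel", "ebook", 12.99),
    "ISBN-PLACEHOLDER-0003": ProductRecord("An Invented Wall Calendar", "calendar", 14.99),
}


def describe(isbn: str) -> str:
    """Look the product up by identifier and let the metadata describe it."""
    record = catalog[isbn]
    return f"{record.title} ({record.product_form})"


def formats_of(title: str) -> list:
    """List every format a title is available in, read from the metadata."""
    return sorted(r.product_form for r in catalog.values() if r.title == title)


print(describe("ISBN-PLACEHOLDER-0002"))
print(formats_of("An Invented Novel"))
```

The ebook and the calendar get the same kind of key as the hardcover. If you want to know which is which, you ask the record.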

And if you can’t stop saying “eISBN” for yourself, do it for the kittens.

This Is A Thing

I bet you don’t have one of these. IT BOBBLES.

The Joy of Book Databases

One of the things I love most is digging into a huge book database and finding things out. How many books in how many categories published by which companies have page counts of less than x? This tells you things (like who’s publishing sub-book content – or chunks, as I like to call them – and for which markets).

Or which US publishers publish books in Spanish, and how many.
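Questions like those two translate almost directly into queries. Here is a small sketch using Python's built-in sqlite3 module against a toy in-memory table. The table layout, column names, and rows are all hypothetical and bear no relation to how any real book database is structured.

```python
import sqlite3

# Toy in-memory database; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE titles (isbn TEXT PRIMARY KEY, publisher TEXT, "
    "category TEXT, language TEXT, page_count INTEGER)"
)
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?, ?, ?)",
    [
        ("id-1", "Publisher A", "Fiction", "eng", 48),
        ("id-2", "Publisher A", "Fiction", "eng", 320),
        ("id-3", "Publisher A", "Cooking", "spa", 60),
        ("id-4", "Publisher B", "Fiction", "spa", 32),
        ("id-5", "Publisher B", "Fiction", "spa", 210),
    ],
)


def short_titles(max_pages):
    """Count titles under max_pages, per publisher and category."""
    return conn.execute(
        "SELECT publisher, category, COUNT(*) FROM titles "
        "WHERE page_count < ? GROUP BY publisher, category "
        "ORDER BY publisher, category",
        (max_pages,),
    ).fetchall()


def spanish_titles_by_publisher():
    """Count Spanish-language titles per publisher."""
    return conn.execute(
        "SELECT publisher, COUNT(*) FROM titles WHERE language = 'spa' "
        "GROUP BY publisher ORDER BY publisher"
    ).fetchall()


print(short_titles(100))
print(spanish_titles_by_publisher())
```

On a real database the interesting part is not the query but what you compare the answer against.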

A data point is not an end in itself – you have to stack it up against other data points, anecdotal evidence, do some reality-testing. But having a large book database – as I did at B&N, as I did at Muze/Rovi – is a critical tool in making sense of publishing. You can see what the industry is doing on a large scale. It’s pretty extraordinary.

Having had back-end exposure to so many large databases, it’s interesting to compare their structure. Obviously I can’t say much, but it’s always pleasurable to have your suspicions validated about how data gets structured from one company to the next.

When you’re a product manager for bookish products, having access to such data puts you pretty much in heaven. Want to find out how to express the value prop of a particular tool? Run a query on the potential market for that tool. Want to find out the adoption rate of a particular standard? You can measure it.

(I can say that there are over 32 million books in print in the US. I can also say that in 1998, there were 900,000. Beyond that, you’ll have to subscribe to Bowker’s PubTrack reports.)

And data is critical for product development. Who’s consuming your products? How are they using them? I’m lucky – Books in Print tracks my products’ consumption very closely – DOIs, ISTCs, and very soon ISNIs. I have a pretty accurate view of who’s using these things, and for which products. Not all product managers have access to this kind of information.

So I’m grateful. And happy.

Some Data Observations

1. If I hear the term “eISBN” one more time, so help me God, I’m stopping the car and you can all walk home.

2. BISAC codes do so have a use. If you want to slice up the market broadly for a particular outreach attempt, BISAC codes are good for that.

3. BISAC codes are, however, no good if you want to granularly classify sub-book-level content. THAT MOMENT IS COMING. And it’s been here in the textbook market for a while.

4. Having SQL access to Bowker data is…


A Thing I Should Never Have To Say Out Loud

“No cloning in the house!”

Today’s #ISBNhour Conversation

can be found here.

Just Because You Send It Doesn’t Mean They’ll Use It

Otherwise known as “you can lead a horse to water….”

At today’s BISG Identification Committee meeting, I started thinking YET AGAIN about the amount of data flying around in our increasingly digitized industry. Our databases – with their skillions of titles, and skillions more being published every year – frighten music and video industry folks. (There are, in the Bowker database, about 150,000 publishers with fewer than 10 ISBNs each.) This is a lot of volume.

So it’s not surprising that it doesn’t all get used. It’s important to remember, when you’re packaging up your metadata to go out to trading partners, that they don’t use every single field. Once again, good relationships can help with this. The more those fields get discussed between trading partners, the greater the understanding is about what’s relevant and what’s not.

Publishers send Bowker proprietary data all the time – identification tags, descriptors. And we don’t ingest it all. There is no one-size-fits-all metadata feed. (There is a one-size-fits-a-lot feed, however.)

Yet another reason for trading partners to talk amongst themselves.


I Thought This Would Do the Work For Me!

Last night Bernardo and I were rewatching The Sopranos – well, we just finished rewatching The Wire, so where else is there to go – and we wrapped up “The Legend of Tennessee Moltisanti”. This is in Season 1, where Chris is writing his screenplay. He got a screenplay formatting program, and he’s gesturing wildly to Paulie and saying, “I thought this would do the work for me!”

He’s barely literate. Nothing’s going to do the work for him but…him. As we know, he eventually completes it (and it’s horrible).

We all have to do the work to get the result. Installing a new database, configuring a new system, changing workflow – we still have to buckle down and do the work.

(I’m speaking as someone who’s procrastinating writing a PRD in any number of imaginative ways, including writing a very elegant SQL query that will take approximately three lifetimes to generate results, but I neeeeeeeeed that data to build the requirements.)

Information Proliferation

At 11:36, my Tweetstream erupted with the news that Gorbachev had died.

By 11:44, there were seeds of doubt.

By 11:51, the “news” had been thoroughly debunked.

It’s illustrative not only of how fast data travels, but how fast it gets corrected. I think about this with regard to book metadata distribution. (And stay tuned for Brian O’Leary’s report on this – I’ve had an advance look and he nails it. Of course.) Yes, erroneous data gets out all the time. Where that erroneous data originates is hard to say – there are thousands of data feeds flying about, and many publishing companies don’t even know how many data feeds they currently send, much less who ingests them and what parts of those feeds get ingested.

But correcting that data is not rocket science. If it is…

Because, as with all things about metadata, it’s not about the metadata. It’s about relationships with trading partners. If it’s hard to correct erroneous metadata, chances are it’s because you don’t have great relationships with the places that are displaying it. E-commerce sites don’t want bad data. It’s not in their interests to continue to display it. Bad data inhibits sales – for the retailer as well as for the publisher. We are, believe it or not, all in this together.

So if it’s difficult to get your data corrected, perhaps it’s time to get to know the folks at the places where your data’s appearing. They’ll be more inclined to help you. BEA is coming up. It’s the perfect time to stop by booths, get business cards, begin making connections. Data’s a reflection of relationships. There are ways to make it all better.

Playing With Context

One of the joys of standards work is that it allows you to experiment with ideas. This week, I’m looking at some interesting functionality and thinking back to Brian O’Leary’s work on Context First.

Brian’s point is that content itself becomes a commodity fairly rapidly. We see this with public domain books – what’s to distinguish one version of The Awakening from another? Essentially, the context in which it’s published. Are there critical essays? Or is it just the raw, original text? Is there additional material that the reader might be interested in? The publishers that are good at providing meaningful distinctions – which either lead to additional resources or allow the reader to understand how one book is different from another – are the ones that will do best in this increasingly interconnected environment.

And we have at our disposal the mechanisms by which this distinction and interconnectedness happens. ISBNs are the identifiers for books – they distinguish one book from another. DOIs can link an ISBN to other material – chapters, author biographies, webcasts, all the additional content that Brian emphasizes is critical for publishers to prevent their books from being simple commodities. ISNIs are the identifiers for authors – and those can be incorporated into DOIs to link out to other authors (linking Jack Kerouac to Allen Ginsberg, for example), to author websites, to other books by that author. These tools facilitate that contextual connectivity.

It’s amazing to actually be able to create the platform for this context. Watch this space – I’ll be providing examples of these things in the wild.
