Book publishing. And everything else.

Archive for the tag “ONIX”

It’s Not Just ISBNs Anymore: DOI

Oh, yeah. Here we go. Now we’re cooking with ambiguity. The DOI is a slippery thing to grasp, and the user manual is a skillion pages long.

The DOI is the digital object identifier. Yes, it identifies digital objects. It also digitally identifies objects that may not themselves be digital but may have a digital presence (like an author who has a website, or a printed book that has a database-driven product page on Amazon or B&N.com).

It is a dumb number. There is no reliable, lasting meaning in the digits. It’s a prefix and a suffix, with a slash in between them.

Like this: 10.1000/123456

All DOIs start with the number 10. (For now.) The number after the “10.” identifies the registrant, the organization that originally assigned the DOI (which could later go out of business or get bought or sold). The suffix identifies that particular object – a book, a journal article, a website – and it can be any length and contain letters, numbers…even other identifiers.
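For anyone who likes to poke at things, here’s a minimal Python sketch of that anatomy. It splits the string at the first slash, checks that the prefix starts with “10.”, and builds a link through the public doi.org resolver. The helper names (`parse_doi`, `resolver_url`) are made up for illustration, and passing this shape check says nothing about whether a DOI is actually registered.

```python
from typing import NamedTuple
from urllib.parse import quote


class DOI(NamedTuple):
    prefix: str  # "10." plus the registrant code, e.g. "10.1000"
    suffix: str  # identifies the object itself; any length, may embed other IDs


def parse_doi(text: str) -> DOI:
    """Split a DOI string at the first slash into prefix and suffix."""
    s = text.strip()
    # Accept the "doi:" label used in the examples in this post.
    if s.lower().startswith("doi:"):
        s = s[4:].strip()
    prefix, slash, suffix = s.partition("/")
    if not slash or not suffix:
        raise ValueError(f"no prefix/suffix slash in {text!r}")
    # Every DOI prefix currently begins with the directory indicator "10."
    if not (prefix.startswith("10.") and len(prefix) > 3):
        raise ValueError(f"prefix {prefix!r} does not start with '10.'")
    return DOI(prefix, suffix)


def resolver_url(doi: DOI) -> str:
    """Link through the public doi.org proxy, percent-encoding the suffix."""
    return f"https://doi.org/{doi.prefix}/{quote(doi.suffix, safe='/')}"


print(parse_doi("10.1000/123456"))  # DOI(prefix='10.1000', suffix='123456')
print(resolver_url(parse_doi("doi:10.2233/66r97q")))  # https://doi.org/10.2233/66r97q
```

Note that the code never tries to interpret the suffix. That’s deliberate: it’s a dumb number, so the only structure worth checking is the prefix, the slash, and the suffix.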

So…what’s it for?


No, really…anything you want, so long as it’s in a networked environment.

  • A book – doi:10.2345/ISBN 9781234567897 (see what I did there?)
  • An article – doi:10.2233/66r97q
  • An entire journal – doi:10.6622/ISSN 6767-9012
  • Author website – doi:10.0033/ISNI 1233 4566 7899 1112

Okay – so what does the DOI actually do?


It resolves. “Resolves” is the network scientist’s way of saying “it goes somewhere.” The DOI helps you find things, and it has two qualities that ensure you will always be able to find anything a DOI identifies.

  • Persistence – URLs change. DOIs don’t. If the author website has a DOI, the site can move from one platform to another, and as long as the DOI’s record is updated with the new address, people will always be able to find it.
  • Multiple resolution – Sometimes a thing (a chapter) resides in more than one place on the web. A single DOI can send a person to the multiple places where that thing lives. OR…sometimes a thing (a book) has more than one component (a chapter, an author biography, the book itself). A single DOI can direct a person to each of these components.
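To make persistence and multiple resolution a little more concrete, here’s a toy registry in Python. It’s a hypothetical in-memory stand-in: the real DOI system runs on the Handle System, with registration agencies maintaining the records. The class names, location labels, and example.com URLs are all invented. What it shows is that the DOI stays fixed while the locations behind it get updated, and that one DOI can hand back several places at once.

```python
from dataclasses import dataclass, field


@dataclass
class DOIRecord:
    """Hypothetical registry entry: the DOI is fixed, its locations are not."""
    doi: str
    locations: dict[str, str] = field(default_factory=dict)  # label -> URL


class ToyRegistry:
    """In-memory stand-in for a resolver; not the real Handle System."""

    def __init__(self) -> None:
        self._records: dict[str, DOIRecord] = {}

    @staticmethod
    def _key(doi: str) -> str:
        # DOI names are case-insensitive, so normalize before lookup.
        return doi.strip().lower()

    def register(self, doi: str, **locations: str) -> None:
        self._records[self._key(doi)] = DOIRecord(doi, dict(locations))

    def move(self, doi: str, label: str, new_url: str) -> None:
        # Persistence: only the stored URL changes; every citation of the DOI still works.
        self._records[self._key(doi)].locations[label] = new_url

    def resolve(self, doi: str) -> dict[str, str]:
        # Multiple resolution: one DOI can lead to several places or components.
        return dict(self._records[self._key(doi)].locations)


reg = ToyRegistry()
reg.register(
    "10.2233/66r97q",
    article="https://old-platform.example.com/articles/66r97q",
    excerpt="https://excerpts.example.org/66r97q",
)
reg.move("10.2233/66r97q", "article", "https://new-platform.example.com/a/66r97q")
print(reg.resolve("10.2233/66R97Q"))
```

Anyone holding the DOI gets the new platform address after the move without changing a single link they’ve already published.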

Which is pretty huge. ISBNs don’t go anywhere. ISSNs don’t go anywhere. ISNIs don’t go anywhere. The DOI is a kind of identifier that makes other identifiers…actionable. You can do things with them.

So what’s the magic?

Like the best magic, it operates on a basic principle: identifiers identify, and metadata describes.

  • The identifier tells the DOI system that a thing exists, and is unambiguously that unique thing and not any other thing.
  • The metadata tells the DOI system what that thing is, where it lives, and how to get to it.
  • Even if the metadata changes, the DOI remains the same. (Think of the price of an ebook. When the price goes up, the ISBN stays the same. If the ISBN is embedded in the DOI, the DOI stays the same as well.)

So who’s using this thing? Mostly, right now, journal publishers – you can go here for a great application of DOIs in journals. But also the military! Libraries. Science, technical and medical publishers. And – according to our database – other publishers who are working with “chunks”, sub-book content that needs to be identified on a granular level.

Embedding an ISBN in a DOI means that you can resolve that ISBN simultaneously to a purchase page, the author website, and an excerpt. It means you can resolve it to sub-book components (chapters, charts, sections) that are sold separately. Or you can resolve it to locations of additional material – enhanced content, supplements, lab manuals, workbooks, card decks, calendars….

Basically, the DOI helps a publisher upsell – additional stuff like related titles, t-shirts, games and toys, posters, audio or video files. If you can identify a thing, and you have the rights to that thing, the DOI can help you organize all the data so you can sell that thing and a bunch of other things besides.

I’m angling to get some pilots going so I can actually point to different websites and people can see it in action. It’s a cool little thing, even if it is almost inexplicable. If you want a deep dive, you can go here.

It’s Not Just ISBNs Anymore: ISNI

The ISNI is a newly ratified standard – ISNI stands for International Standard Name Identifier. It’s 16 characters long: 15 digits plus a check character (which can be an X).

It looks something like this:

ISNI 1244 5677 8198 0239
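Why can the check character be an X? ISNI computes it with ISO 7064 MOD 11-2, which produces a value from 0 to 10, and 10 is written as X. Here’s a minimal Python sketch of that check; the function names are hypothetical, and the two example numbers were picked only because they satisfy the arithmetic. The sample number above is illustrative: run through this check, its last digit would have to be 8, not 9.

```python
def isni_check_char(first15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ISNI."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    value = (12 - total % 11) % 11
    # A value of 10 can't be written as one digit, hence the X.
    return "X" if value == 10 else str(value)


def is_valid_isni(text: str) -> bool:
    """Check length and check character, ignoring an 'ISNI' label, spaces and hyphens."""
    s = text.upper().replace("ISNI", "").replace(" ", "").replace("-", "")
    if len(s) != 16 or any(c not in "0123456789" for c in s[:15]):
        return False
    return isni_check_char(s[:15]) == s[15]


print(is_valid_isni("0000 0002 1825 0097"))  # True
print(is_valid_isni("0000 0002 1694 233X"))  # True
```

A check character only catches typos. It can’t tell you whether the number has actually been assigned to anyone.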

Here’s what it’s for: Names.

Yeah, seriously. It’s assigned to the names of public identities: an author, a singer, a company (companies are public identities too), even a fictional character. The ISNI identifies Madonna (not Madonna Louise Ciccone), Random House, or Sherlock Holmes.

At this point, you are probably shaking your head and muttering, “Why????” And, of course, there is an answer!

Sometimes two authors have the same (or nearly the same) name. Thomas Wolfe, who wrote “You Can’t Go Home Again”, is a different person from Tom Wolfe, who wrote “The Bonfire of the Vanities”. Or one author’s public identity gets spelled multiple ways – Fyodor Dostoevsky is the same person as Fedor Dostoievski.

The identification of these names – distinguishing them or assuring us that they are indeed the same person – helps a lot when you have so many books from so many countries flooding the market.

The obvious next question is, what about pseudonyms or aliases?

Here’s where you’ll be annoyed – they get separate ISNIs. That’s right, Ruth Rendell and Barbara Vine are separate public identities. As are Stephen King and Richard Bachman. Or David Johansen and Buster Poindexter!

Once again, the identifiers don’t establish the relationship among these names. They only establish that the names are distinct. The metadata for each identifier refers to the other name (and to that name’s ISNI, if it has one) and describes the relationship between them.

So how does this all help get more books into the hands of more people? Basically, in search results. When website databases use ISNIs, they can cleanly distinguish the books of authors with the same name who are truly different people (and not have to rely on middle initials or other unreliable text differentiators). They can show customers all the books of a particular author whose name gets spelled in different ways (really important for authors whose names are not in the Latin alphabet!). And it keeps the books of authors with pseudonyms distinct and separate – maybe Ruth Rendell never intended any of us to know that she was also Barbara Vine.

All of this means that people find the exact books they are looking for. It keeps readers – book-seekers – happy.
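Here’s a minimal Python sketch of that search behavior. Placeholder strings like `isni-A` stand in for real ISNIs, and the sketch assumes a hypothetical name index has already mapped each name string to an identifier, which is the genuinely hard part. Books are grouped by the identifier behind the name, not by the name string. Variant spellings land together, near-identical names stay apart, and the pseudonym relationship lives in a separate piece of metadata rather than merging the two identities.

```python
from collections import defaultdict

# Hypothetical identifiers standing in for real ISNIs.
NAME_TO_IDENTITY = {
    "Thomas Wolfe": "isni-A",
    "Tom Wolfe": "isni-B",
    "Fyodor Dostoevsky": "isni-C",
    "Fedor Dostoievski": "isni-C",  # variant spelling, same public identity
    "Ruth Rendell": "isni-D",
    "Barbara Vine": "isni-E",       # pseudonym: a separate public identity
}

# Relationships are described in metadata; they don't merge the identities.
RELATED_NAMES = {
    "isni-D": [("isni-E", "pseudonym")],
    "isni-E": [("isni-D", "real name")],
}

BOOKS = [
    ("You Can't Go Home Again", "Thomas Wolfe"),
    ("The Bonfire of the Vanities", "Tom Wolfe"),
    ("Crime and Punishment", "Fyodor Dostoevsky"),
    ("The Idiot", "Fedor Dostoievski"),
    ("A Judgement in Stone", "Ruth Rendell"),
    ("A Dark-Adapted Eye", "Barbara Vine"),
]


def books_by_identity(books):
    """Group titles by the identifier behind the name string, not the string itself."""
    grouped = defaultdict(list)
    for title, name in books:
        grouped[NAME_TO_IDENTITY[name]].append(title)
    return dict(grouped)


shelf = books_by_identity(BOOKS)
print(shelf["isni-C"])          # ['Crime and Punishment', 'The Idiot']
print(shelf["isni-D"])          # ['A Judgement in Stone']
print(RELATED_NAMES["isni-D"])  # [('isni-E', 'pseudonym')]
```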

More info about ISNIs is here.

It’s Not Just ISBNs Anymore: ISTC

As more content gets digitized – as more books get digitized and made searchable, and book metadata refers to many, many kinds of products – it’s necessary to use other identifiers to help organize all that information and make it findable. The ISBN only goes so far – it unambiguously identifies an edition of a book. But in an increasingly networked book world, we need to identify many other things as well. What if we want to talk about, search, and work with all the ISBNs of a book at once? What if we want to search for movie scripts, poems, and other things that will never have ISBNs?

This is why the ISTC (International Standard Text Code) was invented.

The ISTC identifies “textual works” regardless of how they are published. Some examples:

  • Prose (books, articles)
  • Lyrics (words only)
  • Poetry
  • Screenplays
  • Audio scripts (radio, podcast)
  • Stage scripts
  • Other scripts (sermons, speeches, presentations, lectures)

It looks like this: 0A32009012445C9B
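That string isn’t as random as it looks. The sketch below assumes the element layout commonly described for ISO 21047, the ISTC standard: 16 hexadecimal characters made up of a 3-character registration agency element, a 4-digit year element, an 8-character work element, and a check character. Treat that layout as an assumption to verify against the standard. The function name is hypothetical, and the code deliberately doesn’t try to verify the check character.

```python
import re
from typing import NamedTuple


class ISTCParts(NamedTuple):
    agency: str  # registration agency element, 3 hex characters
    year: str    # year element, 4 decimal digits
    work: str    # work element, 8 hex characters
    check: str   # check character, 1 hex character (not verified here)


_ISTC = re.compile(r"([0-9A-F]{3})([0-9]{4})([0-9A-F]{8})([0-9A-F])")


def split_istc(code: str) -> ISTCParts:
    """Split a 16-character ISTC into its elements, ignoring hyphens and spaces."""
    s = code.upper().replace("-", "").replace(" ", "")
    if s.startswith("ISTC"):
        s = s[4:]
    match = _ISTC.fullmatch(s)
    if match is None:
        raise ValueError(f"not a 16-character ISTC: {code!r}")
    return ISTCParts(*match.groups())


print(split_istc("0A32009012445C9B"))
# ISTCParts(agency='0A3', year='2009', work='012445C9', check='B')
```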

So how is this different from ISBNs? Well, ISBNs identify specific editions: trade paperback, mass market paperback, hardcover, large print, ePub, PDF. ISTCs identify the text itself, regardless of format.

A useful picture can be found here: How ISTC Works

And here is a list of things that don’t share the original text’s ISTC:

  • Abridged Editions
  • Annotated Editions
  • Compilations
  • Critical Editions
  • Excerpts
  • Expurgated/Edited Editions
  • Non-text material added (enhanced ebooks)
  • Revised editions
  • Translations

These are called derived works, and they each get their own ISTC. Why, you ask?

Well, the ISTC is not a work identifier. It’s a text identifier. Manifestations (editions) share an ISTC only if they contain the identical text string. Thus, translations, abridgements, etc. get ISTCs separate from the original text’s.

So how does it work in real life? Let’s use “New Moon” by Stephenie Meyer as an example. The movie script gets a separate ISTC from the novel – because they contain different words, different texts. (The hardcover, paperback, and ebook editions all have the same ISTC, because they are identical in text.) And the Spanish-language version, “Luna Nueva”, gets a separate ISTC, because its text, its words, are different.

“But what if I want to relate them?” you ask. “What if I want to let people know that these text strings all go together in some way?”

This, my friends, is why we have metadata. The identifiers identify – they say “this thing is not that thing” or “this thing is the same as that thing” – and the metadata describes. Describes the thing, describes its relationship to other things.

The metadata for the ISTC allows you to specify a “Source ISTC”. So in this way, you can link the original ISTC to any or all of its derivations. All derivations of “New Moon” can be related by sticking the original book’s ISTC into the “Source ISTC” field.

Bowker’s Books in Print database, for example, stores both:

What can be done with it? Ideally, it makes search results less ambiguous. You can be sure what is and isn’t a translation, an abridgment, a related work. As our books proliferate (and we added 1,555,790 ISBNs to the Books in Print database in 2011), honing search results becomes absolutely essential to make sure that readers find the exact right book.
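Here’s a minimal Python sketch of how a database might put those pieces together, under some loudly stated assumptions: the ISTC and ISBN values are placeholders, the `kind` label is invented, and the only field taken from the standard is the Source ISTC. Each edition points at the ISTC of the text it contains, and each derived text points back at its original through the Source ISTC. A search for “New Moon” can then list the three same-text editions together and show the translation and the screenplay as related works instead of mixing everything into one pile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextRecord:
    istc: str                          # placeholder codes below, not real ISTCs
    title: str
    kind: str                          # hypothetical label: "original", "translation", ...
    source_istc: Optional[str] = None  # the "Source ISTC" metadata field


WORKS = [
    TextRecord("ISTC-1", "New Moon", "original"),
    TextRecord("ISTC-2", "Luna Nueva", "translation", source_istc="ISTC-1"),
    TextRecord("ISTC-3", "New Moon (screenplay)", "screenplay", source_istc="ISTC-1"),
]

# Each edition's ISBN points at the ISTC of the text it contains (placeholder keys).
EDITIONS = {
    "hardcover-isbn": "ISTC-1",
    "paperback-isbn": "ISTC-1",
    "ebook-isbn": "ISTC-1",
    "spanish-paperback-isbn": "ISTC-2",
}


def editions_of(istc: str) -> list[str]:
    """All editions (ISBNs) carrying exactly this text."""
    return sorted(isbn for isbn, text in EDITIONS.items() if text == istc)


def derived_from(istc: str) -> list[TextRecord]:
    """Texts whose Source ISTC points back at this one."""
    return [w for w in WORKS if w.source_istc == istc]


print(editions_of("ISTC-1"))
# ['ebook-isbn', 'hardcover-isbn', 'paperback-isbn']
print([(w.title, w.kind) for w in derived_from("ISTC-1")])
# [('Luna Nueva', 'translation'), ('New Moon (screenplay)', 'screenplay')]
```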
If this intrigues you and you want to geek out on the standard, you can go here and go bananas.

Of Deprecated Tags

Someone at a metadata subcommittee meeting years back, asking about ONIX: “Why was this tag deprecated?”

Response: “Because it was simple and easy to use.”

ONIX for Publishers

Message from an ONIX recipient:

Dear Publishers,

ONIX is not your internal database. No, really, it’s not. Just because you have a field in your database for your own private use doesn’t mean anyone outside your company gives a shit about it. No, really. Seriously. We. Don’t. Care.

Or to paraphrase [redacted in the Metadata Committee meeting]: “I’m not going to object to the addition of this tag in ONIX, but I have no idea why publishers think it needs to be added. We have no intention of ever using this information and we don’t know of anyone who does, but if you want it in there to make you happy, then fine. Just don’t expect to see the information displayed anywhere, ever.”

This was met with resounding silence, which I assume means all the publishers were like, “Great, then we’ll get it added, thanks!”

Yes, let’s talk about audiences for metadata, shall we?

ONIX for…Wait, What Is A Book Anyway?

On Apr 26, 2012, at 11:42 AM, [redacted] wrote:

Oh wait, it’s called “ONIX for Books”. Gosh, I would never have guessed that from reading the product record documentation.


From: Laura Dawson

Sent: Thursday, April 26, 2012 11:44 AM

To: [redacted]

Subject: Re: onix for universities…

Well, there is that whole thing about what is a book anyway….


From: [redacted]

Subject: RE: onix for universities…

Date: April 26, 2012 11:44:27 AM EDT

To: LAURA DAWSON <ljndawson@gmail.com>

*stares at you with slowly narrowing eyes*


