LJNDawson

Book publishing. And everything else.

Archive for the tag “Identifiers”

A thought on identifiers and books

In mid-March 2006, NISO convened a roundtable of experts and thought leaders in digital resources at the National Library of Medicine in Bethesda, Maryland. The goal of the meeting was to build consensus around the use of identifiers for text, video, music, and other media in the digital realm. In breakout discussions, participants ultimately defined three characteristics of an identifier: granularity, semantic opacity, and persistence.

The granularity of an identifier refers to precisely what it identifies. An ISBN, for example, identifies a stand-alone, tradable publication (a book, or a chapter sold on its own). It does not identify an illustration, a diagram, or a bibliography. The publication is the limit of the ISBN’s granularity. Other identifiers (such as the DOI) can identify components of publications.

Semantic opacity refers to the degree to which an identifier is a “dumb number”: a string of digits that carries no intelligence. The ISBN is only partly a dumb number. It begins with 978 or 979, which indicates that the thing being identified is in the book supply chain; it then has a publisher prefix (a registration group for a country or language area, followed by a registrant element for the publisher). The string following the publisher prefix is semantically opaque, and the ISBN ends in a check digit, calculated from the preceding twelve digits, that validates the number.
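To make the “partly smart, partly dumb” structure concrete, here is a minimal Python sketch. It is an illustration, not an official validator: the function names are hypothetical, it handles only the 13-digit form, and the splitting function assumes the ISBN is already hyphenated, because where the hyphens fall depends on range tables maintained by the International ISBN Agency, which this sketch does not reproduce. The check digit uses the standard ISBN-13 weighting (alternating 1 and 3).

```python
DIGITS = "0123456789"


def split_isbn13(hyphenated: str) -> dict[str, str]:
    """Split an already-hyphenated ISBN-13 into its five elements.

    Assumption: the caller supplies the hyphens; this sketch does not
    consult the official range tables to place them.
    """
    parts = hyphenated.split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 hyphen-separated elements")
    names = ("prefix", "registration_group", "registrant", "publication", "check_digit")
    return dict(zip(names, parts))


def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    digits = first12.replace("-", "").replace(" ", "")
    if len(digits) != 12 or not all(c in DIGITS for c in digits):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """True if the string is 13 digits, starts with 978/979, and its check digit matches."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not all(c in DIGITS for c in digits):
        return False
    if not digits.startswith(("978", "979")):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(split_isbn13("978-0-306-40615-7")["registrant"])  # 306
print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False
```

Only the prefix and registration group carry meaning here. The registrant and publication elements are just numbers handed out in order, and the check digit detects every single-digit typo but cannot say anything about the book.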

Persistence refers to how long the relationship between the identifier and the object will last. Identifiers on shipping containers, for example, do not need to be persistent after the container has been unloaded and its contents dispersed. Identifiers on books need to be persistent for a much longer period of time, as information about a book can be created long after the book itself has gone out of distribution.

Essentially, all an identifier does is say, “This thing is not that thing.” It doesn’t say what the thing is, or offer any insight about any of the thing’s characteristics. An identifier expresses uniqueness. And that’s all it expresses.

Identifiers in everyday life

I talk a lot about identifiers. It’s my job. The esoteric identifiers – DOIs, ISNIs, ISTCs. The pragmatic ones – ISBNs. The other day I found myself in a meeting referring to URIs while the developers were talking about URLs (this is how you know you are either a geek or a purist jerk, or both – yeah, for 15 minutes I was “that guy”).

But outside of work, there are plenty of identifiers in our everyday lives – with varying degrees of “smartness” and “dumbness”. We’re quite comfortable with these, because we’ve grown up with them, and have to use them all the time, but when it comes to Big Data, they’re no different than any of the other numbers we talk about.

Social Security numbers are a good start. The first three digits (the “area number”) indicated the state where the SSN was assigned. The next two digits are called “group numbers”; they group together the last four digits (the “serial number”), which were issued sequentially. However! Some states were running out of numbers. So in 2011, the Social Security Administration began randomizing the assignment of numbers.

Phone numbers are another example. The first three digits are the area code. The next three are the “exchange”, which identified the caller’s local area. (Long ago, telephone exchanges had names, and the caller would give the name to the operator, as in BUtterfield 8.) The last four digits are assigned within the exchange, which in turn sits within the area code.

However! Several phenomena have disrupted this system entirely. The first is the rise of phone banks: the sheer number of telephone numbers these banks needed meant that new area codes had to be created. The second is (or, rather, was) the fax machine; giving fax machines their own phone lines also ate up numbers. The third, of course, is cell phones, which caused the greatest disruption of all, because over time people wanted to keep their phone numbers regardless of where they lived. (My phone has a 917 area code, which used to mean Manhattan. It was assigned in 1997, when I lived in Brooklyn and worked in Manhattan. Sixteen years later, I have kept the same number even though I live on Staten Island and work in New Jersey.) Now phone numbers are essentially meaningless.

There are plenty of others: driver’s license numbers, passport numbers, license plates, E-ZPass numbers, bar codes, numbers on shipping containers, Apple UUIDs. And with the Internet of Things, there will only be more. As they proliferate, and as our circumstances change, the prefixes of these numbers will carry less and less inherent meaning. Which is not a bad thing: identifiers are best when they are dumb. All they mean to say, of course, is “this thing is not that thing”.
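The case for dumb identifiers can be sketched in a few lines of Python. The registry below is hypothetical and not modeled on any real numbering agency: it mints random, semantically opaque identifiers (via the standard library’s `uuid4`) and keeps everything descriptive in a separate record. When circumstances change, as with a phone owner moving from Brooklyn to Staten Island, only the record changes; the identifier never has to.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OpaqueRegistry:
    """Hypothetical registry that mints "dumb" identifiers.

    The identifier says nothing about the thing it names; descriptive
    data lives in a separate record that can change freely.
    """
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def mint(self, **metadata: str) -> str:
        # A random UUID encodes no location, owner, or type.
        new_id = uuid.uuid4().hex
        while new_id in self.records:  # uniqueness is the only promise
            new_id = uuid.uuid4().hex
        self.records[new_id] = dict(metadata)
        return new_id

    def update(self, identifier: str, **changes: str) -> None:
        # Circumstances change; the identifier stays the same.
        self.records[identifier].update(changes)


registry = OpaqueRegistry()
phone_id = registry.mint(owner="subscriber", borough="Brooklyn")
registry.update(phone_id, borough="Staten Island")
print(phone_id, registry.records[phone_id])
```

The design choice is the point: because nothing about Brooklyn was baked into the identifier, moving to Staten Island does not make the identifier wrong.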

Searching for Emery Koltay

I was fortunate to be in the UK for the FutureBook 2012 conference, followed by an International DOI Foundation meeting in Oxford. While having dinner with Stella Griffiths, the Executive Director of ISBN International, and Beat Barblan, the Director of Identifier Services for Bowker (and my boss), we talked a little about the early days of EDI and commerce-oriented book numbering systems.

Stella brought up Emery Koltay, whom neither Beat nor I had heard of. But apparently he and David Whitaker (presumably one of the sons of “J. Whitaker & Sons,” publisher of British Books in Print as well as Whitaker’s Almanack) developed what became the ISBN. J. Whitaker & Sons eventually merged with several other companies to form BookData, and was ultimately acquired by Nielsen. Emery Koltay…worked at Bowker and eventually headed up the ISBN Agency in the US.

Which – well, apparently Emery Koltay had had enough adventures in his lifetime that settling down to a career of what amounts to arithmancy and ancient runes was a welcome relief. This is his obituary (originally sent by Stella, and which I later found online):

Emery I. Koltay of Eastchester, NY passed away on August 23, 2012, after a long illness. He was born in the Transylvania region of Romania December 22, 1921. During WWII he escaped from several Hungarian work camps and survived the war in Budapest hiding under an assumed name. After the war he returned to Romania where he completed his education and started a family. In 1958 he was arrested by the Securitate, secret police, and spent four years in a communist prison for aiding the escape of Jews from the regime. In 1963 he emigrated with the family to the U.S. where he established himself as editor and publisher of reference books. He also took a lead in working with the Library of Congress, developing and implementing the international book numbering system for U.S. publishing. Klara, his wife of sixty-one years, died in 2007. He is survived by two children, four grandchildren and three great grandchildren.

Yeah, my jaw dropped too. Six years after getting out of work camps, hiding, oppression, and communist prison, he introduced ISBNs into the US book supply chain.

Apparently he continued working at Bowker (even after “retirement”) until 1996. I wish I had known him.

Today’s #ISBNhour transcript

Can be found here, thanks to Porter Anderson!


On the Open Web…

…anyone can be a public figure.

We have a potential for 7 billion public figures.

Some Logic Around Search, Identifiers, and Discovery

I’ve always had a gut feeling that identifiers help add authority to search results. If a website about a book has an ISBN on it, chances are the ISBN is referring to the actual book. This is true of dynamically generated product pages at online bookstores, as I know from my experience at B&N. And it seems self-evident.

The mystery is that Google is a “black box”: what goes in is not always what comes out, and of course much of its algorithm is proprietary. But there are some things we know, and some premises we can make, based on what Google itself has said, paired with common sense:

  • Google prioritizes data it finds valuable
  • Valuable data is both unique and authoritative
  • Therefore data that’s associated with identifiers (which are by design unique and authoritative) has a better shot at being prioritized higher…than data that doesn’t.

This is not the stuff that’s in the <meta> tags. That stuff has been polluted by spammers and Google barely even looks at it.

It’s…microdata. Well, of a sort – microdata is actually a markup format that describes elements on a web page. Elements such as:

  • Reviews
  • People
  • Products
  • Recipes
  • Businesses
  • Events
  • Music

Kind of a random assortment – but those are the elements that Google is picking up in its effort to create “rich snippets”: the extra details (star ratings, prices, event dates and the like) that appear alongside a result on the search page when you’re looking for something.

Other formats, besides microdata, that help Google accomplish the same thing are RDFa and microformats. But Google prefers microdata, and so that’s what I’m focusing on.

These formats describe the tags that Google will pick up. But what goes IN the tags?

Ontology.

It’s a very vague-sounding word, but for our purposes, an ontology is just a vocabulary that everybody agrees on. Google is using the ontology created by Schema.org. Therefore, if your website uses the same ontology, chances are that “rich snippets” will result.

The important thing from my business’s perspective is that the ontologies include identifiers. Some identifiers that Google picks up are ISBN, UPC and ASIN. Web pages that are about books, with ISBNs in the <title> field, will be viewed by Google as reliable and therefore will be weighted more heavily in search.
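Here is a minimal Python sketch of what “picking up” an identifier from microdata might look like: a tiny reader, built on the standard library’s `html.parser`, that collects the `itemtype` and `itemprop` values from a book page marked up with the Schema.org `Book` type and its `isbn` property. Google’s actual parser is not public, so this is an illustration only; the class name and sample page are hypothetical, and the reader deliberately ignores nested items and `itemref`.

```python
from html.parser import HTMLParser


class MicrodataProps(HTMLParser):
    """Collect itemtype and itemprop values from a single, non-nested item.

    Simplified sketch: nested itemscopes and itemref are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.itemtypes: list[str] = []
        self.props: dict[str, str] = {}
        self._open = None  # (property name, tag) whose text we are capturing
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a and a.get("itemtype"):
            self.itemtypes.append(a["itemtype"])
        prop = a.get("itemprop")
        if not prop:
            return
        if tag == "meta":  # meta carries its value in the content attribute
            self.props[prop] = a.get("content", "")
        elif tag in ("a", "link"):  # links carry their value in href
            self.props[prop] = a.get("href", "")
        else:  # otherwise the value is the element's text
            self._open = (prop, tag)
            self._text = []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open and tag == self._open[1]:
            self.props[self._open[0]] = "".join(self._text).strip()
            self._open = None


SAMPLE_PAGE = """
<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">An Example Title</h1>
  <p>ISBN: <span itemprop="isbn">978-0-306-40615-7</span></p>
  <meta itemprop="inLanguage" content="en">
</div>
"""

parser = MicrodataProps()
parser.feed(SAMPLE_PAGE)
parser.close()
print(parser.itemtypes)  # ['https://schema.org/Book']
print(parser.props["isbn"])  # 978-0-306-40615-7
```

The ISBN sits in visible text but is labeled with an agreed-upon property name from a shared vocabulary, so a machine reading the page knows exactly which string is the identifier rather than guessing from keywords.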

When I say “picks up”, I am not referring to page rank. The algorithm for page rank is constantly shifting. But these identifiers do contribute to the formation of “rich snippets”, which in turn call more attention to the web page. Over time, the more attention the page gets, the more hits and links it will get. And those things determine page ranking.

So it’s not a direct relationship between identifiers and search – it’s an organic one. But without identifiers – without that assertion of uniqueness and authority – we’re not even on the road to that organic relationship. We’re condemned to the chaos of keywords.

Many, many thanks to Gary Price, whose conversation with me about these issues clarified my thinking immensely.
