In 1999, a group of publishers, online booksellers, distributors, and data/content aggregators gathered around a table at the New York office of the Association of American Publishers. (It would be the first time representatives from both Amazon and Barnes & Noble were in the same room together, and the largest gathering to date of all participants in the book supply chain.) In discussions that occasionally got heated, the group acknowledged the challenges in getting book data from publisher to retailer (through a variety of channels) so that consumers could view it.

At the time, EDItEUR and AAP were developing something that they called Online Information Exchange – that meeting was the US book industry’s first exposure to what would become ONIX. Within a few months, the Book Industry Study Group had put together a committee (with liaisons from EDItEUR and AAP) to examine ONIX and determine the viability of its implementation in the US. This became the BISG Metadata Committee, which is now chaired by Richard Stark at Barnes & Noble (an original Muzer). ONIX is a global standard and the BISG Metadata Committee is the venue for American publishers to review the standard, recommend improvements, and troubleshoot implementation.

So what is ONIX, exactly? It’s an XML schema for communicating information about products in the book supply chain. There are several series of standardized tags, as well as codelists denoting controlled vocabularies. Much of the work of the BISG Metadata Committee centers around the codelists – defining formats and contributor roles, determining what the term Page Count actually means. Entire 3-hour meetings have been dedicated to defining what a Pub Date is. (This has never actually been resolved to anyone’s satisfaction.) Where there are standards, there are compromises, arguments, and rat holes. Thirteen years after the first meeting about ONIX, the discussions can still sometimes get quite heated.
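To make "standardized tags plus codelists" a little more concrete, here is a minimal sketch in Python. The record is a simplified illustration loosely modeled on ONIX 2.1 reference tags; it is not a complete or schema-valid ONIX message, and the book, author, and identifier are made up. The two lookup tables are a tiny subset of codelist values as I understand them, and the `summarize` helper is a hypothetical name of my own, not part of any ONIX tooling.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative record loosely modeled on ONIX 2.1 reference tags.
# Not a complete or schema-valid ONIX message; the book itself is invented.
SAMPLE = """\
<ONIXMessage>
  <Product>
    <RecordReference>example-0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <ProductForm>BB</ProductForm>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>An Example Book</TitleText>
    </Title>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonName>Jane Example</PersonName>
    </Contributor>
    <NumberOfPages>320</NumberOfPages>
    <PublicationDate>20120401</PublicationDate>
  </Product>
</ONIXMessage>
"""

# Tiny illustrative subsets of two codelists (assumed values; check the
# published codelists before relying on them).
PRODUCT_FORM = {"BB": "Hardback", "BC": "Paperback"}
CONTRIBUTOR_ROLE = {"A01": "By (author)", "B01": "Edited by"}


def summarize(xml_text):
    """Return one dict per <Product>, with codelist values decoded."""
    root = ET.fromstring(xml_text)
    records = []
    for product in root.iter("Product"):
        form_code = product.findtext("ProductForm")
        contributors = []
        for contributor in product.findall("Contributor"):
            role_code = contributor.findtext("ContributorRole")
            # Fall back to the raw code when it is not in our small subset.
            role = CONTRIBUTOR_ROLE.get(role_code, role_code)
            contributors.append((role, contributor.findtext("PersonName")))
        records.append({
            "title": product.findtext("Title/TitleText"),
            "format": PRODUCT_FORM.get(form_code, form_code),
            "contributors": contributors,
            "pages": product.findtext("NumberOfPages"),
            # Kept as the raw string: what a "pub date" means is the
            # contested part, so the sketch does not interpret it.
            "pub_date": product.findtext("PublicationDate"),
        })
    return records


if __name__ == "__main__":
    for record in summarize(SAMPLE):
        print(record)
```

The tags carry the structure, while short codes like `BB` and `A01` only mean something once you look them up in a codelist. That lookup is exactly where the committee arguments live.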

In the process, the term ONIX has come to be nearly synonymous with “book metadata”. Many, many publishers never view the DTD or create XML files – the metadata gets entered in spreadsheets, or by hand in online data-entry forms. In 2005, BISG began developing a set of best practices for publishers who were sending metadata, regardless of format.

ONIX has undergone several revisions. Most of the US industry is still on Version 2.1 (European publishers are moving to Version 3.0, which handles ebook metadata and other issues with more flexibility).

A Bit Of History: Building Babel

In 1995, I was working for a weird little company called Muze. Originally located in Williamsburg, Brooklyn, well before Williamsburg became synonymous with “hipster”, Muze was founded in 1990 by Trev Huxley, grandson of Aldous, and Paul Zullo, producer of the King Biscuit Flower Hour. It was originally a database of music that had been converted to CD; later, they created a video database as well. By the time I got there, they had just licensed Bowker’s Books in Print database and were amplifying that with synopses, reviews, and all manner of links and tags. The goal of all this was to install the data in kiosks in stores, so customers could easily browse for the products they wanted.

The team that created all this content spent two days a week at the New York Public Library, researching connections between authors and books. We created Schools and Movements, Themes and Genres. We created sprawling taxonomies of time periods and locations. We mapped Bowker subject headings to BISAC categories, and created a central Canon of authors whose works we would prioritize. We transcribed the endorsements on the back jackets of books. We entered flap copy into the database, and wrote our own annotations when moved to do so. We sent stacks and stacks of faxes to Bowker, correcting their data.

And suddenly there was Amazon.

It was sudden, almost overnight. The book world was upended. Amazon had licensed data from Baker & Taylor, not us (nor Bowker), and while the Muze data was more intricate, the Amazon data was more visible. Many of us, over the next three years, went to work for Barnes & Noble.com to help B&N attempt to duplicate Amazon’s success – I can’t speak for the others on the team, but in my case it was a matter of finding work that was secure. Barnes & Noble would never go out of business, and if this whole World Wide Web thing didn’t pan out, I could at least work in the back office of the bookstore chain.

By 1998, it was becoming apparent that Amazon had caught the book industry – and, to an extent, even itself – flat-footed in one regard: information about books. Consumers could see it. Authors (and their mothers) complained. Publishers complained. Agents complained. The titles were truncated. The prices were wrong. The annotations – such as they were – were either far too brief to tell what the book was about, or filled with HTML-unfriendly characters. Data at online stores got updated erratically – depending on how many people were working that day, who was chosen for Oprah’s Book Club, what other promotions were being held. Publishers clearly had never expected anyone outside the book industry to look into their databases…and it showed. Rather than flipping through a beautifully-illustrated, curated Spring or Fall catalog, consumers were confronted with what sometimes looked like gibberish.

It became clear that internet bookstores were not going away, and the industry needed a standard for the information.


