A Useful Amplification of Records That Are Unavoidably Needed Anyway

    November 19, 2008

    Le Mundaneum à Mons (Belgique) © dalbera / CC-BY-NC

    Depending on books can feel like relying on snail mail. “Now that I’ve shown you how to find some articles,” I say to people at the reference desk, “I’ll show you how to use our website to find some books you might want to check out. And after that, wouldn’t it make your grandmother’s day if you wrote her a letter?”

    For anyone accustomed to the Internet, books can lack the immediacy of articles or websites. Books generally have slower-developing narratives, and often have longer paragraphs, sentences, and words, which means they don’t lend themselves to skimming. Relevant passages can be harder to find than in digital material, and even finding the right book can be challenging.

    Although library web­sites are improv­ing, key­word search­ing doesn’t work well at most libraries and faceted brows­ing—the links down the left side of the page on Ama­zon—is still a rar­ity. More impor­tantly, with one notable excep­tion, there is a good chance that noth­ing on the shelf that is “printed on paper and con­structed on the model of the codex” includes the exact infor­ma­tion you have in mind.
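
    Faceted browsing boils down to counting how many records in the current result set share each value of a field (format, author, subject) and letting the user click one value to narrow the results. The sketch below is a minimal illustration of that idea, not how Amazon or any library catalog implements it; the records and the `format` field are hypothetical.

```python
from collections import Counter

def facet_counts(records, field):
    """Count how many records in a result set carry each value of `field`.

    These counts are what a faceted interface shows as links, e.g. "Book (2)".
    """
    return Counter(r[field] for r in records if field in r)

def refine(records, field, value):
    """Narrow the result set when the user clicks one facet link."""
    return [r for r in records if r.get(field) == value]

# Hypothetical result set for a keyword search
results = [
    {"title": "Neuromancer", "format": "Book"},
    {"title": "Neuromancer", "format": "Audiobook"},
    {"title": "Count Zero", "format": "Book"},
]

print(facet_counts(results, "format"))                      # Counter({'Book': 2, 'Audiobook': 1})
print([r["title"] for r in refine(results, "format", "Book")])  # ['Neuromancer', 'Count Zero']
```

    A production catalog would compute these counts inside its search index rather than in a loop, but the idea is the same.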

    This is where universal catalogs come into play. If there’s nothing on the shelf that meets your needs, the next step is to figure out whether such a book exists. There are five websites that provide relatively complete and easily accessible lists of books: Amazon, Google, LibraryThing, WorldCat, and Open Library. To make the best use of these websites, it can be useful to learn how each of them started, what keeps them going, and how their business models and practices affect the data they collect and how they go about sharing it.

    Ama­zon

    It’s tempting to think of Amazon as a technology company. That’s how Werner Vogels sees it, which is understandable: he’s Amazon’s Chief Technology Officer, and he seems to have done a very good job, judging by the leap forward Amazon’s technological initiatives have taken since the company hired him away from Cornell in 2004. Over the last couple of years, Amazon has made its mark as a service supplier, rewriting the rules for online hosting with Amazon Web Services; it has developed a successful consumer electronics product (demand for its Kindle e-book reader consistently exceeds supply, and the Kindle seems to be extraordinarily popular with publishers as well: they have made almost 200,000 titles available for it); and it has put its infrastructure to use with offerings as diverse as its Mechanical Turk and Fulfillment services.

    But if you look at its rev­enue stream, it’s pretty clear that Ama­zon has very lit­tle in com­mon with a tra­di­tional tech­nol­ogy com­pany, such as Microsoft, its Seattle-area neigh­bor. Instead, Ama­zon is prob­a­bly most like a dif­fer­ent neigh­bor: Costco.

    Amazon’s founder, Jef­frey Bezos, seems to have a firm grasp of three impor­tant aspects of retail­ing:

    • Look for items that can be sold in near lim­it­less quan­ti­ties (such as “books, music and videos”);
    • Fig­ure out how to sell them prof­itably but with min­i­mal markup (“He said he would ‘relent­lessly slash prices,’ even if it cut into incre­men­tal prof­its, because he was con­vinced that it was the right thing to do”); and
    • Focus your energy on build­ing cus­tomer loy­alty (“Sat­is­fac­tion sur­veys show that Ama­zon enjoys a golden rep­u­ta­tion among most of its 49 mil­lion active customers”).

    Sim­i­larly, Costco’s founders, James Sine­gal and Jef­frey Brot­man, stock their retail out­lets to the rafters, refuse to mark up items more than 15%, and, in their most recent report to share­hold­ers, they note, “This past year we also enjoyed the high­est mem­ber­ship renewal rate in our his­tory at 87%, attest­ing, we believe, to the high level of sat­is­fac­tion our mem­bers have in our prod­ucts and ser­vices.” Think about the things you typ­i­cally shop for at Ama­zon: are they more like what you buy from Microsoft or are they more like what you buy from Costco?

    Because of Amazon’s size, breadth, and ubiq­uity, it can be easy to for­get that its orig­i­nal busi­ness model was pretty basic: it resold books it bought from Ingram and Baker & Tay­lor. As Tim O’Reilly points out in an apolo­gia on Web 2.0, Ama­zon pur­chased a data­base of book infor­ma­tion from R.R. Bowker, put it on the still new World Wide Web, and encour­aged its cus­tomers to share reviews, bib­li­ogra­phies, and even cor­rect any mis­takes or omis­sions in its data. Two years later, when Ama­zon went pub­lic, it car­ried more than 2.5 mil­lion titles, “includ­ing most of the esti­mated 1.5 mil­lion English-language books believed to be in print, more than one mil­lion out-of-print titles believed likely to be in cir­cu­la­tion and a smaller num­ber of CDs, video­tapes and audio­tapes.” Out-of-print titles were gen­er­ally avail­able within two to six months.

    Amazon’s original formula hasn’t changed all that drastically. In 2007, books and other media accounted for 62% of its net sales, down from 66% in 2006 and 70% in 2005. The trend may be downward, but media sales are actually growing; it’s just that other sales are growing even faster.
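
    The arithmetic behind that point is worth spelling out. Media sales equal media’s share times total net sales, so a falling share still means rising media sales whenever total sales grow faster than the share shrinks. A minimal sketch using the 2005 and 2007 percentages above; the growth rates passed in are hypothetical, not Amazon’s reported figures.

```python
def media_grew(share_before, share_after, total_growth):
    """Return True if media sales rose in absolute terms.

    With total sales in the earlier year normalized to 1, media sales go
    from share_before to share_after * total_growth.
    """
    return share_after * total_growth > share_before

# Break-even: total sales must grow by more than 70/62 (about 12.9%)
# between 2005 and 2007 for a 70% -> 62% share to mean higher media sales.
break_even = 0.70 / 0.62
print(round(break_even, 3))            # 1.129
print(media_grew(0.70, 0.62, 1.10))    # False: hypothetical 10% total growth isn't enough
print(media_grew(0.70, 0.62, 1.50))    # True: hypothetical 50% total growth
```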

    Despite investments in other areas, Amazon knows that it is still primarily a retailer of books and other media, and it continues to invest in complementary initiatives and businesses that fortify its ability to sell these items. Its recent acquisitions, including Audible, Shelfari, and AbeBooks (which brings with it a 40% stake in LibraryThing), join other Amazon businesses, including the Internet Movie Database (IMDb), Alexa, and BookSurge. Amazon also developed its own search subsidiary, A9; it was an important participant in creating ONIX, “the international standard for representing and communicating book industry product information in electronic form”; and it published a hugely successful API (now part of its Associates program) through which it makes book jackets and summaries available to affiliates (including libraries) and shares a percentage of sales, inspiring creative programmers to develop websites like BigBookSearch and Zoomii.

    Ama­zon does all this so it can sell more goods and, in gen­eral, it seems to be work­ing. Con­sumers are get­ting deeper dis­counts on a broader range of books and other media than ever before, and they have an easy time find­ing the items they want thanks to Amazon’s faceted brows­ing inter­face, its active user com­mu­nity, and its search engine which, in many cases, makes it easy to search within the text of pub­lished items.

    While Ama­zon does every­thing it can to pro­vide you with as much infor­ma­tion as pos­si­ble about the items it has in stock, there’s no moti­va­tion for it to share infor­ma­tion about items it can’t sell in vol­ume, such as out-of-print mate­r­ial. If the infor­ma­tion you’re seek­ing is likely to be included in new, com­mer­cially avail­able books, then Ama­zon is an excel­lent resource. If not, you’re best served look­ing elsewhere.

    Google

    Ama­zon is one of two major cor­po­rate alter­na­tives to libraries; Google is the other.

    Ama­zon fol­lowed one of the two tra­di­tional paths for form­ing a giant cor­po­ra­tion: it was founded by an entre­pre­neur who had a good idea for a com­pany and then hired tal­ented peo­ple to build its tech­no­log­i­cal infra­struc­ture. Google fol­lowed the other path: its founders, Larry Page and Sergey Brin, cre­ated some­thing the world wanted and then hired peo­ple to turn their idea into a prof­itable corporation.

    While still graduate students at Stanford, Page and Brin took Eugene Garfield’s work on citation indexing and adapted it for the World Wide Web. Garfield, who marketed information products through his company, the Institute for Scientific Information (now a Thomson Reuters subsidiary), tracked how often scholarly papers are cited by subsequent scholarly papers, which is useful because citation frequency is a reasonable proxy for importance. Similarly, Google’s PageRank algorithm is primarily a scheme for measuring and weighting links between Web pages: the more links to a page or website, the more likely it is to be important, especially if those links come from other important sites. PageRank is intended to determine which Web pages Google’s users are likely to perceive as relevant.
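
    The core of the idea (a page is important if important pages link to it) can be sketched as a power iteration over a link graph. This is the textbook formulation, with the damping factor of 0.85 that Brin and Page described, run on a hypothetical four-page graph; Google’s production ranking combines PageRank with many other signals, so treat this as an illustration only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Every page must
    appear as a key; pages with no outgoing links spread their score evenly.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its score, split evenly, to the pages it cites.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages point at "hub", which points back at "a".
web = {
    "a": ["hub"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "hub": ["a"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

    As with citation counts, "hub" ranks highest because the most pages point to it, and "a" ranks second largely because the important "hub" links to it.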

    It was soon appar­ent that Google worked — users found what they were look­ing for — but no one saw any money in it. Page and Brin tried to sell their tech­nol­ogy for $1 mil­lion to the big play­ers in the Web mar­ket. After every­one turned them down, they decided to start their own com­pany, focus­ing their atten­tion on attract­ing as many users as possible.

    Where Amazon is a retailer that can be thought of as a virtual Costco, Google is an entertainment company like News Corp or Viacom: it generates 99% of its revenue from advertisements. Just as Amazon is primarily a reseller of products others make, Google is primarily a portal into content others create. Its mission is to “organize the world’s information and make it universally accessible and useful.” Note the absence of the word “Web” in that mission statement: Google’s goal is to organize every bit of information. For instance, Google created its free telephone directory assistance project, GOOG-411, in order to develop speech recognition software. By turning spoken words into text, Google opens up the possibility of searching audio and video files through the same Google search box that is currently used to search websites.

    Though the Web has become many people’s primary information source, a great deal of the world’s information is still found in books. In order to harvest that data, Google announced in December 2004 that five libraries (the University of Michigan, Harvard, Stanford, Oxford, and the New York Public Library) had agreed to let Google begin scanning their collections, and several more have since joined the project. Multiple elements of this arrangement remained secret, including the terms of these agreements and the rate at which books were being scanned. It was also unclear how Google would deal with potential copyright issues, especially after the Association of American Publishers and the Authors Guild almost immediately filed a joint lawsuit.

    This copyright lawsuit mirrors another: Viacom’s copyright infringement suit against YouTube, which Google had acquired. There was some speculation that Google bought YouTube specifically to make sure YouTube didn’t lose its lawsuit, which would have established a precedent Google would have to overcome if it were ever sued for hosting video files. When Google reached a settlement in its book-scanning lawsuit this past October, Viacom saw a potential concession in its own suit.

    The book-scanning set­tle­ment has raised con­cerns about preser­va­tion and access for Google-scanned mate­ri­als. Har­vard has expressed its reser­va­tions pub­licly, and Peter Brant­ley has been doing an extra­or­di­nar­ily good job of iden­ti­fy­ing and sum­ma­riz­ing the issues involved. How all this will affect peo­ple who want to read books online has yet to be determined.

    What does seem settled, at least for now, is that Google has archived an unparalleled number of books (and also scholarly articles) whose entire text could be as easy to search as the Web. With the success of GOOG-411, it seems likely that Google will soon be able to offer text-based searching within audio and video files as well.

    What’s not clear is whether adver­tis­ing will make these ven­tures prof­itable or if Google can suc­cess­fully tran­si­tion to alter­na­tive busi­ness mod­els for sub­sets of its data. Right now, it resells access to schol­arly arti­cles and news­pa­per sto­ries for sev­eral pub­lish­ers, and it appears that it will soon be sell­ing access to the books it has dig­i­tally archived. It’s also not clear if Google sees any point in devel­op­ing an active user com­mu­nity around books. While Google allows users to add reviews at its book web­site, user-contributed con­tent is not a focus in the same way it is at Ama­zon or at LibraryThing.

    Library­Thing

    Founder Tim Spalding’s LibraryThing is a new kind of Internet-enabled organization: the small company that operates on a large scale. This way of doing business has been best documented by programmer, essayist, and venture capitalist Paul Graham, one of Spalding’s inspirations, though LibraryThing probably resembles Craigslist more than it resembles any of the Y Combinator companies Graham has helped shepherd into existence.

    Like Craigslist, Library­Thing has an evan­gel­i­cal faith in its users, main­tains a sim­ple and easy to under­stand inter­face, is sat­is­fied with steady and mod­est prof­itabil­ity, and com­petes for atten­tion in a field with sig­nif­i­cantly larger enti­ties (Craigslist is often cited as a cause of the news­pa­per industry’s finan­cial dif­fi­cul­ties, even though it employs fewer than 30 peo­ple).

    LibraryThing gets its data from Amazon, from libraries that make their catalogs available through the Z39.50 protocol, and from its users, who supplement the data by contributing reviews and cataloging information, adding tags, and disambiguating records. These last two contributions seem to be particularly successful, even though they depart from standard library practice.

    The tagging concept, popularized by Joshua Schachter’s group bookmarking website, del.icio.us, allows users to catalog items using whatever keywords they wish. This enables works like Bridget Jones’s Diary to be tagged “chicklit” or Neuromancer to be tagged “cyberpunk,” subject terms that differ greatly from the Library of Congress designations for these works by Fielding and Gibson.
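
    Mechanically, a tagging system is little more than two lookup tables: one from each work to the tags users have applied to it (with counts, so widely shared tags rise to the top) and one from each tag back to its works. A minimal sketch, assuming hypothetical users and a hypothetical `TagIndex` class; it is not LibraryThing’s or del.icio.us’s actual code.

```python
from collections import Counter, defaultdict

class TagIndex:
    """A tiny folksonomy: any user may attach any free-text tag to any work."""

    def __init__(self):
        self._tags = defaultdict(Counter)  # work -> how many users applied each tag
        self._works = defaultdict(set)     # tag -> works carrying that tag
        self._seen = set()                 # (user, work, tag) triples already counted

    def add(self, user, work, tag):
        tag = tag.strip().lower()          # light normalization only: no controlled vocabulary
        if (user, work, tag) in self._seen:
            return                         # count each user's tag on a work once
        self._seen.add((user, work, tag))
        self._tags[work][tag] += 1
        self._works[tag].add(work)

    def top_tags(self, work, n=3):
        """The tags most users agree on, which act as informal subject headings."""
        return [tag for tag, _ in self._tags[work].most_common(n)]

    def works_tagged(self, tag):
        return sorted(self._works[tag.strip().lower()])

idx = TagIndex()
idx.add("ann", "Neuromancer", "cyberpunk")
idx.add("ann", "Neuromancer", "cyberpunk")   # duplicate from the same user: ignored
idx.add("bob", "Neuromancer", "Cyberpunk")   # normalizes to "cyberpunk"
for user in ("bob", "cat", "dan"):
    idx.add(user, "Neuromancer", "sci-fi")
idx.add("eve", "Bridget Jones's Diary", "chicklit")

print(idx.top_tags("Neuromancer"))    # ['sci-fi', 'cyberpunk']
print(idx.works_tagged("chicklit"))   # ["Bridget Jones's Diary"]
```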

    Disambiguation allows users to clarify records by taking actions such as combining entries for works that are identical but were released under different titles, or gathering works under a single author heading even though that person has published under multiple names. These tasks are difficult for a small staff to perform manually, and it has proved tricky to teach computers to disambiguate records programmatically. For instance, author Cyril Northcote Parkinson’s name is subject to multiple permutations (C. N., Cyril N., C. Northcote, etc.), and his most famous work, Parkinson’s Law (which expands on his belief that “work expands so as to fill the time available for its completion”), has been released with multiple title variations and in numerous editions. Amazon struggles to make it clear which edition of Parkinson’s Law a potential customer might wish to purchase, and Google offers a few different options that are not readily distinguishable from one another. LibraryThing, while representing more options than either of the other two, also makes it clear which title its users believe should be considered definitive.
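
    A naive way to have a computer merge author-name variants is to reduce each one to a matching key, such as surname plus initials. The sketch below uses a hypothetical `name_key` helper, not any catalog’s real algorithm. It collapses the Parkinson variants correctly, but the same key also shows why human disambiguation is still needed: it misses some variants and would merge different people.

```python
import re

def name_key(name):
    """Reduce an author-name variant to a (surname, initials) matching key.

    Accepts both "Forenames Surname" and "Surname, Forenames" order.
    Deliberately naive: it only illustrates the idea.
    """
    name = name.strip()
    if "," in name:
        surname, forenames = (part.strip() for part in name.split(",", 1))
    else:
        *first, surname = name.split()
        forenames = " ".join(first)
    # "C.N.", "C. N.", and "Cyril Northcote" all yield the initials "CN"
    initials = "".join(word[0].upper() for word in re.findall(r"[A-Za-z]+", forenames))
    return surname.lower(), initials

variants = [
    "Cyril Northcote Parkinson",
    "C. N. Parkinson",
    "C.N. Parkinson",
    "Cyril N. Parkinson",
    "Parkinson, C. Northcote",
]
print({name_key(v) for v in variants})   # {('parkinson', 'CN')}

# The same key fails in both directions:
print(name_key("Parkinson, C."))          # ('parkinson', 'C'): a variant left unmerged
print(name_key("Charles N. Parkinson"))   # ('parkinson', 'CN'): a hypothetical different author, wrongly merged
```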

    It’s worth noting that, despite being a small company, LibraryThing is not operating on a different scale from Amazon and Google when it comes to the number of books it catalogs. LibraryThing, which launched on August 29, 2005, has catalog entries for over 32 million books. While open cataloging has its limitations, LibraryThing’s website regularly demonstrates the power of crowdsourcing big tasks to a large, devoted community.

    That community is the key to LibraryThing’s success. Just as del.icio.us users socialize around shared bookmarks and tags, LibraryThing users socialize around the books in their collections. Users can add 200 books for free; to add more, they pay either $10 per year or $25 for a lifetime membership.

    That’s one way Library­Thing makes money. The other is Library­Thing for Libraries, a ser­vice that allows libraries to inte­grate LibraryThing’s tag data­base and, as of Sep­tem­ber 2008, its user reviews, into par­tic­i­pat­ing libraries’ web­sites. This ser­vice is offered on a slid­ing scale, with the small­est libraries pay­ing $1,000 per year.

    While Amazon’s business model does not target libraries in any discernible way (either as customers or as competitors), and Google appears to be interested only in the largest libraries as partners, LibraryThing seems to be actively interested in selling its services to pretty much every kind of library (dozens have already signed up for LibraryThing for Libraries) and in digesting Z39.50 feeds (or getting records in other formats) from any library willing to share. In a pinch, it appears that LibraryThing will even take care of your cataloging.

    World­Cat

    OCLC is a nonprofit consortium that includes almost 70,000 libraries as members. It was founded in 1967 as the Ohio College Library Center. In 1977, it began allowing libraries outside Ohio to become members, and in 1981 it changed its name to the Online Computer Library Center. It has made multiple acquisitions as it has grown, including the Dewey Decimal Classification System and its only competitor, the Research Libraries Group, which operated from 1974 until 2006. This sort of activity, and OCLC’s business model, led to its nonprofit status being investigated but ultimately recognized. Understandably, OCLC uses its tax status to its advantage, just as some nonprofit hospitals take advantage of their status and IKEA makes use of its unusual structure.

    OCLC’s most widely visible product is an amazingly good website, WorldCat.org, which provides free access to over 110 million library catalog records. Most of those records are for books, but because member libraries contribute records for their entire collections, WorldCat.org also includes articles, audio, and video. Right now, WorldCat.org is the best free website that lets visitors use keywords to conduct serious research across all media types, a feature that on its own would make it valuable. On top of that, OCLC has integrated its work on FRBR and xISBN (projects that make it easier to find what you’re looking for), helping to turn WorldCat.org into an invaluable resource.

    One of the two major problems with WorldCat.org is what it doesn’t include: the long tail of library records. With 70,000 libraries contributing records, it’s tempting to assume that just about every book is included in the WorldCat.org database, but that’s probably far from true. OCLC’s Karen Calhoun has written about its efforts to position its pricing and services so smaller libraries can participate, and OCLC is making inroads, but it still serves far fewer than half of the smaller libraries in the United States. This won’t affect most popular material (big libraries hold just about every major work held by a smaller library, so the small libraries’ records are redundant in these instances), but it does mean that more obscure works collected by smaller libraries, such as works by local authors and regional historical resources, may not be included.

    This sort of limitation affects everyone from amateur genealogists to academic researchers. For instance, I have a friend who is writing her doctoral dissertation on the history of illness in the counties surrounding Philadelphia. Almost none of the libraries, archives, and historical societies she is relying on have shared their catalogs with OCLC. This means she must make use of each of these collections individually, usually in person, and spend time learning how each collection is organized. This is the research equivalent of typing her dissertation on a manual typewriter instead of a MacBook Pro, and it represents a failure to make the best possible use of available technology. These collections’ records should be included in WorldCat.org.

    This kind of wasted opportunity to assist researchers is one major disadvantage of WorldCat.org’s omission of smaller libraries’ holdings. The other major problem arises when researchers try to make use of one of WorldCat.org’s signature features. When users search for an item in WorldCat.org, they can select a tab labeled “Libraries,” which takes them to a list of local libraries that have that item in their collection. However, only libraries that share their records with OCLC are listed. For example, search for Daemon: a novel by Leinad Zeraus and select the Libraries tab. WorldCat.org displays ten libraries where you can find this book, nearest first. It would be natural for WorldCat.org visitors to infer that these are the ten closest libraries that have this book. Unfortunately, that’s probably not the case. Instead, WorldCat.org is displaying the ten closest libraries that share their records with WorldCat. Users who believe that WorldCat.org is helping them search their nearby libraries may conclude that their local libraries don’t have any books at all or, at least, none of the books they’re hoping to find.
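
    The problem with the Libraries tab can be shown in a few lines: a “nearest libraries that hold this book” list can only rank libraries whose records it knows about. The sketch below uses hypothetical libraries and distances and a hypothetical `nearest_holdings` function; it does not reproduce WorldCat.org’s actual ranking, only the filtering effect described above.

```python
from dataclasses import dataclass

@dataclass
class Library:
    name: str
    miles_away: float
    has_book: bool
    shares_records: bool   # whether this library contributes records to the union catalog

def nearest_holdings(libraries, k=10, only_sharing=True):
    """Return up to k libraries holding the book, nearest first.

    With only_sharing=True this mimics what a union catalog can see:
    holdings of libraries that never contributed records are invisible.
    """
    candidates = [
        lib for lib in libraries
        if lib.has_book and (lib.shares_records or not only_sharing)
    ]
    return sorted(candidates, key=lambda lib: lib.miles_away)[:k]

# Hypothetical neighborhood: the two closest holders don't share records.
libs = [
    Library("Town Library", 0.5, True, False),
    Library("County Branch", 3.0, True, False),
    Library("University Library", 12.0, True, True),
    Library("City Central", 20.0, True, True),
]
print([lib.name for lib in nearest_holdings(libs, k=2)])
# ['University Library', 'City Central']  (what the user sees)
print([lib.name for lib in nearest_holdings(libs, k=2, only_sharing=False)])
# ['Town Library', 'County Branch']  (what is actually nearest)
```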

    Of course, it’s possible that some libraries may not want their records included in WorldCat.org. I’m not sure why they would feel that way, aside from the recent hullabaloo over licensing, which appears to be getting increasingly heated. However, the library where I work very much wants its records in WorldCat.org so that our neighbors in town can use it as an alternative way of looking for the books that are available in their local library.

    OCLC markets WorldCat and other services through a network of regional service providers. The provider for our area is PALINET, so if we want to get our records into WorldCat, we have to go through PALINET. Unfortunately, between OCLC and PALINET, a sort of “if you have to ask, you can’t afford it” pricing structure seems to have emerged for getting records included in WorldCat.org.

    I don’t think this is anyone’s fault. Everyone I’ve met at OCLC and PALINET is smart, dedicated, and helpful. My guess is that it’s more like Kate Sheehan’s post office story in which her attempt to pick up a package left her feeling “broken or inept.” That’s certainly how I felt after spending a month exchanging emails with PALINET. At the end I was so confused that it just didn’t seem worth bothering to get an accurate price to take to my board, because the one thing about which I was relatively certain was that we didn’t have enough money to share our records on the WorldCat.org website.

    The folks at OCLC seem to be working hard to remedy this situation. I have faith that they’ll get there. But until they do, there will probably be a lot of libraries that would like to share their records in WorldCat.org and either can’t afford it or can’t figure out if they can. That means researchers are going to have to keep working harder than necessary, WorldCat.org users will keep being misled by its Libraries tab, and frustrated libraries may find themselves looking for more accommodating partners.

    Open Library

    Along with OCLC’s WorldCat.org, Open Library is one of two major nonprofit initiatives centered on creating a universal book catalog: its goal is to create a page for every book ever published and to let users update those pages, just as LibraryThing or Wikipedia pages are edited by site visitors. Since its founding in July 2007, it has added over 30 million records to its book database.

    For now, Open Library may be best known for its founder, Brew­ster Kahle, and its tech­ni­cal lead, Aaron Swartz. Both are Inter­net celebri­ties and ser­ial entre­pre­neurs, though both spe­cial­ize in non­profit star­tups. Kahle has sold com­pa­nies to AOL and Ama­zon, but he is best known for his work on the Inter­net Archive, home of the Way­back Machine, which attempts to archive the entire Web. Swartz was a founder of Red­dit, which was sold to Condé Nast, and a devel­oper of RSS, which enables web­sites, most notably blogs, to deliver con­tent directly to read­ers. Open Library is cur­rently funded by the Inter­net Archive and the Cal­i­for­nia State Library and is com­mit­ted to remain­ing entirely free, right down to the code that runs the site, which it makes avail­able through an open source license.

    Unlike our expe­ri­ence with OCLC, shar­ing our records in Open Library was dead sim­ple: I emailed Aaron Swartz and he replied that receiv­ing our records “was cause for much rejoic­ing.” (I also emailed Tim Spald­ing at Library­Thing to see if he might be inter­ested in our records, and I found out he was as well.) Open Library is actively solic­it­ing these con­tri­bu­tions from libraries. How­ever, it could, poten­tially, get these records directly from library web­sites. The tech­nol­ogy involved is pretty sim­ple and fairly well understood.

    For example, the library where I work recently introduced a new website that’s powered by Casey Bisson’s fantastic Scriblio project. To import the Collingswood Library’s old records into our new website, we had Scriblio visit the web page for each record in the old catalog and import its data into the Scriblio database, turning blah into beautiful. We also use scrib_availability to show website visitors whether a book is on the shelf.
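
    The import described above amounts to screen scraping: fetch each record’s page and pull labeled fields out of the HTML. A minimal sketch of the parsing half, using Python’s standard `html.parser` module and assuming a hypothetical old-catalog layout in which each field appears as a `<th>Label:</th><td>Value</td>` row. It is not Scriblio’s code, the call number is invented, and a real import would also fetch each page over HTTP.

```python
from html.parser import HTMLParser

class RecordPageParser(HTMLParser):
    """Collect <th>Label</th><td>Value</td> pairs from one catalog record page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None     # "th" or "td" while inside one of those cells
        self._label = None    # most recent field label, waiting for its value
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._text = tag, []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = " ".join("".join(self._text).split())   # collapse whitespace
        if tag == "th":
            self._label = text.rstrip(":").lower()
        elif self._label:
            self.fields[self._label] = text
            self._label = None
        self._cell = None

# Hypothetical record page from an old web catalog
page = """
<table>
  <tr><th>Title:</th><td>Parkinson's Law</td></tr>
  <tr><th>Author:</th><td>Parkinson, C. Northcote</td></tr>
  <tr><th>Call number:</th><td>350 PAR</td></tr>
</table>
"""
parser = RecordPageParser()
parser.feed(page)
parser.close()
print(parser.fields)
```

    Because the parser keys on labels rather than on position, it tolerates rows appearing in a different order, which is one reason label-based scraping is a common approach to this kind of migration.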

    Open Library clearly has the technical knowledge to do something like this and, because just about every library has a web-based catalog, it could easily include every book from pretty much every library in its database, enabling site visitors to learn if their local library has the book they want. For now, Open Library’s book pages, LibraryThing’s book records, and Google’s About this book pages link to WorldCat.org. (Edit: I originally wrote that Google’s About this book pages did not link to WorldCat.org. In the future, I’ll try to remember to disable my Firefox extensions before making such claims.)

    The issue isn’t technical; it’s legal and ethical. On behalf of the library where I work, I uploaded our records to archive.org, making it possible for Open Library to use them, and I also loaded them into our Scriblio-based website. It seems unlikely that libraries will have their records aggregated without their permission, at least in the near future. However, it wouldn’t be surprising if Kahle or Swartz, instead of asking for our records, began asking for our permission: what if they came to us and asked if they could automatically index our catalogs, creating for free a service that costs libraries thousands of dollars through OCLC? Even non-OCLC libraries are used to sharing their records. Why wouldn’t they accept Open Library’s offer to create a universal catalog? For most libraries, there’s no downside and an enormous upside: a single website where the world could see their records, and a free hub they could use for sharing records with each other.

    A Use­ful Amplification

    In his 1992 Redesign­ing Library Ser­vices: A Man­i­festo, Michael Buck­land writes that, “(f)rom an oper­a­tional per­spec­tive the library cat­a­log can be seen as a use­ful ampli­fi­ca­tion of records that are unavoid­ably needed any­way. The infor­ma­tion in a cat­a­log can be use­ful in a vari­ety of ways to library staff and library users. The dif­fer­ence between mod­ern library cat­a­logs and those before the late nine­teenth cen­tury is essen­tially that the mod­ern cat­a­logs have a much larger bib­li­o­graph­i­cal super­struc­ture added to the loca­tional infor­ma­tion than had pre­vi­ously been the case.” In a nut­shell, Buck­land is say­ing, libraries decided that, since they had to keep a list of what they owned, they might as well describe each item and make sure they knew exactly where copies of it could be found. “With mate­ri­als on paper, hav­ing copies stored locally is a nec­es­sary (though not a suf­fi­cient) con­di­tion for con­ve­nient access. With elec­tronic mate­ri­als, local stor­age may be desir­able but is no longer nec­es­sary.… The answer is to shift from cat­a­logs to union cat­a­logs or linked cat­a­logs.… Arguably the present day cat­a­log… is more a prod­uct of the lim­i­ta­tions of nine­teenth cen­tury library tech­nol­ogy than of present day opportunities.”

    Between Ama­zon, Google, Library­Thing, World­Cat, and Open Library, we’re get­ting ever closer to set­ting aside nine­teenth cen­tury mod­els and to more fully tak­ing advan­tage of present day oppor­tu­ni­ties. There is no tech­no­log­i­cal rea­son pre­vent­ing us from build­ing a uni­ver­sal cat­a­log that con­tains infor­ma­tion on every book in exis­tence and locates that book in every library that has a copy avail­able for use.
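
    At its simplest, the universal catalog Buckland points toward is a union catalog: merge every library’s records into one index keyed by a shared identifier, so a single lookup returns every library holding a copy. A minimal sketch with placeholder identifiers and hypothetical library names; a fuller version would also convert ISBN-10s to ISBN-13s and group editions into works, which is the kind of problem FRBR and xISBN address.

```python
from collections import defaultdict

def normalize_isbn(isbn):
    """Strip hyphens and spaces so differently formatted ISBNs match."""
    return isbn.replace("-", "").replace(" ", "").upper()

def build_union_catalog(catalogs):
    """Merge per-library catalogs into one index of where each book can be found.

    catalogs maps a library name to a list of (isbn, title) records.
    Returns (titles, holdings): ISBN -> first title seen, ISBN -> set of libraries.
    """
    titles = {}
    holdings = defaultdict(set)
    for library, records in catalogs.items():
        for isbn, title in records:
            key = normalize_isbn(isbn)
            titles.setdefault(key, title)
            holdings[key].add(library)
    return titles, holdings

# Placeholder identifiers, not real ISBNs.
catalogs = {
    "Library 1": [("12-34", "Book A"), ("56-78", "Book B")],
    "Library 2": [("1234", "Book A")],          # same book, formatted differently
    "Library 3": [("99 99", "Local history")],  # held only by a small library
}
titles, holdings = build_union_catalog(catalogs)
for key in sorted(holdings):
    print(titles[key], "->", sorted(holdings[key]))
```

    The small library’s local-history record is exactly the kind of entry that disappears from a union catalog when that library does not contribute.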

    We’re also closing in on having a digital scan of every book, making full-text searching possible, as well as concurrent, remote use of scarce resources (by which I mean, I can look at the text of a book on my screen while you’re looking at it on yours, a feature not available in a paper-based book, which is limited to being used in a single location and, generally, by a single user). It’s an exciting time to be a book lover, and it gives one hope that, with better resources available, books will begin to seem as accessible and vital as born-digital resources.

    I like the alter­na­tives that Ama­zon, Google, Library­Thing, World­Cat, and Open Library make avail­able. I think each has made the other bet­ter, and I like hav­ing alter­na­tives in research­ing books just as I like hav­ing FedEx, UPS, DHL, and the United States Postal Ser­vice avail­able when I’m try­ing to send a pack­age. I don’t think researchers are gen­er­ally lazy, and I don’t think they want fewer options. What they want are a few really good choices, and they have them. It’s excit­ing for all of us that these good choices seem intent on becom­ing great ones.

    Thanks to Tim Spald­ing and Aaron Swartz for read­ing an early draft of this arti­cle, and to my ItLwtLP col­league, Hilary Davis, for help­ing me with its final ver­sion.

12 Comments

  • […] A Use­ful Ampli­fi­ca­tion of Records That Are Unavoid­ably Needed Any­way is an essay by Brett Bon­field which, dare I phrase it this way, use­fully ampli­fies sev­eral of the major web-based enti­ties which are inter­twined with libraries.  These include (but aren’t lim­ited to) OCLC’s World­Cat, Ama­zon, and Library­Thing.  Brett clearly under­stands libraries, and does a great job detail­ing the inter­re­la­tion­ships between all involved. […]

  • Hilary Davis says:

    Brett, thanks for your thorough analysis of these websites’ services, which seem to have recognized the value of books and seized opportunities to capitalize on that value in a way that most libraries haven’t (even though libraries have been in the business for a very long time). Why haven’t most libraries taken steps like these? Is it because libraries weren’t accustomed to having competitors until the last 10 – 13 years or so? Is it because libraries haven’t traditionally thought of their users as customers or clients? You point out that companies like Costco and Amazon enjoy a very high level of customer loyalty: do libraries have user loyalty? How does the concept of loyalty play out in the framework of libraries as a public good (excepting that there are libraries that are private)? Are libraries not positioned to be competitive in the current marketplace (or do they even need to be competitive)?

    It sounds like what you’re saying is that these services are complementary to the portfolio of other services that libraries provide. So, why shouldn’t libraries harness the work that these services do to promote books and give users the information they seek from books, and leverage these services to point to the additional services that libraries offer (the worldcat.org model or the Open Library model)? Are there any reasons not to move in this direction?

  • […] Google, LibraryThing, WorldCat, and Open Library… 11.19.08 Posted [http://inthelibrarywiththeleadpipe.org/2008/a-useful-amplification-of-records-that-are-unavoidably-n…] on the In The Library With a Lead Pipe blog is an interesting overview of Amazon, Google, […]

  • Tom Cole says:

    “Are there any reasons not to move in this direction?” Well, it depends on what direction you mean: universal catalog or universal access. Open Library and Google Books aspire to universal access, digitizing the content of books and thus ensuring their availability to future generations. Along the way they might produce universal catalogs, too. The other services come close to being universal catalogs, bringing together as many records as they can but not touching the content (though Amazon has excerpts).

    What can libraries do? The public library where I am employed has used tags from LibraryThing in its online catalog and makes WorldCat available through its website. People usually want the book in hand, however, rather than the consolation of knowing that it’s out there somewhere or that they might like another book similar to the one they asked about.

    I think libraries could do more in using freely available digital texts as secondary resources. I’ve shown students Project Gutenberg, for instance, when the last copy of some classic text is checked out. (I see Project Gutenberg is now listed on the Internet Archive as one of its resources.) Google Books and Open Library could and should be used to broaden our available offerings. Librarians fear, perhaps, that texts outside brick-and-mortar libraries are uncontrolled, unauthenticated, and evanescent. But they’re there and they fill a need.

  • “Why haven’t most libraries taken steps like these? Is it because libraries aren’t necessarily accustomed to having competitors until the last 10 – 13 years or so? Is it because libraries haven’t traditionally thought of their users as customers or clients?”

    I live in a town in which everyone is willing to pool their resources for certain purchases. We know we want to have access to a broad range of items, so we agree to buy them collectively and take turns using the goods we purchase, which we generally get at a discount. We realize that, rather than buying these things individually, we can create one enormous budget and buy a lot more stuff. For the most part, we buy books and DVDs and CDs. Operations are centralized through an institution we refer to as the Library.

    Libraries are really good at helping other people pool their resources, but we’re terrible at pooling our own. Our various little consortia are great, as long as the alternative is going it alone, but they’re a drop in the bucket when the alternative is a centrally managed institution like Google or OCLC or Amazon. We need to look at how much all libraries are spending on all inventory systems, catalogs, cataloging (including serials), scanning, etc. And then we need to make sensible use of our collective resources. It’s not like we have excess capacity – if anything, we need more catalogers and more programmers. But we need to get rid of all the duplication and waste (also known as vendor markup).

    “You point out that companies like Costco and Amazon enjoy a very high margin of customer loyalty. Do libraries have user loyalty?”

    Yes.

    “How does the concept of loyalty play out in the framework of libraries as a public good (excepting that there are libraries that are private)?”

    There’s this library at a state school in the American Southeast. The folks there think the library can be one of their school’s competitive advantages. That is, they think having a great library can help them recruit students and retain faculty members.

    That’s one of my goals for the Collingswood Public Library. I want it to be one of the first things local real estate agents talk about when they’re selling our town to potential residents. And I want it to be a reason that people choose to stay here after they retire.

    “Are libraries not positioned to be competitive in the current marketplace (or do they even need to be competitive)?”

    Our neighboring town appears poised to close its library in the next month or so. And we’re three miles away from Philadelphia, which is closing a third of its branches. Of course we need to be more competitive.

    “It sounds like what you’re saying is that these services are complementary to the portfolio of other services that libraries provide. So, why shouldn’t libraries harness the work that these services do to promote books and give users the information they seek from books, and leverage these services to point to the additional services that libraries offer (the worldcat.org model or the Open Library model)? Are there any reasons not to move in this direction?”

    I don’t think there are any legitimate reasons not to work as closely as possible with every service that gives people access to good information. The question is, what are we willing to offer these services in exchange for their assistance?

  • Hilary Davis says:

    Brett, thanks for responding to my questions, most of which were asked simply to play devil’s advocate and solicit some alternate perspectives on these broader issues.

    However, I think you hit the nail on the head when you ask “what are we willing to offer these services in exchange for their assistance?”

  • I try never to leave a leading question unanswered.

    So… what should we be willing to offer these services in exchange for their assistance?

  • Ben Abrahamse says:

    I enjoyed this article very much.

    One comment: you said about WorldCat:

    “One of the two major problems with WorldCat.org is what it doesn’t include: the long tail of library records.”

    At the same time that OCLC’s database doesn’t include a lot of uncommon works, it also includes duplicate records of many common ones, making it challenging to find things. Their penchant for mass-loading records willy-nilly from non-English libraries doesn’t help either.

    Bottom line: the database is so big that it is difficult to navigate, even with interface tools like “faceted” browsing.

    I don’t work a reference desk, but my feeling as a “back room” librarian is that OCLC is a great resource for locating “known items” but it’s not the greatest discovery tool out there.

  • “Instead, WorldCat.org is displaying the ten closest libraries that share their records with WorldCat.”

    Actually, it’s worse than that.

    A library could be a dues-paying member of OCLC for years and years. It could contribute holdings data to OCLC. It could contribute original records. It could participate in interlibrary loan. You could do all of this and your records still wouldn’t show up in the version of WorldCat that the public sees on the open Web. (They would show up in the version that your technical services staff pays to use, however.)

    This is because OCLC demands that you subscribe to FirstSearch in order for your records to show up on the public WorldCat. It doesn’t matter if your library doesn’t have the dollars or the need for FirstSearch. No FirstSearch, no public display of your holdings: that’s the rule.

    People have complained about this at membership meetings, and OCLC officials just say they gotta pay for WorldCat somehow. Meanwhile, lots of unique items at special, school, and small academic libraries show up on WorldCat with few or no holdings.

  • Heather McLeland-Wieser says:

    Great post, and very thought-provoking. One thing disturbs me about digital books (and also online newspapers and magazines): for most of my patrons, the digital-book interaction goes something like this:
    1) Find the very phrase you seek via a search engine
    2) Clip/save/print that page and that page alone
    3) Consider the source researched and cited

    The missing piece of this process is the context within the larger written work. I’ve seen instances where a patron has done this and in taking the page out of context has completely misrepresented that author’s point of view or argument.

    I love the idea of digital books, but I worry about the unintended consequences.

  • In high school I had this, um, friend who would:

    1) Find the very term I, I mean HE, was seeking via a book’s index (or possibly Readers’ Guide)
    2) Photocopy that page and that page alone (and then I’d cut the context-less quote out of the photocopy and paste it onto a 3 x 5 card)
    3) Consider the source researched and cited

    Not only did he… okay, not only did I frequently misrepresent the author’s point of view, I enjoyed misrepresenting the author’s point of view. That was actually one of the only things I liked about the dead-boring research papers I was assigned in high school.

    For what it’s worth, at least in theory it’s now far easier for teachers to catch students doing the stuff I did. Back then, they would have had to go to the library and find my sources. Now it’s just a question of pulling up a URL, which is much, much faster. And, while I’ve never heard of a teacher doing this (I’m not saying they don’t, just that I don’t get to interact that closely with teachers), I have noticed that in Usenet discussions and in discussion forums, people have called BS on those folks who try to misrepresent an author, provided the source is available online. I haven’t seen anyone call BS when the source is only available on dead trees.

    Not that having electronic resources instantly available is a panacea – I’ve seen the stuff you’ve seen, in which people abuse digital sources – but the old days weren’t any better, at least not as I remember them. Will the future be any better? I think it will.

  • Ellie says:

    Thanks for that story, Brett. I was thinking the same thing.
