• Not Just Another Pretty Picture

    November 11, 2009

    Treemap of mam­mals cour­tesy of Flickr user Arenamontanus

    Intro­duc­tion

    I’m a slave to spreadsheets. Trying to decide between a stacked column bar chart and a 3-D area chart is par for the course in my work. Microsoft Excel© is great for many practical needs, but it doesn’t always support the need to create simple, compelling and interactive graphical data visualizations that are critical for libraries to best express value, communicate trends, and test assumptions about library services and collections.

    Data visualization is the study of strategies and methods for conveying information, as captured by data, in an efficient, functional way that leads to insights about a process or system. Good data visualization can drive home a point quickly and have lingering impact. Data visualizations can help you see something that you hadn’t noticed before.

    These days, libraries can’t afford to not be wise and impactful with the data that is collected and conveyed about patrons, services and collections. Many libraries are reporting declines in reference desk queries against the backdrop of massive surges in use of computers and other tech-related services. Most libraries are undergoing comprehensive reviews of journal and database usage (among other metrics) with the aim to cut collections to comply with shrinking budgets. To express these kinds of trends, to seek support, or to simply try to assess library collections and services, many libraries fall back on the use of tables with a few pie charts and bar graphs thrown in for added measure.

    When I started having conversations with my library colleagues about data visualization tools and techniques, I was humbled by what I didn’t know and embarrassed that I hadn’t heard about, much less tested, some of the data visualization tools that are surfacing. So, I decided to start exploring what I’ve been missing while hiding behind the ubiquity of Microsoft Excel© graphs and charts. In this post, I present some examples using a few popular data visualization tools and I give an overview of some inspirational guides for creating compelling data graphics that may help you better express your own library metrics. First, let’s explore a little further why data visualization matters for libraries.

    Library data in context

    Libraries serve users at the reference desk, circulation desk, and special collections centers. Library staff engage with constituents through committees and working groups, at the library security gate, and through online chat. Librarians attempt to expose valuable services and collections via library catalogs and carefully crafted subject guides, during bibliographic instruction sessions, and via long lists of databases and online journals. Libraries assess usage and patron needs via web statistics, gate counts, circulation transactions, LibQual surveys, usage statistics, and feedback forums.

    Why do we measure these experiences? To show value for money or time and to understand the uptake of our collections and services. Library value has been a popular topic since at least the 1930s and libraries have gotten better at showing return on investment (ROI). We’re not completely there yet, as the recent $1,000,000 IMLS sponsorship of the “Lib-value” grant suggests. Libraries are pretty adept at measuring lots of different kinds of interactions, so how can we be so bad at demonstrating our worth and making our point? What if part of our problem in demonstrating value lies in how we attempt to showcase library value?

    Libraries also want to make good, sound decisions in the context of their user communities. Libraries collect a lot of data that encompass complex networks about how users navigate through online resources, which subjects circulate the most or the least, which resources are requested via interlibrary loan, visitation patterns over periods of time, reference queries, and usage statistics of online journals and databases. Making sense of these complex networks of use and need isn’t easy. But the relationships between use and need patterns can help libraries make hard decisions (say, about which journals to cut) and creative decisions to improve user experiences and outreach, achieve efficiencies, and enhance alignment with organizational goals.

    Not another library ROI arti­cle, please!

    Relax, this isn’t another post about how to calculate library ROI, nor is it about how to collect data that show library worth. This post is an exploration of visualization techniques that can help libraries make a compelling case to stakeholders, and of how data visualization can help libraries make more informed decisions. Disclaimers: I’m not an expert on visualization techniques; I’m part of the slew of librarians who need to know how to better illustrate what we do and learn how to better allocate resources. Visualization strategies have made their debut at library conferences already (e.g., 2009 Computers in Libraries; 2008 Computers in Libraries; 2009 NASIG Conference). However, I haven’t seen a groundswell of examples indicating that libraries have taken these strategies and these conference presentations to heart. What I have experienced is a few really good ideas popping up in conversations with colleagues about how to make the case for libraries in simple, compelling, visual ways. I want to share what I’ve learned so far in my exploration and open the door to some more good ideas.

    Data needs to be “humanized”

    Dur­ing var­i­ous con­ver­sa­tions about how to rep­re­sent library col­lec­tions and expen­di­tures data, one of my very smart col­leagues, Cory Lown, intro­duced me to the work of John Tukey and Edward Tufte. Cory explained that Tufte’s aim is to encour­age the use of as much data as pos­si­ble (“to clar­ify, add detail”) and to use visu­al­iza­tion tech­niques that “fit” the data.

    “Often bar charts and pie charts (which tend to have low data to ink density) obscure more nuanced and interesting data. It’s not just about new and interesting tools, but matching the data to the right visualization so we can make use of data we have.” (Lown, 2009, pers. comm.)

    This isn’t a trivial process by any means, because each dataset is unique in how it was collected, cleaned up, analyzed, and so on. But, according to Tufte’s principles, giving as much attention as possible to the data in a chart, graph or image (aka “maximizing the data-ink ratio”) while reducing the “fluff” (aka “chartjunk”; e.g., chart borders, text legends, background fill, decorations) can aid in getting the point across.

    In the spirit of the work of Tukey and Tufte, a recent book, aptly named Beau­ti­ful Data (2009, edited by Toby Segaran and Jeff Ham­mer­bacher) brings together a great com­pi­la­tion of data visu­al­iza­tion, data han­dling and data sense-making strate­gies. In one chap­ter, Nathan Yau, also author of a ter­rific blog called Flow­ing­Data (to which I’ll refer a lit­tle later in this post), describes the devel­op­ment of a sim­ple, user-friendly tool to track and mea­sure what he calls “per­sonal data” (e.g., eat­ing, sleep­ing, travel habits). Yau is inter­ested in cre­at­ing tools for peo­ple to dis­till their per­sonal data into sto­ries that can help them under­stand pat­terns about their per­sonal habits and even­tu­ally help relate peo­ple to the big­ger pic­ture about their impact on their envi­ron­ment and vice versa. This con­cept of cre­at­ing a way for a per­son to relate to the big­ger pic­ture through data is an impor­tant les­son for libraries.

    “Data has to be presented in a way that is relate-able; it has to be humanized. Oftentimes we get caught up in statistical charts and graphs, which are extremely useful, but at the same time we want to engage users so that they stay interested… Users should understand that the data is about them and reflect the choices they make in their daily lives.” (Yau, 2009)

    All of those interactions with patrons that libraries collect and track (circulations, journal usage statistics, cost/use metrics, etc.) are about the patron. However, most of the metrics that libraries present to make the case to patrons aren’t presented in a way that relates the patron to the data. An example: Academic libraries spend a lot of money on journals. In fact, the NCSU Libraries spent around $6 million on journals during 2008–2009, but how many of our patrons know, when they download a journal article, that it’s paid for by the NCSU Libraries? That $6 million doesn’t necessarily “translate” to a user when they download an article. We tell them how many articles were downloaded, but is there a better way to make the connection between the user and the cost of resources? For the most part, library metrics aren’t good at telling stories that keep our users interested and help inform the choices that they make. We are in need of some great ideas and examples from the field.

    Sim­ple data visu­al­iza­tion tools

    While sev­eral other visu­al­iza­tion tools exist, I want to focus on three of the most pop­u­lar tools and demon­strate what is pos­si­ble using a few datasets that I’ve cre­ated using the kind of library met­rics that you might be deal­ing with in your own library. After try­ing a few dif­fer­ent types of library met­ric datasets in Google Gad­gets for spread­sheets, ManyEyes and Swivel, my favorites are Google Gad­gets and ManyEyes because of their ease of use and diver­sity of visu­al­iza­tion styles.

    Google Gad­gets: First, I have to give props to Cory Lown for mak­ing me aware of Google Gad­gets for spread­sheets. It’s really quite sim­ple to use. If you have a Google account (e.g., Gmail), then you can use Google Gad­gets. Log into your Google account, choose Google Doc­u­ments, Cre­ate New Spread­sheet, then add your data (just as you would in Excel). Once your data is ready, go to the Insert menu and choose Gad­get. As of the time of this writ­ing, there are over 35 dif­fer­ent visu­al­iza­tions you can choose from: every­thing from the stan­dard bar charts to motion graphs to piles of money. The upside: You can exper­i­ment with the dif­fer­ent visu­al­iza­tions and pick one that fits the point that you’re try­ing to make or the audi­ence that you’re try­ing to reach. You can share your visu­al­iza­tions with a sim­ple URL that you plug into an email, or into your web­site or blog. The down­side: You don’t have a lot of con­trol over font size or posi­tion­ing of ele­ments on the charts.

    Google Gadget Motion Charts are excellent for showing change in values over time. They are the primary visualization mode for sites like GapMinder to illustrate changes in global issues over time. Below is an example using data that I collected from the Association of Research Libraries on research library expenditures plotted against university expenditures from 1982 through 2006. Try the motion chart with the default variables, then try changing them. You’re welcome to access the dataset itself to create your own data visualization.
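
    If your own expenditure data lives in a “wide” spreadsheet (one row per institution, one column per year), you will probably need to reshape it before a motion chart can animate it. As far as I understand it, the motion chart wants one row per institution per year, with the name first, the year second, and the numeric measures after that; treat that column order as an assumption and check the gadget’s own documentation. Here is a minimal Python sketch of the reshaping. The institution names, column names and figures are invented for illustration and are not taken from the ARL dataset.

```python
import csv
import io

# Hypothetical wide-format export: one row per institution, one column per
# measure and year. Names and figures are invented for illustration only.
WIDE_CSV = """institution,library_1982,university_1982,library_1983,university_1983
Example University A,5.0,200.0,5.5,215.0
Example University B,3.2,150.0,3.4,160.0
"""

LONG_FIELDS = ["institution", "year",
               "library_expenditures", "university_expenditures"]


def wide_to_long(wide_text):
    """Reshape the wide table into one row per institution per year."""
    reader = csv.DictReader(io.StringIO(wide_text))
    rows = []
    for record in reader:
        # Recover the years from column headers such as "library_1982".
        years = sorted({int(name.rsplit("_", 1)[1])
                        for name in record if name != "institution"})
        for year in years:
            rows.append({
                "institution": record["institution"],
                "year": year,
                "library_expenditures": float(record[f"library_{year}"]),
                "university_expenditures": float(record[f"university_{year}"]),
            })
    return rows


def to_csv(rows):
    """Serialize the long-format rows so they can be pasted into a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=LONG_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


long_rows = wide_to_long(WIDE_CSV)
print(to_csv(long_rows))
```

    The resulting text can be pasted into a new spreadsheet, which keeps the original wide table untouched for other kinds of charts.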

    ManyEyes: As part of the agree­ment to use ManyEyes with your own data, any data you upload is made pub­licly and freely avail­able for oth­ers to use. After sign­ing up with an email address, the process is easy and straight­for­ward. You can be up and run­ning with sev­eral visu­al­iza­tions of your data within a few min­utes and you can share your visu­al­iza­tions with links in emails, or embed them in your web­site or blog. The upside: the choice of visu­al­iza­tions is pretty exten­sive: Word Tree, Phrase Net, Wor­dle, Tag Cloud, Bar Chart, Block His­togram, Bub­ble Chart, Net­work Dia­gram, Scat­ter­plot, Matrix Chart, Treemap for Com­par­isons, Treemap, Pie Chart, Coun­try Map, US County Map, World Map, Stock Graph, Line Graph, Stack Graph. The down­side: if you want to com­pare more than two vari­ables, you have lim­ited options. The exam­ple that I’ve included here is data that I col­lected on the pub­li­ca­tion and cita­tion pat­terns of NCSU schol­ars. Researchers at an aca­d­e­mic uni­ver­sity will almost always have more cita­tion activ­ity than pub­li­ca­tion activ­ity in a jour­nal. But just how much more? This visu­al­iza­tion illus­trates the scale of cita­tions for jour­nals in which NCSU schol­ars pub­lish 0 times, 2 times, 3 times, on up to 41 times. Try the visu­al­iza­tion below and exper­i­ment with the dataset to cre­ate other ManyEyes visu­al­iza­tions.

    Swivel: With Swivel, you have a choice to let the data that you upload be freely avail­able to oth­ers or to keep your data pri­vate. If you choose to keep your data pri­vate, be pre­pared to com­mit to a fee of $12/month. For most of us who use Excel to pre­pare data for upload into a tool like Swivel, an Excel tool­bar is avail­able from the Swivel Con­fec­tionary. The upside: You have a lit­tle more con­trol over things like font size and font face (com­pared to Google Gad­gets for spread­sheets); it’s just as easy to share data and visu­al­iza­tions (email or embed­ding in web­sites or blogs); and if you want your audi­ence to be able to inter­act with your charts, Swivel makes that a triv­ial process. The down­side: The choice of graphs is lim­ited (Bar, Line, Area, Stacked Bar, Stacked Area, Scat­ter, and Pie) and the site isn’t very respon­sive with larger sets of data (e.g., I tested it with a dataset of over 1900 rows and it had trou­ble switch­ing between dif­fer­ent types of graphs). In this exam­ple, I’ve uploaded a small dataset of usage of the major types of dig­i­tal col­lec­tions pro­vided by the NCSU Libraries. Try inter­act­ing with the pie chart and down­load the dataset if you want to use it to exper­i­ment with your own visu­al­iza­tions (you’ll need to cre­ate an account before you can do much with the data in Swivel).

    Visu­al­iza­tion inspiration

    There are some excellent resources that provide insight into what are considered good and bad data visualization practices. These sites are filled with examples of interesting data visualizations to inspire your own work and in some cases (e.g., GapMinder) also offer datasets with which to experiment.

    FlowingData: The FlowingData blog is one of the most compelling, idea-filled blogs I’ve ever come across. Authored by Nathan Yau (UCLA PhD student in statistics focusing on data visualization), this blog highlights great examples of how to make a compelling point with data and visual creativity. FlowingData offers a great deal, but I want to point out 5 specific visualization categories:

    Not only does Yau col­lo­cate exam­ples of how to dis­play data to dif­fer­ent audi­ences, but he also pro­vides thought­ful analy­sis about why a visu­al­iza­tion is effec­tive (or not) and what could be improved about it.

    Visu­al­iza­tion ad

    Infos­thet­ics: Authored by Andrew Vande Moere (fac­ulty mem­ber of Archi­tec­ture, Design and Plan­ning at the Uni­ver­sity of Syd­ney in Aus­tralia), Infos­thet­ics acts much like the Flow­ing­Data blog, but tends to focus more on data as art. There’s over­lap between Infos­thet­ics and Flow­ing­Data, but you’ll find a slightly dif­fer­ent per­spec­tive in Infos­thet­ics — one that deals with data visu­al­iza­tion from the design and inter­ac­tion approach.

    Visual Com­plex­ity: Manuel Lima uses the Visual Com­plex­ity blog to bring together exam­ples and ideas around the study of the visu­al­iza­tion of com­plex net­works such as data from library sys­tems, the social web, bio­log­i­cal sys­tems, and trans­porta­tion patterns.

    3D Dewey Data Visualization

    His aim is to ana­lyze meth­ods for con­vey­ing the adage, “the whole is always more than the sum of its parts.” Cur­rently a Senior User Expe­ri­ence Designer at Nokia’s NextGen Soft­ware & Ser­vices, Lima pro­vides an indus­try per­spec­tive on the util­ity of net­works to dis­play information.

    Gap­Min­der: Gap­Min­der is an orga­ni­za­tion that runs a web­site for dis­play­ing trends in global issues such as poverty plot­ted against inequal­ity indices or oil con­sump­tion plot­ted against oil pro­duc­tion. Its main visu­al­iza­tions are based on Google Motion Charts, and have been fea­tured in the famous TED Talks.

    Infor­ma­tion Dash­boards: Infor­ma­tion dash­boards are user inter­faces that serve the need of pro­vid­ing crit­i­cal infor­ma­tion at a glance. A book aptly named Infor­ma­tion Dash­board Design (2006, by Stephen Few) promises to teach read­ers how to use graphs dis­crim­i­nately to enhance com­mu­ni­ca­tion. Some excel­lent exam­ples of infor­ma­tion dash­boards that might fit in library con­texts are the Indi­anapo­lis Museum of Art (IMA) Dash­board (thanks to Adri­enne Lai for shar­ing this site with me) and the Sprint Now Dash­board.

    Indi­anapo­lis Museum of Art Dashboard

    The IMA Dash­board presents sim­ple, com­pelling data in a graph­i­cally aes­thetic way. It tells a vis­i­tor things like how many plants are in the gar­dens, how many vis­i­tors are at the IMA, how much energy is being con­sumed by the IMA, and the num­ber of active mem­ber­ships. Each wid­get win­dow leads to a lit­tle more infor­ma­tion about the IMA, draw­ing the vis­i­tor in to learn more with­out over­whelm­ing him/her with too many options or under­whelm­ing with too few avenues to explore. The Sprint Now Dash­board, on the other hand, cre­ates a slightly dif­fer­ent expe­ri­ence. There’s a lot going on that isn’t nec­es­sar­ily rel­e­vant here — from the creepy voice-over to the num­ber of eggs being pro­duced or the num­ber of peo­ple stuck in ele­va­tors — but the con­cept of sur­fac­ing this kind of real-time infor­ma­tion is compelling.

    The possibilities in libraries for these kinds of information dashboards are obvious. An external audience might find it helpful to know which books are being checked out (similar to the Seattle Public Library’s display of circulating materials), real-time locations of available computers, how many journal articles are being downloaded, how many e-books are being read, the number of devices (e.g., laptops, iPods, Kindles) that are checked out, the latest articles by your campus researchers, upcoming community events, and maybe even an ROI metric on the value of library services and collections per tuition dollar (or tax dollar) per hour. Tack on a catalog search box, real-time webcam views of the coffee shop wait line and the Info Commons, and you’ve got a mode for making a case for the value of library services and collections while providing real-time information all in one view.

    An inter­nal audi­ence of library staff and deci­sion mak­ers might find it help­ful to see in a dash­board view the “health” of the library bud­get, cost/use met­rics based on cir­cu­la­tion data or elec­tronic jour­nal or e-book usage sta­tis­tics, hourly gate counts, key­words searched in the cat­a­log, cat­a­loging activ­ity, and a cur­rent snap­shot of the com­po­si­tion and use of the col­lec­tion bro­ken down by for­mat or by mate­r­ial type plot­ted against com­mu­nity demo­graph­ics (e.g., num­ber of full-text jour­nal down­loads per grad­u­ate stu­dent in the Chem­i­cal Engi­neer­ing Depart­ment) among other things. Other than the Seat­tle Pub­lic Library, I am not aware of any libraries pre­sent­ing this kind of (more or less) real-time, dynamic infor­ma­tion dash­board to the pub­lic, but I sus­pect that any data dis­played for pub­lic con­sump­tion would require that personally-identifying infor­ma­tion be excluded.
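
    To make the cost/use idea concrete, here is a minimal Python sketch of the arithmetic behind one such dashboard widget: divide what a journal costs per year by the number of full-text downloads in that year, then list the titles from most to least expensive per use. The journal titles, costs and download counts are invented, and the class and function names are my own; this is one simple way to compute the metric, not a standard formula that every library uses.

```python
from dataclasses import dataclass


@dataclass
class JournalUsage:
    title: str
    annual_cost: float  # subscription cost in dollars for the year
    downloads: int      # full-text downloads in the same year


def cost_per_use(journal):
    """Return dollars per download, or None when nothing was downloaded."""
    if journal.downloads == 0:
        return None
    return journal.annual_cost / journal.downloads


def rank_by_cost_per_use(journals):
    """Sort with the most expensive titles per use first; unused titles lead."""
    def sort_key(journal):
        value = cost_per_use(journal)
        return float("inf") if value is None else value
    return sorted(journals, key=sort_key, reverse=True)


# Invented sample data, for illustration only.
SAMPLE = [
    JournalUsage("Journal of Hypothetical Chemistry", 12000.0, 4000),
    JournalUsage("Annals of Example Studies", 3000.0, 150),
    JournalUsage("Review of Placeholder Research", 800.0, 0),
]

for journal in rank_by_cost_per_use(SAMPLE):
    value = cost_per_use(journal)
    label = "no recorded use" if value is None else f"${value:.2f} per download"
    print(f"{journal.title}: {label}")
```

    Titles with no downloads are listed first rather than skipped, since those are often the ones a collections review most needs to see.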

    Final thoughts

    The ultimate goal of libraries is to help patrons make smart decisions about the information they use and create. As an extension of that goal, Jason Casden, one of the reviewers of this article, noted that data visualization techniques should become part of a library’s organizational culture for assessment and justification, not only to best serve patrons, but also to help guide the allocation of limited resources. Investing in ways to leverage the data that libraries collect to show value, communicate trends, and test assumptions about library services and collections is part of the solution for making the library all about the patron. Try out some of the visualization tools and sample datasets used in this post or share your own data visualization creations via the Comments.

    Thanks

    Thanks to the peo­ple who’ve opened my eyes to the pos­si­bil­i­ties and who reviewed this post and offered valu­able feed­back: Cory Lown, Jason Cas­den, Brett Bon­field, and Kim Leeder.

2 Comments

  • Ellie says:

    Thanks Hilary! My library will be doing a big stu­dent library and tech­nol­ogy use sur­vey in the Spring, so this is very timely for me.

  • I think this post has such rel­e­vance for not only librar­i­ans pre­sent­ing their pro­grams but also for the way we teach stu­dents about cre­at­ing data images and pre­sent­ing them for max­i­mum effect. Thank you for col­lat­ing all these ter­rific sources!
