Will the Real Emily Please Stand Up
This is the first in a series that I will be writing about personal information management.
By Derik Badman
While personal information is often thought of as only the documents, emails, and other pieces of information that people receive or retain for some potential, immediate, or future need, William Jones, in his Keeping Found Things Found: The Study and Practice of Personal Information Management (Morgan Kaufmann, 2008), expands the field to include information about “me” or owned by “me.” As our online identities (the information about us online) expand, how can we manage that information to put our best face forward?
My online life has proliferated greatly over the past several years. What once was an email account that I accessed through a text-based interface, and that was primarily a way to communicate with a few friends I knew “in real life,” has become multiple blogs, websites, social networks, comments, micro-blogs, status updates, photos, drawings, links, presentations, and more. Each one is a new profile and another place people might find my content (or at least something about me).
Perhaps I’m lucky in that I have an unusual name. I can Google myself and go through 100 results without finding a hit that isn’t me (which also says something about how active my online life is, I guess). Take a look at the first page of results for my name in quotes. On examination, the results reveal a few points of interest. Seven of the ten feature my name in the page’s title. Six of the ten have my name in the URL. The top result is my home page, which has my name in neither its title nor its URL; my name does appear in the page header and the metadata, though, and probably a lot of the links pointing there use my name. The remaining result is my profile on my library’s LibGuides, which simply features my name ten times in a very small amount of text.
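The tally above (name in the title, name in the URL, name in the body text) can be reproduced with a few string checks. The Python sketch below assumes you already have each result’s URL, title, and body text; `name_signals` and `tally` are hypothetical helpers that only mirror the manual inspection, not how any search engine actually ranks pages.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def name_signals(name: str, url: str, title: str, body: str) -> dict:
    """Report where a personal name shows up on one search result (hypothetical helper)."""
    slug = normalize(name)  # "Derik Badman" -> "derikbadman"
    return {
        "in_title": name.lower() in title.lower(),
        # Matches derikbadman.com, /derik-badman/, /derik_badman, and so on.
        "in_url": slug in normalize(url),
        "body_count": body.lower().count(name.lower()),
    }


def tally(name: str, results: list) -> dict:
    """Count how many (url, title, body) results carry the name in each place."""
    signals = [name_signals(name, *r) for r in results]
    return {
        "in_title": sum(s["in_title"] for s in signals),
        "in_url": sum(s["in_url"] for s in signals),
        "total": len(signals),
    }
```

Run over a saved first page of results, `tally` would produce counts comparable to the “seven of ten” and “six of ten” above; the result data has to come from your own search.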
When you are trying to find out about someone online, a lot depends on names in titles and URLs and on links from one site to another. That is no problem if you have a rare name (at least, rare online) and your name ends up in page titles or URLs. But many services don’t put names in page titles; they may put just your username, which is often not your name or even close to it. Maybe your name appears only once on a page, but it’s a single important occurrence (the table of contents of a book, praise in a newspaper article or blog post). How will people find you (if you want to be found)? How will people find what you want them to find (the good stuff)?
Even more problematic is the case of someone whose name is shared by others who also have some kind of online life. The casual searcher can easily be confused. I have a friend who shares not only her name but also her profession with someone else. This lack of difference in basic identifiers (same name, same occupation) can increase the chances of confusion and of multiple identities being mingled. In the case of my ITLWTLP colleague Emily Ford, a Google search on her name does bring up her author page at this site (result nine). But it also brings up Emily Ford the author of a book from the “Erotic Print Society”, Emily Ford the marketing manager, Emily Ford who posts about restaurants in the San Francisco area, Emily Ford the rowing coach at Oregon State, as well as Emilys in Alaska, Massachusetts, and North Carolina. Some of these might be the same Emily (does author Emily live in San Francisco or coach rowing? did Emily from North Carolina move to Massachusetts?), but they might also all be different people.
In a world full of Emilys, how does our Emily associate herself with her content and not someone else’s? If she is looking for a job and someone Googles her name, the Googler will probably realize that these are not all the same Emily, but he or she may make wrong assumptions about certain content. He may think ITLWTLP Emily also writes erotic fiction, which our Emily may not want people thinking. He may miss positive content that Emily would want others to see. There are no easy answers to these problems, but a number of sites and tools have been built to help aggregate online identity and make connections between the sites you want people to see.
ClaimID, created by two doctoral students at UNC’s School of Information and Library Science, was designed as an online identity manager. You create an account to manage your ClaimID page (you can take a look at mine), which serves as a kind of home page. For the many people without a home page, who use Facebook or Twitter or Blogger as their primary online home, the ClaimID page offers a single place to aggregate their personal information. The page is a repository for your links: links about you, links by you, and even links that people might think are about you (so you can say “not mine!”). By putting your name in the URL, the title, and the metadata, ClaimID hopes to make your page rise in search results. By linking back to your ClaimID page from your various online profiles, you can raise it further (most search engines treat inbound links as a signal of relevance). With ClaimID you can stake a claim to your online identity, shaping it as you see fit by adding or leaving out content.
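As a rough illustration of the ingredients described above (your name in the title and metadata, a list of claimed links, and a separate “not mine” list), here is a small Python sketch that builds such a page. The layout and the `identity_page` function are hypothetical, not ClaimID’s actual markup; marking the claimed links with rel="me" anticipates the microformat discussed in the next paragraph.

```python
from html import escape


def identity_page(name: str, claimed: list, disowned: list) -> str:
    """Build a minimal aggregation page in the spirit of ClaimID (hypothetical layout)."""
    lines = [
        "<html><head>",
        f"<title>{escape(name)}</title>",  # the name in the page title
        f'<meta name="author" content="{escape(name)}">',  # ...and in the metadata
        "</head><body>",
        f"<h1>{escape(name)}</h1>",
        "<ul>",
    ]
    for url in claimed:
        # Links about or by me, asserted as "me".
        lines.append(f'<li><a rel="me" href="{escape(url)}">{escape(url)}</a></li>')
    lines.append("</ul>")
    if disowned:
        lines.append("<p>Not mine:</p><ul>")
        for url in disowned:
            # Plain links, deliberately without rel="me".
            lines.append(f'<li><a href="{escape(url)}">{escape(url)}</a></li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)
```

Keeping the “not mine” links as plain links lets a visitor see the disclaimer without any machine reading them as identity claims.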
ClaimID makes use of a few emerging (already emerged?) tools for making social/identity connections online. Of relevance to gathering personal information about yourself is “rel=me”. This is a subset of the XFN (XHTML Friends Network) microformat. A microformat, as succinctly described by Microformats.org, is “designed for humans first and machines second, …a set of simple, open data formats built upon existing and widely adopted standards.” Basically, it is a way to embed data that is both human and machine readable. XFN is used to embed social relationships into links. If I have a link on my blog to another blog, I can use XFN to note that the owner of that other blog is a “friend” or a “colleague” or even that I have “met” them. These relationships are marked using the “rel” (short for relationship) attribute of link tags in HTML. If a normal link to my friend’s blog looks like this:
<a href="http://someblog.com">A Blog I Read</a>
A link to my friend’s blog, where I am using XFN, may look like this:
<a rel="friend" href="http://someblog.com">A Blog I Read</a>
This would allow XFN-aware applications to know that I consider the owner of “A Blog I Read” to be a friend.
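To see what an “XFN-aware application” has to do at the most basic level, here is a minimal Python sketch using the standard library’s `html.parser` to pull each link’s target and its rel values out of a page. The names `XFNLinkParser` and `xfn_links` are hypothetical; the parsing relies only on the fact that `rel` holds a space-separated list of values, so one link can be marked, say, `rel="friend met"`.

```python
from html.parser import HTMLParser


class XFNLinkParser(HTMLParser):
    """Collect (href, set of rel values) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        rel = attr.get("rel") or ""  # a bare or missing rel becomes ""
        if href:
            # rel is space-separated, e.g. rel="friend met colleague"
            self.links.append((href, set(rel.lower().split())))


def xfn_links(html: str) -> list:
    """Return every link in the page along with its XFN (rel) values."""
    parser = XFNLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Filtering these results for links whose rel set contains "me" yields exactly the identity links discussed below.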
For online identity, the value “me” can be used in the “rel” attribute to mark a link to another page about or by the same person. The use of “rel=me” is already in place at a number of web applications and sites. Look at the HTML of any Flickr profile page where the user has filled in the “homepage” field, and you will find a “rel=me” link. Similarly, the “Web” field of a Twitter user’s profile is linked with “rel=me”. These links make an assertion of identity between two sites. Ideally, a user would create reciprocal “rel=me” links to loosely verify the relationship. In other words, if my Twitter page has a “rel=me” link to my home page, my home page would also have a “rel=me” link to my Twitter page. This reciprocal linking shows that I have control over both sites, thus verifying their connection as “me.” Verifying who “me” actually is, is another problem altogether. If someone else made a Twitter account and linked to my home page with “rel=me”, the relationship would be ambiguous, because I, in control of the home page, would not link back to someone else’s Twitter account with “rel=me.” These links are not about establishing who someone is; rather, they are about relationships: saying that this account and that account belong to the same entity.
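The reciprocity check can be written down directly. The Python sketch below assumes the rel="me" links have already been extracted into a dictionary mapping each page to the set of pages it claims as “me”; `rel_me_status` and the URLs are made up, and a real tool would also need to normalize URLs (trailing slashes, http vs. https, www) before comparing them.

```python
def rel_me_status(a: str, b: str, rel_me: dict) -> str:
    """Classify the rel="me" relationship between pages a and b.

    rel_me maps each page URL to the set of URLs it links to with rel="me".
    URLs are compared as plain strings (no normalization in this sketch).
    """
    a_claims_b = b in rel_me.get(a, set())
    b_claims_a = a in rel_me.get(b, set())
    if a_claims_b and b_claims_a:
        return "reciprocal"  # each page vouches for the other: likely one owner
    if a_claims_b or b_claims_a:
        return "one-way"  # an unconfirmed claim, e.g. an impostor's profile
    return "none"


# Hypothetical pages: my home page, my real Twitter profile, and an impostor.
rel_me = {
    "http://home.example/": {"http://twitter.example/me"},
    "http://twitter.example/me": {"http://home.example/"},
    "http://twitter.example/impostor": {"http://home.example/"},
}
print(rel_me_status("http://home.example/", "http://twitter.example/me", rel_me))  # reciprocal
print(rel_me_status("http://home.example/", "http://twitter.example/impostor", rel_me))  # one-way
```

The impostor can claim my home page, but without a link back the claim stays one-way, which matches the reasoning above.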
These links are what would be considered “semantic links”. The idea of semantic links is that the link itself helps explain the relationship between the two ends of the link (in this case, people relationships, but it could also be other types of relationships). Of particular importance is the ability of machines (other computers) to “read” these links and understand the relationship.
Google’s Social Graph API is one application designed to read these semantic links using the “rel” attribute and XFN. The Social Graph is not yet widely used, but it points to the potential of these types of tools. For the time being, it’s interesting to use the Social Graph to see how one’s own accounts and sites are connected. One can peruse a few example applications of the Social Graph, including the Site Connectivity demo. ((Other “rel” values can be used by this system to work towards a “distributed social network.” This article by Ben Ward is a good introduction to the topic.))
You can type one or many websites into the Site Connectivity box and it will track down “rel=me” links, both one-way and reciprocal. See the results for my home page here. The results show how my various sites and accounts connect to each other through “rel=me” links. The first section, “Info on Your Connected Sites”, shows sites connected by “rel=me” links. The sites in the left column are those linked from my home page. The right column indicates the strength of the connection. Sites at the top of the table with green numbers are reciprocally linked with “rel=me”. I’ll admit it’s not all completely clear to me. The “Possible Connections” section shows sites that link to one or more of my other sites. I’m not sure why my home page appears down here. I do know that a few of my ITLWTLP colleagues’ Twitter accounts appear in this section because they have this site in the “Web” field of their accounts. Because I am also pointing at the same page, the Social Graph thinks we might be the same entity.
You’ll note that I’ve actually got a number of reciprocally connected sites and profiles. This is primarily because my home page links out to a number of my profiles with a “rel=me” link. When those sites (Twitter, Flickr, etc.) also use “rel=me” to link back, a reciprocal relationship is created.
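The demo’s categories can be approximated with set operations over a page-to-rel="me" mapping. This is a guess at the general idea, not Google’s actual algorithm: sites my home page links to are split into reciprocal and one-way, and any other page that claims one of my sites as “me” lands in a “possible” bucket. That is how a colleague whose Twitter “Web” field points at a shared group blog can show up there. The function `site_connectivity` and the page names are hypothetical.

```python
def site_connectivity(start: str, rel_me: dict) -> dict:
    """Sort pages around `start` into reciprocal, one-way, and possible connections."""
    mine = rel_me.get(start, set())
    reciprocal = {site for site in mine if start in rel_me.get(site, set())}
    one_way = mine - reciprocal
    known = {start} | mine
    # Pages outside my own set that claim one of my sites as "me". They might be
    # me, or they might simply point at the same shared page.
    possible = {
        page
        for page, targets in rel_me.items()
        if page not in known and targets & known
    }
    return {"reciprocal": reciprocal, "one_way": one_way, "possible": possible}


# Hypothetical data: my home page links out to three sites with rel="me".
rel_me = {
    "home": {"twitter", "portfolio", "groupblog"},
    "twitter": {"home"},
    "portfolio": set(),
    "colleague_twitter": {"groupblog"},
}
print(site_connectivity("home", rel_me))
```

Here only "twitter" links back, so it is the one reciprocal connection, while "colleague_twitter" appears as a possible connection purely because it shares "groupblog" as a target.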
Even if you don’t have your own home page (or a ClaimID page) linking out to all these services, you can create a networked identity by linking your profiles to each other. Tools able to identify these semantic links can follow chains of them to build an aggregate identity. If you put someone’s Twitter URL into the Site Connectivity demo at Google, you can often immediately find that user’s profiles on various blog platforms, Flickr, FriendFeed, Technorati, and other sites. Try it for yourself. If we put the URL of Aaron Schmidt’s Walking Paper website into the search, we end up with a list of potential connections to FriendFeed, Flickr, Yelp, Technorati, and LastFM ((I should note that none of this is private information. It’s all based on publicly viewable links.)).
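Following “chains of links” is an ordinary graph traversal. A minimal Python sketch, assuming the rel="me" links have already been collected into a page-to-targets mapping (the function `follow_rel_me` and the page names are invented): starting from one Twitter URL, a breadth-first search reaches every profile connected to it. Because it also follows one-way links, everything it finds is a potential connection, not a verified one.

```python
from collections import deque


def follow_rel_me(start: str, rel_me: dict) -> list:
    """Breadth-first walk over rel="me" links, returning pages in visit order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in sorted(rel_me.get(page, set())):  # sorted for a stable order
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical chain: Twitter -> blog -> Flickr and FriendFeed -> LastFM.
rel_me = {
    "twitter": {"blog"},
    "blog": {"flickr", "friendfeed", "twitter"},
    "friendfeed": {"lastfm"},
}
print(follow_rel_me("twitter", rel_me))  # ['twitter', 'blog', 'flickr', 'friendfeed', 'lastfm']
```

The `seen` set keeps reciprocal links (blog back to twitter) from sending the walk around in circles.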
Do these connections, publicly discoverable as they are, pose a threat to privacy? I would say, “no.” The information tracked by the Social Graph API (all of these links) is published by the people who made the links. I choose what I put into my Twitter account’s web field. I choose what goes into my Flickr profile and all my other accounts. If I don’t want these pieces of information to be discovered, I shouldn’t make them public, or at least shouldn’t link them together.
By choosing which connections you create and how you label them, you can create multiple online identities. If, in the past, I had wanted to keep my librarian self separate from my comics blogging/drawing self, I could have made a concerted effort not to link those profiles, and to use separate home pages, separate usernames, and separate commenting identities. It might not have been a perfect separation, but it would have helped a lot. In my case, since both are becoming ever more public identities, it does not behoove me to do this. I want the connections to be made.
One should be aware that any information that gets out there publicly could be put to use by sites and users outside its original context. This is done already; think of the white-pages listings that spread across the internet and the sites that aggregate them. In the past I’ve found sites that list my last four or five phone numbers and addresses all at once. The same is now being done with our social networking information. Going through the Google results on my own name (above), I found results at the site Delver. There were two pages for “Derik Badman.” One seems to draw on my Flickr information (and links to my now-deleted MySpace account), showing my profile, connections, and thumbnails from that site. The second seems to draw a limited amount of information from Facebook, including a small subset of my friends (perhaps those with public profiles?). Why are these two profiles separate? As far as I can tell, it is because my public Facebook page had no links to any of my other accounts, which kept the site from making the connection between the two.
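I don’t know how Delver actually built its profiles, but the outcome is what you would expect from any tool that merges accounts only when they are linked: each group of linked accounts becomes one profile, and an unlinked account stays on its own. A minimal Python sketch of that idea using union-find (connected components), with a hypothetical `identity_clusters` function and made-up account names:

```python
def identity_clusters(accounts: list, links: list) -> list:
    """Group accounts into clusters joined by any link, using union-find."""
    parent = {a: a for a in accounts}

    def find(a):
        # Walk up to the cluster's representative, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for a in accounts:
        clusters.setdefault(find(a), set()).add(a)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical: home page and Flickr link to each other, Flickr links to MySpace,
# and the public Facebook page links to nothing.
accounts = ["home", "flickr", "myspace", "facebook"]
links = [("home", "flickr"), ("flickr", "myspace")]
print(identity_clusters(accounts, links))
```

Adding a single link such as `("facebook", "home")` would merge everything into one cluster, which is the connection the Facebook page never made.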
So why does all this matter to librarians, beyond a personal interest in their own online identities? Information literacy, educating our patrons, is about more than finding, using, and evaluating information made by others; it is also about our own information and our own personal information space. And personal information is more than just the stuff we keep on our computers. Personal information is also the information about us that others might find, use, and evaluate. If we are aware of these issues, we can advise patrons on managing their online identities. Tools that automatically create these connections will only increase over time. People should be aware of their online identities as they look for jobs (don’t you Google candidates?), while those with a public personality, or who want to have one, need to be even more careful about the identity they put forward. For self-promotion, being able to manage and connect all of one’s profiles, content, and networks can help create exposure for creative endeavors. Younger users can benefit greatly from being aware of these issues before their online identities proliferate, allowing greater control of these identities from the start. Some of us are stuck with what’s out there, and all we can do is manage the results.
Thanks to: Emily Ford for comments and the use of her name, Ellie Collier and Hilary Davis for comments, and Lianne Hartman for editing, comments, and the title. (Edit: And thanks to Lianne for noticing my typo in the title, which you can still see in the url.)
Derik, this is a thoughtful and enjoyable post, and I look forward to seeing where you take the theme in your follow-up posts. Here are a few questions and comments:
* Is this need to manage identity unique to twenty-first-century online life? Have non-famous people had to worry about stuff like this before?
* What opportunities do you think librarians will have to really work with patrons on these kinds of images?
* Name confusion can be fun. I’m Twitter friends with my favorite googleganger, a London bass player named Steve Lawson (of course), and I sometimes pass along direct messages or replies that came to me that are meant for him. And I believe there was a “League of Steve Lawson” group on Facebook for a while.
* I think about the validity and verifiability of identity online when a “famous” person comments on my blog. Aside from checking the IP address from where the comment was posted and emailing an externally-verified email address for that person, there’s no good way to know if it really is Michael Gorman descending from Mt. Olympus to comment.
* I have very little confidence in this semantic web stuff. And I don’t know that ClaimID or microformats or whatever are going to be much help to the casual web searcher trying to determine which Emily he’s found. That could just be my lack of imagination.
* Lastly, I was happy to be present at an Internet Librarian preconference session when Iris Jastram talked about a similar topic. I think Iris has written about identity management too, but you could do worse than to read Jenica Rogers-Urbanek’s notes on Iris’s talk.
Thanks for the comments, Steve.
*I’d imagine that managing identity was a different problem pre-internet. It was all about personal relationships, the gossip, the “what are people saying about me.” That operated in a limited sphere, one that has exploded with increasing online visibility, search engines, etc.
*I’m honestly not sure what the opportunities are for most librarians. I think, from an academic standpoint, there is something very important to be said for scholars/faculty to consider the image found of them online and how they might best gather their content and reactions to their content so as to best display it to potential colleagues, students, deans, media (to be one of those “experts”).
*Love the term “googleganger”.
*Validity and “real” identity are a whole other bag of worms. Something that I hope to take up in another post.
*Like anything else these semantic links add a tool to the repertoire. In this case linking and online aggregation can help raise search results in relevancy. Having some kind of home page to aggregate your content and differentiate yourself can help people know who is who. Or at least help people know who you are not.
*Thanks for the reading suggestions. I’ll check them out.
I actually blogged recently about something related to this: someone created a second account for my boyfriend (same name/picture/geographical location) without his permission. We’re not certain who is behind it, but about 40 of his friends accepted this impostor’s friend request without question. That is scary!
I think we will see more and more that googlegangers are actually impostors, for various reasons: to spy on someone else, to amuse oneself, to ruin someone’s professional or personal life… but yeah, that’s another bag of worms as well.
another Emily: Wow, that is an interesting occurrence. I know there have been issues with celebrities (or semi-celebrities) and account confusion. This is where authentication comes up, which is a whole other topic, but one problem with authenticating anyone online is that it is all based on relative criteria: an email address, a url.
Recently I was trying to find out which library vendors/journals were on YouTube or SlideShare. Lists exist for Twitter and Facebook, but not for YouTube or SlideShare.
Naturally, I had the same idea as in this post: let’s see if I can find related sites using the Twitter accounts as a starting point. Why not use the Google Social Graph API…
The other trick to check whether X on network A is the same as Y on network B is to check lifestreaming aggregators (ClaimID probably counts as one, sort of), the most famous of which is FriendFeed.
Would it surprise you to know that these methods weren’t very successful? The latter was marginally better.
But what was the easiest method? Using http://namechk.com/ and similar services. This counts on the fact that people tend to use the same username across the web.
I’ve been very quick to sign up for Web 2.0 accounts, so basically every account that allows only one aarontay is me (some accounts are me, but I lost the password!).
But this doesn’t solve the problem, since there are multiple people wanting to own the same username. I noticed, for example, EBSCOhost having to settle for a variant of its name on YouTube :)
Basically if library vendors are not using such technologies to identify themselves, chances are few are doing so.
I would add that since 2006 libraries have been opening web accounts all over the place; while the chances of a “name collision” are lower, you would still need some means of identifying real accounts.
I’ve been blogging pieces like “libraries on twitter”, “libraries on friendfeed”, and “libraries on google profiles”, while my latest post tried to identify libraries on Get Satisfaction, Yelp, and UserVoice.
I would add that finding out whether a certain library is on a certain service is not easy :)
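The namechk approach works because people tend to reuse one username everywhere. As a rough sketch of that heuristic (not how namechk.com itself works), the Python below groups profile URLs by the last segment of their path, treated as the username. The function `group_by_username` and the URLs are invented, and as noted above, a shared username is only a hint: EBSCOhost’s YouTube variant shows that the same name can belong to different owners.

```python
import re
from collections import defaultdict


def group_by_username(profile_urls: list) -> dict:
    """Group profile URLs by the last path segment, treated as the username.

    A heuristic only: many sites put the username last (/aarontay,
    /photos/aarontay/), but a matching username does not prove one owner.
    Query strings and fragments are not handled in this sketch.
    """
    groups = defaultdict(list)
    for url in profile_urls:
        path = re.sub(r"^https?://[^/]+", "", url).strip("/")
        if not path:
            continue  # a bare domain carries no username
        username = path.split("/")[-1].lower()
        groups[username].append(url)
    return dict(groups)


# Hypothetical profile URLs on made-up services.
urls = [
    "http://microblog.example/aarontay",
    "https://photos.example/people/aarontay/",
    "http://video.example/user/AaronTay",
    "http://microblog.example/ebscohost",
    "http://blog.example/",
]
print(group_by_username(urls))
```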
I have used namechk.com and found my Twitter ID taken, but I have my .com!