17 November 2006

Tags and Subject Headings in LibraryThing

When I finished library school in August, I put all of my papers in some boxes and haven’t looked at them since, because I really needed to recover. Happily, yesterday’s post about folksonomy finally forced me to dredge them up and bring to light a paper I did for my indexing and abstracting class on the use of a folksonomy alongside a controlled vocabulary in LibraryThing.

The first part is a sort of “literature review” on folksonomy (such as it is) and an overview of the concepts involved. The second part takes a look at LibraryThing and compares tags and subject headings.

The full text of the paper follows, or you can download a PDF for printing (warning: it’s in ugly, formatted-to-turn-in-for-class format; one of these days I’ll get it prettied up). Some of the discussion of LibraryThing’s features is slightly out of date (Tim & co. move fast!), and the statistics about popular tags and subject headings certainly are, but I think the main points are still relevant. I’d like to do some more in-depth analysis, especially of the statistical data, at some point in the future.


16 November 2006

Philosophical Misunderstandings about Folksonomy?

Beneath the Metadata: Some Philosophical Problems with Folksonomy by Elaine Peterson appears in this month’s D-Lib. It’s an interesting piece, but I have a couple of quibbles.

Statistics and Democracy

Peterson’s main point is that folksonomy is philosophically relativistic about what something “really” means, compared to a controlled vocabulary employed by a professional cataloger. She writes,

A philosophy of relativism allows folksonomy to draw on many users with various perceptions to classify a document instead of relying on one individual cataloger to set the index terms for that item. Thus, classification terms become relative to each user. Certainly all individuals’ perceptions are influenced by their own experiences and cultures, whereas the professional cataloger, even if trying to be unbiased, has only one viewpoint. Yet to include all viewpoints opens up a classification scheme to the inconsistency that allows a work to be both about A and not about A. There is no question that an individual might have a personal, valid interpretation of a text. That is not the issue. The issue is that adding enough of those individual interpretations through tags can lead to inconsistencies within the classification scheme itself.

This seems to envision a system with only a handful of personal folksonomies, all on an equal footing and therefore yielding a plurality of interpretations, and it concludes that a handful is too many because it’s more than one. On the contrary, I would insist that a handful is too few. With much larger numbers of users, it becomes clearer which viewpoints are commonly held and which are fringe, simply through the popularity of different tags. A consensus emerges through statistics, without anyone explicitly coordinating the users.
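To make the "consensus through statistics" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the users, the tags, and especially the 50% cutoff between "common" and "fringe", which is arbitrary and not taken from any real tagging system.

```python
from collections import Counter

# Hypothetical data: each user's tags for one bookmarked item.
taggings = {
    "alice": {"folksonomy", "tagging", "metadata"},
    "bob": {"folksonomy", "tagging"},
    "carol": {"folksonomy", "toread"},
    "dave": {"folksonomy", "tagging", "libraries"},
}


def tag_counts(taggings):
    """Count how many distinct users applied each tag to the item."""
    counts = Counter()
    for tags in taggings.values():
        counts.update(set(tags))  # a user counts at most once per tag
    return counts


def split_by_popularity(counts, n_users, threshold=0.5):
    """Split tags into (common, fringe) by the share of users who applied them.

    The 0.5 cutoff is an arbitrary assumption, not a known "magic number".
    """
    common = sorted(t for t, c in counts.items() if c / n_users >= threshold)
    fringe = sorted(t for t, c in counts.items() if c / n_users < threshold)
    return common, fringe


counts = tag_counts(taggings)
common, fringe = split_by_popularity(counts, len(taggings))
print(common)
print(fringe)
```

Nobody in this toy example coordinated with anybody else, yet the tags most of them share still separate from the idiosyncratic ones. With only four users the split is fragile, which is the point: the more users there are, the more the counts mean.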

What are the magic numbers for users and tags that make a folksonomy successful in this way? I have no idea, but it would be an interesting experiment. You could take random samples of items from del.icio.us or flickr or wherever, along with random samples of users. You’d have to present the popular tags for items, varying the number of users’ tags included, and ask people to independently assess the suitability of the popular tags for describing the item. (It would take a lot of human trials, which is why I’m not exactly jumping on this one.)
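The mechanical half of that experiment is easy to sketch; the human judging is the expensive part. The Python below is only a rough illustration under made-up assumptions: the data, the function name popular_tags, and the sample sizes are all hypothetical, and it draws nothing from del.icio.us or flickr.

```python
import random
from collections import Counter

# Hypothetical data: tags that six users applied to a single item.
item_taggings = {
    "u1": {"cats", "funny"},
    "u2": {"cats", "pets"},
    "u3": {"cats", "funny", "video"},
    "u4": {"cats"},
    "u5": {"toread"},
    "u6": {"cats", "funny"},
}


def popular_tags(taggings, sample_size, top_n=3, rng=None):
    """Return the top_n tags among a random sample of sample_size users."""
    rng = rng or random.Random()
    users = rng.sample(sorted(taggings), sample_size)
    counts = Counter()
    for user in users:
        counts.update(set(taggings[user]))
    return [tag for tag, _ in counts.most_common(top_n)]


# One experimental condition per sample size; each resulting tag list
# would then go to human judges to rate how well it describes the item.
for size in (1, 3, 6):
    print(size, popular_tags(item_taggings, size, top_n=2))
```

With a sample of one, the "popular" tags are just whatever that one person happened to type; the open question is how large the sample has to get before the judges stop noticing a difference.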

Weeding

As an aside from her central argument, Peterson also makes this curious assertion:

A final criticism one could make of folksonomies as classification systems is that their advocates seem to assume everything on the Internet needs to be organized and classified. Anyone who has a home library knows that this is not necessarily true. Everyday, individuals make critical assessments of information bits they encounter. Their first decision is whether or not to retain the information, and if so, how to organize it. Folksonomy advocates seem not to recognize that critical, first decision about retention. The free labor available to create folksonomies is appealing only to those who have already agreed that the entire Internet needs some organization and cataloging. However, rather than being retained and organized, many Internet items could be eliminated, ignored, or allowed to die off. Most people put into the wastebasket (physically or online) flyers, ads and newsletters, and would not bother to organize ephemera.

Do you bookmark every webpage you visit, or every photo you see on flickr, or whatever? I sure don’t. I bookmark only the things that I think are worth retaining. I have to assume that, if something is bookmarked, it was important to somebody. Now, I suppose I can imagine a cadre of people out there (who were probably catalogers in a former lifetime) who sit down for hours at a time surfing web pages just to tag them. These people are decidedly in the minority. Retention isn’t just a part of folksonomy; it’s the primary motivation for regular users to engage in the “free labor” of organization at all.

Now, just because something was important to me doesn’t mean it’s “important” in general. But folksonomy is a decentralized effort, so it’s vital to realize that, just as the system does not involve a single cataloger, it does not involve a single collection development librarian, with all the attendant advantages and limitations of that approach. However, the lack of a cataloger doesn’t mean there’s no organization, and likewise the lack of a collection development librarian doesn’t mean there’s no selection and weeding. Statistics and popularity are again our guide: if only a few people bookmark something, it evidently hasn’t been judged as important or interesting as something that many people bookmark.

6 November 2006

What I Do

I realized I’ve never really talked much about what I do and where I work on this blog.

I’m an Information Architect* at a consulting firm in Pittsburgh that employs about 20 people. Our specialty is streamlining business processes that involve information, including things like policies and procedures, product documentation, and portal design. The company has been around since 1989, when it started as a technical writing firm. In the 1990s, with the advent of knowledge management, the company got involved with wider issues around organizing information, such as portals and intranets.

I’ve worked here since 2003, and my job duties involve all parts of the information life cycle. On some projects, I’m essentially a technical writer: I gather source materials, interview subject matter experts, and write documents like procedures or user manuals. I’m also the “techie guy”, and much of my work involves designing efficient publishing processes that can take advantage of reusable information and publish it in many formats. Much of our documentation is based on XML formats that can be flexibly repurposed into, e.g., a print manual, online help, training slides, web-based interactive training, and so on. And finally, we help businesses organize and manage all this information as well, helping them understand their needs for content management systems, portals, intranets, etc., and how to integrate those tools into their ways of doing business.

So that’s what I do. It takes a broad range of skills, from technical writing, to indexing and classification, to thinking about usability and accessibility, to markup and scripting, to systems administration. I can’t say I’m especially good at many of those, but I muddle through somehow. :)

*I’m still not sure I really know what that means, but it looks pretty on my business card.

Safari Books Online

Sarah Houghton-Jan at Librarian in Black wrote about problems using Safari Books Online in a library setting. I left a comment saying that this was unfortunate, because we have Safari at work and I really like it. (Safari is a service from O’Reilly and Pearson offering e-books from a number of publishers, mostly on computers and technology.)

This prompted the following note:

Hello, I read your comment on librarianinblack blog about Safari. You mentioned that you appreciated it and thought there might be a problem with the Proquest Interface. How does your library get Safari? Directly from Safari, then? And, it is not a proquest subscription purchase?

Thanks, -DN Dussan (left in comments on another post)

Well, I don’t have a “library” per se; I work at a small consulting firm with a handful of people, so we just have individual accounts, which are available directly from Safari (safari.oreilly.com). At www.safaribooksonline.com, I noticed that there’s some information on corporate licensing and libraries, although they’re pretty silent about the fact that library access comes through ProQuest. As it turns out, ProQuest has a deal for exclusive distribution of Safari content to academic, public, and school libraries.

So, I guess the only recourse for dissatisfaction with ProQuest’s interface is to appeal to ProQuest. Then again, it can’t hurt to contact Safari directly with your concerns as well, because from what I’ve seen, O’Reilly is pretty committed to offering a useful, usable product. They also seem to be quite interested in the academic market: they offer SafariU, a service that lets professors remix and mash up books to create the ideal coursepack/textbook for their courses.