15 May 2006

LibraryThing, tags, and subject headings

LibraryThing is now showing relationships between LC subject headings and user-assigned tags and the results are really interesting. Lots of forward-thinking librarians and web-2.0-ologists have been claiming for a while that subject headings and tags can coexist without eating each other’s babies, but we haven’t had any place to see it in action until now. (PennTags, for example, incorporates tags in the catalog, but doesn’t show relationships between tags and subject headings.)

It’s really pretty cool how much information can be derived simply by observing the co-occurrence of tags and subject headings, without any directed human input matching them up.
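To make the co-occurrence idea concrete, here’s a rough Ruby sketch of the simplest version of it: count how often each tag shows up on books that carry a given subject heading. I don’t know how LibraryThing actually computes its relationships. The `Book` struct, the sample data, and the raw-count ranking are all assumptions for illustration.

```ruby
# A minimal sketch of relating tags to subject headings purely by
# co-occurrence. Book, the sample data, and the ranking are hypothetical.
Book = Struct.new(:title, :tags, :subjects)

# Count how many books carry each (subject heading, tag) pair.
def cooccurrence_counts(books)
  counts = Hash.new(0)
  books.each do |book|
    book.subjects.uniq.each do |subject|
      book.tags.uniq.each { |tag| counts[[subject, tag]] += 1 }
    end
  end
  counts
end

# The tags most often applied to books with a given subject heading,
# highest count first, ties broken alphabetically.
def top_tags_for(subject, counts, limit = 3)
  counts.select { |(s, _tag), _n| s == subject }
        .sort_by { |(_s, tag), n| [-n, tag] }
        .first(limit)
        .map { |(_s, tag), n| [tag, n] }
end

books = [
  Book.new("A", %w[wwii history], ["World War, 1939-1945"]),
  Book.new("B", %w[wwii fiction], ["World War, 1939-1945 -- Fiction"]),
  Book.new("C", %w[wwii history memoir], ["World War, 1939-1945"])
]

counts = cooccurrence_counts(books)
p top_tags_for("World War, 1939-1945", counts)
# prints [["history", 2], ["wwii", 2], ["memoir", 1]]
```

Nobody had to tell the program that "wwii" and "World War, 1939-1945" mean the same thing; it falls out of the counts. A real system would probably want to normalize for how common a tag is overall, but raw counts show the principle.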

I think RJO’s comment on this post (the second one) is pretty insightful. I’ve been noodling around with LibraryThing and listening in on the LibraryThing Google Group, and one thing that occurred to me as well was that it might be useful to have private tags, for things like shelf location and read/unread status, in addition to public tags, which are the ones that really make sense in the social-networking setting where tags get connected up to subject headings and everything else.

The library world will continue to watch LibraryThing for interesting experiments in bibliography. It’s a perfect test-bed for these kinds of things, because its users are dedicated: they care about adding tags and so on because these are their own books. I don’t know how easily any of this translates to a particular library’s catalog, because the average user there is less dedicated, but there are potentially a lot more users as well. As we see more experiments like this, time will tell.

5 May 2006

More thinking about book queues

I’m continuing to think about this and what might work for “book queues”. A few tidbits:

  • Apparently some library systems allow “active” and “inactive” holds, which might kinda help the problem (i.e., place an “inactive” hold on things you don’t want to read yet).
  • An argument for the processing happening inside the library: people on long holds lists. See the comments on this post at Seattlist. This isn’t generally my situation, but I see the problem.

3 May 2006

Book queue

I have Netflix, and one of the things I really love about it is the queue. Every time someone tells me about an interesting movie, I just add it to the queue. Eventually it comes up to the top, or sometimes I’ll bump something up that I’m really interested in seeing.
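The queue behavior I love boils down to a few operations: add to the bottom, bump something to the top, and take the next item off the front when you’re ready. Here’s a minimal Ruby sketch of just that, assuming a plain in-memory list of titles. It isn’t how Netflix does it, and `ReadingQueue` and the sample titles are made up for illustration.

```ruby
# A minimal sketch of a Netflix-style reading queue (hypothetical class).
class ReadingQueue
  def initialize
    @titles = []
  end

  # New suggestions go to the bottom; duplicates are ignored.
  def add(title)
    @titles << title unless @titles.include?(title)
    self
  end

  # Move a title you're suddenly eager to read to the top.
  # Does nothing if the title isn't in the queue.
  def bump_to_top(title)
    return self unless @titles.delete(title)
    @titles.unshift(title)
    self
  end

  # Take the next title off the top when you finish a book.
  def next_up
    @titles.shift
  end

  def to_a
    @titles.dup
  end
end

queue = ReadingQueue.new
queue.add("Middlemarch").add("Snow Crash").add("The Name of the Rose")
queue.bump_to_top("The Name of the Rose")
puts queue.next_up   # prints "The Name of the Rose"
```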

I really wish library holds worked this way, but they don’t. I can’t just place a hold on an interesting book that someone mentioned, because I don’t want it right now, I want it after I’m finished with the current book or two I’m reading.

And I’m not exactly sure the library is the right place for the queue to live, anyway, at least for me. I buy a lot of books (something of a book junkie), plus I often read things I already have or that have been lent to me by a friend. I want to keep track of all of those books in my queue, but I don’t need or want to get them all from the library. This kind of queue integrated with something like LibraryThing would be perfect.

Well, my Netflix queue has about 200 DVDs on it (many are multidisc TV series, so it’s not really quite as bad as it sounds). I’m convinced there’s probably about that much in my mental queue of books as well, but there’s no way I can remember that many. The pain has gotten to the point that I decided to stop whining and do something about it.

I’ve built a prototype system in Ruby on Rails, because it rocks my world for getting web applications up and running in a hurry. (Rails also gives you all kinds of AJAX-y goodness without much work.) It’s heavily modeled after the queue in Netflix, with basically four sections: the queue, the current reading, saved books (not in the queue because they’re not released yet), and a history (which I haven’t implemented yet). It pulls information from Amazon’s awesome web services using a Ruby library. Right now that’s pretty rough, but it’s clear how it would work. (Even better would be an API for LibraryThing or something like that.) I also envision that eventually it would incorporate ideas from Jon Udell’s Library Lookup to check for availability in your favorite libraries (and of course it could give you a current price on Amazon or other booksellers as well). Unfortunately there are no automated ways (as far as I know) to place holds in OPACs, so you’d still have to do that manually, but at least it could point you directly to the hold page for the work.
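The Library Lookup trick is simple at heart: find the ISBN in the Amazon URL you’re looking at, then drop it into your library catalog’s search URL. Here’s a rough Ruby sketch, assuming 10-digit ISBNs appearing in `/dp/`-style Amazon URLs. The catalog URL template is hypothetical, since every OPAC formats its search URLs differently.

```ruby
# A minimal sketch of the LibraryLookup idea: ISBN from an Amazon URL,
# then a search URL for a library catalog.
# Matches a 10-character ISBN after /dp/, /ASIN/, or /gp/product/.
ISBN10_IN_URL = %r{/(?:dp|ASIN|gp/product)/(\d{9}[\dX])}i

# Hypothetical catalog search URL; substitute your own library's pattern.
CATALOG_TEMPLATE = "http://catalog.example.org/search?isbn=%{isbn}"

# Returns the ISBN as a string, or nil if the URL doesn't contain one.
def isbn_from_amazon_url(url)
  match = ISBN10_IN_URL.match(url)
  match && match[1].upcase
end

# Fill the ISBN into the catalog's URL template.
def opac_lookup_url(template, isbn)
  format(template, isbn: isbn)
end

isbn = isbn_from_amazon_url("http://www.amazon.com/dp/0140449132/")
puts opac_lookup_url(CATALOG_TEMPLATE, isbn) if isbn
```

The same lookup URL could be fetched in the background to check availability, though parsing each OPAC’s result page is where the real work would be.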

You can see the live demo. It’s still quite rough around the edges, but I think it gives a fairly good idea of what I’m aiming at. It’s also totally unsecured, so I’ll trust you not to go crazy with it and tie up my machine. (If you click the link and don’t get anything, you can probably assume something bad happened. If so, I’ll bring it back up later with a password, and you can email me to check it out.)

Thoughts on Library Camp (finally)

I had the pleasure of attending Library Camp a couple of weeks ago at the Ann Arbor District Library. I thought it was a great event. As a student, I found it a low-cost way to go to a conference (or “unconference”) and talk with other librarians, as well as patrons. It was especially valuable to get really interested non-librarians (like Superpatron Ed Vielmetti, the instigator of the whole event) in a room with a bunch of librarians and just hash over issues.

You can get good summaries of what went on elsewhere (the L2 wiki, Ryan Eby, John Blyberg among other places), and my memory is fuzzy at this point anyway. But I do have a few thoughts to share:

  • I thought the unconference format worked well. Basically, there’s no preset schedule or speakers. Everyone shows up, talks about what they’re interested in, and a schedule gets created. It’s a cheap, easy way to get people together to talk about something, and anyone can replicate it.
  • The Ann Arbor District Library is wonderful! Full of people, especially kids, and lots of interesting stuff going on. The day of Library Camp there was also a video game tournament going on.
  • One of the issues I started to get really jazzed about was the idea of a Netflix-style queue model for borrowing. General consensus was that the current holds models in ILSes and the business processes of libraries don’t support it very well. I’ve been cooking up some ideas in this area (and another little Ruby on Rails project). More on this coming soon in a post.
  • Most of all, it was great to meet people and just talk about libraries. Check out the photos on Flickr.

25 March 2006

Wikipedia vs. Britannica Smackdown 2006: Round Two

If you were on a different planet at the time, you might have missed the December article in Nature, “Internet encyclopedias go head to head”, in which Nature had experts analyze a set of science articles in the two encyclopediae and declared them about equal in quality.

Last week, Britannica came out with a strenuous rebuttal (pdf). Britannica makes some excellent points on the weakness of Nature’s methodologies, but most of these were pretty clear to anybody who read between the lines of the original article. More damningly, under close scrutiny it looks like some of the work was downright sloppy, and Nature is refusing to release all the data. I’m disappointed in you, Nature. What would your mother say?

What’s most interesting, however, is a recurring motif that emerges in Britannica’s detailed objections. Britannica defends omissions for a mixture of reasons—because the information was outside the scope of a general encyclopedia, to retain clarity and focus within an article, and possibly, on occasion, even because of space constraints. Britannica also asserts that yearbook articles have no place in the comparison of the encyclopediae; they’re entirely separate in Britannica’s eyes.

Britannica, not without reason, places a great deal of value on the editorial decisions of scope and their effect on clarity and focus. It repeatedly defends the omission of information on this basis. I think it should be pretty clear, however, that the information doesn’t have to be left on the cutting room floor to achieve clarity—it simply has to be represented in the right way and the right place. It is certainly within the realm of possibility to retain detail without detracting from clarity. Britannica is thinking like a paper encyclopedia, not like a hypertext encyclopedia.

Wikipedia runs free of scope restrictions and space restrictions. It can expound all it wants on the plant family Meliaceae. Where Britannica says, “We are not a botanical encyclopedia and do not pretend to be”, Wikipedia wants to be a general encyclopedia and every subject encyclopedia, and the yearbooks. Can it be successful at that? I don’t know.

Britannica has to cope with being an encyclopedia with a paid editorial staff, so managing scope creep in subject matter and amount of detail is a much bigger issue than for Wikipedia. Wikipedia has to cope with being an encyclopedia with a volunteer editorial staff, so controlling clarity and consistency of quality is the more difficult part. This comparison started off being all about the factual information in the two encyclopediae, but facts turn out to be the easy part: writing and editing are where the differences really emerge. Who will “win”? Well, I think there’s plenty of room and plenty of time to work things out.

31 December 2005

Digital library chugging along on Rails

I took a digital libraries course this past semester, in which we learned all sorts of things about usability, accessibility, interoperability, and all the other things that digital libraries and other web applications ought to have.

At the end of the semester, we were charged with the task of actually creating a digital library in groups. Since no programming knowledge was required for the course, we were expected to use software such as Greenstone or DSpace, both open-source packages designed specifically for creating digital libraries.

Most groups chose to use Greenstone, because it’s easy to install and use. The flipside of “easy to install and use” is that the software is largely a black box, difficult to customize to support the desirable features in a digital library we’d spent the whole semester learning about. (And since it’s written mostly in C++, you need pretty good programming chops to hack away at it.) One group chose DSpace, which has the advantage of being written in easier-to-penetrate Java, but it’s more difficult to install and set up than Greenstone.

Our Project

One of my group members was working at the library at Point Park University, which has a theatre conservatory, and we were interested in pulling together materials on plays and productions, from scripts to playbills to reviews. In thinking about the design, we took cues from other databases/digital libraries such as IMDb and the theatre databases available from Alexander Street Press.

As we sketched out the design of our digital library, we saw the potential for rich interconnections among the data (authors, plays, productions, theaters, directors, actors, etc.), and we saw that Greenstone and DSpace wouldn’t serve us well without hacking them into unrecognizable forms, which none of us had the skills for. (I’ve read Dorothea’s accounts of taming DSpace, and she’s using it for its intended purpose. I had no desire to get entangled in attempting to half-rewrite it in the course of a several-week project, and learn Java at the same time.)

Don’t get me wrong, Greenstone and DSpace are excellent pieces of software, and they certainly have their uses. But both are tied to a “bibliographic record + item” paradigm, in which there is some metadata (title, author, etc.) that describes a digital document. (DSpace’s primary purpose is actually for institutional repositories.) Our data just didn’t fit this paradigm. So what to do?

Well, the short answer is, we need a database-driven application. The long answer follows.

Selecting Software

There’s PhiloLogic from the University of Chicago, the software on which the Alexander Street Press databases mentioned above are built (as are a number of other databases, such as ARTFL, which PhiloLogic was originally written for). This is great stuff, but it relies on texts marked up in TEI, an XML scheme for literature and other purposes. In our project, we’re using public domain texts, some from places like Project Gutenberg and others we’re scanning from books. Marking all of these up in TEI would have been awfully labor-intensive for this project.

So my group put all their faith in me as I turned to the so-called “full-stack web development frameworks”: Ruby on Rails, TurboGears, and others. I first heard about these by reading about TurboGears on dchud’s work log, and was later blown away by the incredible Ruby on Rails video.

The idea behind these frameworks is to make it easy for lots of people to create web applications. A bunch of really smart programmers got together, cooked up a framework that handles the whole thing from end to end (the database, the business logic, the display views, and all), and packaged it up so it’s much easier to use than trying to string all those pieces together yourself.

Constructing the Application

I settled on Ruby on Rails (“Rails” for short), because Ruby, the programming language it uses, seemed similar to PHP, which I’ve had at least a little experience with from tweaking the templates to this blog.

Now, before I go into the nuts and bolts, let me just say, IANAP (I am not a programmer). I got a computer from Radio Shack when I was 8 and learned all about BASIC; I have an abstract understanding of logical structures from being a math major; and I am good with HTML, XML, and CSS. That’s it. I’ve never taken a programming class, written even an absurdly simple application on my own, nada. I am, however, a big subscriber to the “beat on it with a rock until it works” philosophy of computer programming (described wonderfully by Dorothea on Caveat Lector). And this is where the Rails framework is great: it has a feature called “scaffolding” that automatically sets up the basic structure of the application, including all the simple functions like viewing, adding, editing, and deleting records. No need to create something from scratch: have Rails create the scaffold, then beat with a rock until it’s the way you want it.

The basic steps were as follows:

  1. Create a database.

    I don’t really know much about SQL, and I didn’t really understand relational databases (being a hierarchical, XML kind of guy). Fortunately, it’s really easy to set up MySQL with the binary installers they’re now providing and GUI interfaces for administration (MySQL Administrator) and table creation and data entry (YourSQL). (YourSQL is for Mac, but similar things exist for Windows.)

    To create the database, you make tables for all the kinds of data you have, and name them with plurals (plays, authors, actors, etc.). Rails is smart enough to figure out that this means there are individual records for a play, an author, an actor, etc., and it creates the scaffolding for each kind of record based on the columns in the table (an author has a name, birth and death dates, etc.).

  2. Tell Rails about how the data is related.

    In the scaffolding, there’s a “model” for each of the types of records in the database. This is simply a file in which you tell Rails how the data are related. For example, here’s the model play:

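    class Play < ActiveRecord::Base
        has_many :productions   # the productions table has a play_id column
        belongs_to :author      # the plays table has an author_id column
        belongs_to :genre       # the plays table has a genre_id column
        has_many :characters    # the characters table has a play_id column
    end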
    

    All I had to do was supply the has_many :productions-type lines and include a column in each table to hold the id of the associated record. (For example, the productions table has a play_id column, and the plays table has an author_id column to go with belongs_to :author.)

  3. Enter the data into the database.

  4. Mark up templates.

    The scaffolding creates templates that use HTML, CSS, and some special Ruby markup that tells Rails where to drop in the data. Then you can hack away at these to get them to look and behave the way you’d like. Here’s an example (the Ruby commands are in the <% %> parts):

    <p><%= @play.description %></p>
    <p><b>By:</b> <%= link_to @play.author.name %></p>
    <p><b>Genre:</b> <%= link_to @play.genre.name %></p>
    

    This is pretty easy if you already know HTML. I caught on right away and found myself doing more and more complicated things pretty quickly, because it’s easy to experiment—just try it out and reload the browser.

    Also, I never really understood object-oriented programming until I saw how Rails treats the data. It uses a system called ActiveRecord (which you can see is being called on in the model above) to make the database look like objects, in the object-oriented programming sense. In the example above, @play is the current play, so @play.author finds the play’s author (because play belongs_to :author), and @play.author.name gets the author’s name from that column in the table. Rails even understands plurals, so @play.characters returns an array with all the characters associated with the play (because play has_many :characters). Cool!
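If it helps to see what @play.author and @play.characters are doing behind the scenes, here’s a rough plain-Ruby sketch that follows the same foreign-key columns by hand. This is not ActiveRecord, which builds SQL queries against the database; the in-memory arrays and sample records are stand-ins for illustration, not data from our project.

```ruby
# A minimal, in-memory sketch of what belongs_to and has_many do:
# follow a foreign-key column (author_id, play_id) to related records.
# NOT ActiveRecord; the arrays below stand in for database tables.
Author    = Struct.new(:id, :name)
Play      = Struct.new(:id, :title, :author_id)
Character = Struct.new(:id, :name, :play_id)

AUTHORS    = [Author.new(1, "Eugene O'Neill")]
PLAYS      = [Play.new(1, "Beyond the Horizon", 1)]
CHARACTERS = [
  Character.new(1, "Robert Mayo", 1),
  Character.new(2, "Andrew Mayo", 1)
]

class Play
  # Like belongs_to :author -- find the one author whose id matches author_id.
  def author
    AUTHORS.find { |a| a.id == author_id }
  end

  # Like has_many :characters -- collect every character pointing at this play.
  def characters
    CHARACTERS.select { |c| c.play_id == id }
  end
end

play = PLAYS.first
puts play.author.name                        # prints Eugene O'Neill
puts play.characters.map(&:name).join(", ")  # prints Robert Mayo, Andrew Mayo
```

The point is that the "object" view is just a convenient way of following those id columns; Rails writes these lookup methods for you from the has_many and belongs_to lines.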

Okay, so I’ve simplified this a good deal. I’m not claiming just anybody could walk in off the street and write a web application using Ruby on Rails. It does take some mucking around on the command line (although the GUI packages for MySQL mentioned above help), and it does take a little basic programming. But it makes writing web apps far more accessible to non-programmers and amateurs than it used to be.

One more thing: all of this needs a web server to make it go. Rails provides a lightweight web server of its own, but on Mac OS X there’s something even easier: a package called Locomotive gives you a GUI for creating new Rails projects and running the web server.

Bonuses

Ruby on Rails lets us do lots of things that are really hard with relatively opaque systems like Greenstone and DSpace.

The template system makes it highly extensible, so adding support for interoperability standards is easy. Want a Dublin Core record for every item? Just make a template and have Rails fill in the appropriate information. Want to add OAI-PMH or COinS support, or anything else? Just do it in the template.
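Here’s roughly what “just make a template” means for Dublin Core, sketched in plain Ruby with ERB, the same templating system Rails views use. The `PlayRecord` struct, the choice of fields, and the placeholder description are assumptions; a real OAI-PMH feed would need more than this (identifiers, datestamps, and the protocol responses themselves).

```ruby
require "erb"

# A minimal sketch: render a simple Dublin Core (oai_dc) record for a play.
# PlayRecord and the field mapping are hypothetical.
PlayRecord = Struct.new(:title, :author, :description, :year)

DC_TEMPLATE = <<~XML
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title><%= h(@play.title) %></dc:title>
    <dc:creator><%= h(@play.author) %></dc:creator>
    <dc:description><%= h(@play.description) %></dc:description>
    <dc:date><%= @play.year %></dc:date>
    <dc:type>Text</dc:type>
  </oai_dc:dc>
XML

class DublinCoreRenderer
  include ERB::Util   # provides h() to escape &, <, > and quotes

  def initialize(play)
    @play = play
  end

  # Evaluate the template with this object's @play in scope.
  def render
    ERB.new(DC_TEMPLATE).result(binding)
  end
end

record = PlayRecord.new("Beyond the Horizon", "Eugene O'Neill",
                        "Placeholder text with <markup> & an ampersand", 1920)
puts DublinCoreRenderer.new(record).render
```

Escaping with h() matters here: a stray ampersand in a title would otherwise produce invalid XML that a harvester might reject.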

It’s also easy to consume the web services of others. (Here comes the part that really wowed our classmates.) Part of our data was theater locations, and what better way to represent these than a map? Google Maps offers an API that is pretty easy to use on its own, but from Carol at Rawbrick’s airport map I found Google Maps EZ, which made my work even quicker.

The Demo

Okay, here are the goods: http://plays.dystmesis.com. Check it out. For now, you can only browse, because I never got around to making a search function work before this was due. Be sure to check out the map.

The collection is actually a selection of plays that opened in New York City in 1920. We chose these because there were lots of related public domain materials. This is just a sample collection to demonstrate the power of the architecture.

I’ve turned off write access to the database, but I’ve left the links to edit, add, and delete records exposed so you can check them out. I didn’t have a chance to improve on the scaffold forms for creating and editing records, but you can see that the scaffold already does a lot for you.

Conclusions

Open frameworks like Ruby on Rails and TurboGears are making web applications easier to build than ever, and they’re only likely to improve with time. As librarians who want to make materials available digitally, we should be aware of them and willing to roll up our sleeves and get our hands dirty. I highly recommend it.

There are lots of tools out there that we can make use of that don’t necessarily require much programming knowledge. Take a look at Aaron’s Western Springs History Project using WordPress, which is designed to be blog software but works pretty well as a content management system/database application/digital library too. Check out other content management systems like Drupal, Plone, Mambo, or PostNuke. (Ann Arbor District Library has done spectacular things with Drupal, albeit with bona fide programmers on staff.) And don’t let me put you off Greenstone or DSpace either, because they’re good pieces of software if they’re the ones you need. (Greenstone developers are also working on a new Java- and XML-based version that promises to be more hackable.) Don’t be afraid to beat on things with rocks!

14 October 2005

This is where my money goes

My undergraduate alma mater, a small liberal arts college, has spent the last year or more rebuilding its library. (It’s good, because it really needed it.) I think what they’re doing is really great: they’ve put a lot of thought into things, starting with putting the reference desk on the right as you enter and the circulation desk on the right as you leave. (Sure, it makes total sense to send users to the ref desk as soon as they walk in wanting information, but how many libraries can you think of that make you trip over the circulation desk when you come in?) There’s going to be a lovely, large reading room with fireplaces, and they’ve also been really smart about putting in video and audio editing studios.

On the virtual side, it looks like they’re putting a good bit of money and effort into a redesign of the OPAC, more and better databases, and technology in general, including a blog!

The library is part of an overarching Information Services department, which includes the IT people, media services, etc. Not only are they in the same department, they’re in the same building, and I think this is paying dividends. (It’s been a single department for some time, but I think they’re really hitting their stride with the idea now.) I can only imagine that having the computer help desk in the same building is good for drawing in students, and it looks like the librarians and the IT folks have been talking over the water cooler—a good thing, in my book.

According to the blog, the books that were in storage during the hiatus have been moved back into the library, and it’s going to be open for winter quarter in January. (During the construction, people and books have been stuck in pigeonholes all over campus.) I’m going to Kalamazoo for a wedding in December, and I’m hoping I can score a sneak peek.