31 December 2005

Digital library chugging along on Rails

I took a digital libraries course this past semester, in which we learned all sorts of things about usability, accessibility, interoperability, and all the other things that digital libraries and other web applications ought to have.

At the end of the semester, we were charged with the task of actually creating a digital library in groups. Since no programming knowledge was required for the course, we were expected to use software such as Greenstone or DSpace, both open-source packages designed specifically for creating digital libraries.

Most groups chose to use Greenstone, because it’s easy to install and use. The flipside of “easy to install and use” is that the software is largely a black box, difficult to customize to support the desirable digital library features we’d spent the whole semester learning about. (And as it’s written mostly in C++, you need pretty good programming chops to hack away at it.) One group chose DSpace, which has the advantage of being written in easier-to-penetrate Java, but it’s more difficult to install and set up than Greenstone.

Our Project

One of my group members was working at the library at Point Park University, which has a theatre conservatory, and we were interested in pulling together materials on plays and productions, from scripts to playbills to reviews. In thinking about the design, we took cues from other databases/digital libraries such as IMDb and the theatre databases available from Alexander Street Press.

As we sketched out the design of our digital library, we saw the potential for rich interconnections among the data (authors, plays, productions, theaters, directors, actors, etc.), and we saw that Greenstone and DSpace wouldn’t serve us well without hacking them into unrecognizable forms, which none of us has the appropriate skills for. (I’ve read Dorothea’s accounts of taming DSpace, and she’s using it for its intended purpose. I had no desire to get entangled in attempting to half-rewrite it in the course of a several-week project, and learn Java at the same time.)

Don’t get me wrong, Greenstone and DSpace are excellent pieces of software, and they certainly have their uses. But both are tied to a “bibliographic record + item” paradigm, in which there is some metadata (title, author, etc.) that describes a digital document. (DSpace’s primary purpose is actually for institutional repositories.) Our data just didn’t fit this paradigm. So what to do?

Well, the short answer is, we need a database-driven application. The long answer follows.

Selecting Software

There’s PhiloLogic from the University of Chicago, the software on which the Alexander Street Press databases mentioned above are built (as are a number of other databases, such as ARTFL, which PhiloLogic was originally written for). This is great stuff, but it relies on texts marked up in TEI, an XML scheme for literature and other purposes. In our project, we’re using public domain texts, some from places like Project Gutenberg and others we’re scanning from books. Marking all these up in TEI would have been awfully labor-intensive for this project.

So my group put all their faith in me as I turned to the so-called “full-stack web development frameworks”: Ruby on Rails, TurboGears, and others. I first heard about these by reading about TurboGears on dchud’s work log, and was later blown away by the incredible Ruby on Rails video.

The idea behind these frameworks is to make it easy for lots of people to create web applications. A bunch of really smart programmers got together, cooked up a framework that handles the whole thing from end to end (the database, the business logic, the display views, and all), and packaged it up so it’s much easier to use than trying to string all those together by yourself.

Constructing the Application

I settled on Ruby on Rails (“Rails” for short), because Ruby, the programming language it uses, seemed similar to PHP, which I’ve had at least a little experience with from tweaking the templates to this blog.

Now, before I go into the nuts and bolts, let me just say, IANAP (I am not a programmer). I got a computer from Radio Shack when I was 8 and learned all about BASIC; I have an abstract understanding of logical structures from being a math major; and I am good with HTML, XML, and CSS. That’s it. I’ve never taken a programming class, written even an absurdly simple application on my own, nada. I am, however, a big subscriber to the “beat on it with a rock until it works” philosophy of computer programming (described wonderfully by Dorothea on Caveat Lector). And this is where the Rails framework is great: it has a feature called “scaffolding” that automatically sets up the basic structure of the application, including all the simple functions like viewing, adding, editing, and deleting records. No need to create something from scratch: have Rails create the scaffold, then beat with a rock until it’s the way you want it.

The basic steps were as follows:

  1. Create a database.

    I don’t really know much about SQL, and I didn’t really understand relational databases (being a hierarchical, XML kind of guy). Fortunately, it’s really easy to set up MySQL with the binary installers they’re now providing and GUI interfaces for administration (MySQL Administrator) and table creation and data entry (YourSQL). (YourSQL is for Mac, but similar things exist for Windows.)

    To create the database, you make tables for all the kinds of data you have, and name them with plurals (plays, authors, actors, etc.). Rails is smart enough to figure out that this means there are individual records for a play, an author, an actor, etc., and it creates the scaffolding for each kind of record based on the columns in the table (an author has a name, birth and death dates, etc.).

  2. Tell Rails about how the data is related.

    In the scaffolding, there’s a “model” for each of the types of records in the database. This is simply a file in which you tell Rails how the data are related. For example, here’s the model for a play:

    class Play < ActiveRecord::Base
        has_many :productions
        belongs_to :author
        belongs_to :genre
        has_many :characters
    end
    

    All I had to do was supply the has_many :productions-type lines, and include columns in the tables to contain the id of an associated piece of data. (For example, the productions table has a play_id column.)

  3. Enter the data into the database.

  4. Mark up templates.

    The scaffolding creates templates that use HTML, CSS, and some special Ruby markup that tell Rails where to drop in the data. Then you can hack away at these to get them to look and behave the way you’d like. Here’s an example (the Ruby commands are in the <% %> parts):

    <p><%= @play.description %></p>
    <p><b>By:</b> <%= link_to @play.author.name %></p>
    <p><b>Genre:</b> <%= link_to @play.genre.name %></p>
    

    This is pretty easy if you already know HTML. I caught on right away and found myself doing more and more complicated things pretty quickly, because it’s easy to experiment—just try it out and reload the browser.

    Also, I never really understood object-oriented programming until I saw how Rails treats the data. It uses a system called ActiveRecord (which you can see the model above inheriting from, as ActiveRecord::Base) to make the database look like objects, in the object-oriented programming sense. In the example above, @play is the current play, so @play.author finds the play’s author (because play belongs_to :author), and @play.author.name gets the author’s name from that column in the table. Rails even understands plurals, so @play.characters returns an array with all the characters associated with the play (because play has_many :characters). Cool!
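
To make the id-column trick less magical, here is a rough sketch in plain Ruby, with no Rails and no database. Everything in it is made up for illustration: the sample rows, the PlayView class, and the arrays standing in for tables. The real ActiveRecord runs SQL queries instead of searching arrays, and a real Rails view also has helpers like link_to, which I’ve left out. It only shows the general idea of how belongs_to and has_many follow id columns, and how a <%= %> template pulls values out of @play.

```ruby
require 'erb'

# Stand-ins for database rows. In Rails these would be ActiveRecord models
# backed by the authors, plays, and characters tables.
Author    = Struct.new(:id, :name)
Character = Struct.new(:id, :name, :play_id)

# Made-up sample data, not taken from the real collection.
AUTHORS    = [Author.new(1, 'Example Author')]
CHARACTERS = [
  Character.new(1, 'First Character', 10),
  Character.new(2, 'Second Character', 10),
  Character.new(3, 'Someone Else', 11)
]

class Play
  attr_reader :id, :title, :description, :author_id

  def initialize(id, title, description, author_id)
    @id = id
    @title = title
    @description = description
    @author_id = author_id
  end

  # Roughly what belongs_to :author gives you:
  # follow this row's author_id to a single author.
  def author
    AUTHORS.find { |a| a.id == author_id }
  end

  # Roughly what has_many :characters gives you:
  # collect every character whose play_id points at this play.
  def characters
    CHARACTERS.select { |c| c.play_id == id }
  end
end

TEMPLATE = <<~HTML
  <p><%= @play.description %></p>
  <p><b>By:</b> <%= @play.author.name %></p>
  <ul>
  <% @play.characters.each do |character| %>
    <li><%= character.name %></li>
  <% end %>
  </ul>
HTML

# Hypothetical stand-in for a Rails view: the template sees @play
# because it is rendered with this object's binding.
class PlayView
  def initialize(play)
    @play = play
  end

  def render
    ERB.new(TEMPLATE).result(binding)
  end
end

play = Play.new(10, 'Example Play', 'A sample description.', 1)
html = PlayView.new(play).render
puts html
```

The third character belongs to a different play (play_id 11), so it never shows up in the list.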

Okay, so I’ve simplified this a good deal. I’m not claiming just anybody could walk in off the street and write a web application using Ruby on Rails. It does take some mucking around on the command-line (although that’s helped by the GUI packages for MySQL mentioned above) and it does take a little basic programming. But it does make writing web apps far more accessible to non-programmers and amateurs than it used to be.

One more thing: this all needs a web server to make it go. Rails provides a lightweight web server of its own, but for Mac OS X, there’s something even easier. A package called Locomotive gives you a GUI for creating new Rails projects and running the webserver.

Bonuses

Ruby on Rails lets us do lots of things that are really hard with relatively opaque systems like Greenstone and DSpace.

The template system makes it highly extensible: adding support for interoperability standards is easy. Want a Dublin Core record for every item? Just make a template and have Rails fill in the appropriate information. Want to add OAI-PMH or COinS support or anything else? Just do it in the template.
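
As a minimal sketch of the Dublin Core idea, here is a template rendered with Ruby’s standard ERB library rather than inside Rails. The PlayRecord struct, the DublinCoreView class, the sample values, and the choice of which fields map to which Dublin Core elements are all assumptions for illustration, not what our application actually does. The two namespace URIs are the standard ones for oai_dc and the Dublin Core element set.

```ruby
require 'erb'

# Hypothetical record; in the Rails app this would be a play loaded
# from the database.
PlayRecord = Struct.new(:title, :author_name, :year, :description)

DC_TEMPLATE = <<~XML
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title><%= h(@play.title) %></dc:title>
    <dc:creator><%= h(@play.author_name) %></dc:creator>
    <dc:date><%= h(@play.year) %></dc:date>
    <dc:description><%= h(@play.description) %></dc:description>
  </oai_dc:dc>
XML

# Hypothetical view class: renders the template with @play in scope.
class DublinCoreView
  include ERB::Util # provides h(), which escapes &, <, > and quotes

  def initialize(play)
    @play = play
  end

  def render
    ERB.new(DC_TEMPLATE).result(binding)
  end
end

# Made-up sample values.
record = PlayRecord.new('Example Play & Other Scenes', 'Example Author',
                        '1920', 'A sample description.')
xml = DublinCoreView.new(record).render
puts xml
```

Escaping with h() matters here because a stray ampersand in a title would otherwise make the XML invalid.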

It’s also easy to consume the web services of others. (Here comes the part that really wowed our classmates.) Part of our data was theater locations, and what better way to represent these than a map? Google Maps offers an API which is pretty easy to implement itself, but from Carol at Rawbrick’s airport map I found Google Maps EZ, which made my work even quicker.

The Demo

Okay, here are the goods: http://plays.dystmesis.com. Check it out. For now, you can only browse, because I never got around to making a search function work before this was due. Be sure to check out the map.

The collection is actually a selection of plays that opened in New York City in 1920. We chose these because there were lots of related public domain materials. This is just a sample collection to demonstrate the power of the architecture.

I’ve turned off write access to the database, but I’ve left the links to edit, add, and delete records exposed so you can check them out. I didn’t have a chance to improve on the scaffold forms for creating and editing records, but you can see that the scaffold already does a lot for you.

Conclusions

Open frameworks like Ruby on Rails and TurboGears are making web applications easier than ever, and they’re only likely to improve with time. As librarians who want to make materials available digitally, we should be aware of them and willing to roll up our sleeves and get our hands dirty. I highly recommend it.

There are lots of tools out there that we can make use of that don’t necessarily require too much programming knowledge. Take a look at Aaron’s Western Springs History Project using WordPress, which is designed to be blog software, but works pretty well as a content management system/database application/digital library too. Check out other content management systems like Drupal, Plone, Mambo, or PostNuke. (Ann Arbor Public Library has done spectacular things with Drupal, albeit with bona fide programmers on staff.) And don’t let me put you off Greenstone or DSpace either, because they’re good pieces of software if they’re the ones you need. (Greenstone developers are also working on a new Java-and-XML based version that promises to be more hackable.) Don’t be afraid to beat on things with rocks!

1 November 2005

Hatin’ on DRM

I’m a Mac/iPod/iTunes guy, and I hadn’t really had a big problem with DRM personally (whatever I might think about it philosophically). That is, until the other night, in an episode I was reminded of when taking a look at Aaron’s presentation on downloading.

Actually, the vast majority of my music is legally purchased. It’s ripped from my CDs or bought from iTunes (or it’s free). The other night, I had a yen to hear a song I had just purchased on iTunes. I launched iTunes on my boyfriend’s PC (because his computer has speakers), looked in the library of songs shared over our home network, and tried to play it. It asked me to authorize his computer to play my music, so I put in my password. And then it didn’t play.

Why? I’m using the latest version of iTunes, and he’s not. Sure, I can fix this by updating iTunes on his computer, but I wanted to hear the song right now. Was it really necessary to break this and annoy me?

Mark it down!

I’ve been using Markdown, a simplified syntax for creating (X)HTML, on this blog since it started, and I’ve also adopted it for a number of things that aren’t fussy enough to want a word processor or LaTeX to do. But that’s exactly the problem sometimes: I want print output, and I want something simple like Markdown to get it, rather than fooling around writing LaTeX (just like I don’t want to fool around writing XHTML). Other people want this too, so I’ve lurked around the Markdown discussion list seeing what ideas there are. Most of the talk has involved hacking up Markdown (written in Perl) to make it spit out LaTeX instead of XHTML, but my Perl is about as good as my Spanish (which is to say, not good), and besides, this is a maintenance nightmare.

Then I stumbled on Fletcher Penney’s MultiMarkdown, which adds the following to Markdown:

  • create a full XHTML document (with the headers, etc.) instead of just a snippet
  • add basic metadata used in XHTML and LaTeX (title, author, etc.)
  • footnotes! (which are a proposed addition to the “official” Markdown)
  • a set of XSLT stylesheets for transforming the XHTML into LaTeX

This is better, because my XSLT is like my German, which is at least intelligible. And with the addition of footnotes (and a semi-standardized way to represent them in XHTML), the XHTML should be capable of bearing the semantic burden necessary to get LaTeX out of it with relatively little manual cleanup afterward. I have to work on the stylesheets to get exactly what I want out of them, but once they’re done, they’re pretty stable for my purposes.

This is still a branch from Markdown, but it’s necessary for me until Markdown includes footnotes. I think I will work on refactoring the other additions (full document & metadata) into a preprocessor for Markdown, so the chain is something like:

 myPreprocessor.pl sample.txt | Markdown.pl | SmartyPants.pl |
 xsltproc xhtml2article.xslt - > sample.tex

Then I can just replace Markdown.pl with any upgrades or bug fixes without “fixing” it.
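
Here is a rough sketch of the metadata half of such a preprocessor, written in Ruby rather than Perl. It assumes the metadata is written as “Key: Value” lines at the very top of the file, ending at the first blank line. The split_metadata function and the sample text are made up for illustration, and a real preprocessor would still have to turn the metadata into XHTML headers somewhere in the chain.

```ruby
# Split a document into a metadata hash and the remaining body text.
# Assumes "Key: Value" lines at the top, ended by the first blank line.
def split_metadata(text)
  header, body = text.split(/\n\s*\n/, 2)
  metadata = {}
  header.to_s.each_line do |line|
    key, value = line.split(':', 2)
    # A line without a colon means this is ordinary text, not metadata,
    # so hand the document back untouched.
    return [{}, text] if value.nil?
    metadata[key.strip.downcase] = value.strip
  end
  [metadata, body.to_s]
end

sample = "Title: Sample Document\nAuthor: A. Writer\n\nBody text with *emphasis*.\n"
metadata, body = split_metadata(sample)
puts metadata.inspect
puts body
```

One known weakness of this sketch: an opening paragraph in which every line happens to contain a colon would be mistaken for metadata.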

Gee, it’s really nice to have had a couple hours to finally work that out last night, because it’s been on my list of things to do since August but it seemed so daunting. As it turns out, not really that bad.

24 March 2005

Being entertained

DRMBlog, in the course of considering subscription music services, poses some interesting questions about models for media ownership:

  • Would you rather rent music or buy music?
  • Would you rather rent movies or buy movies?
  • Would you rather rent a book or buy a book?
  • Would you rather rent furniture or buy furniture?
  • Would you rather rent a car or buy a car?
  • Would you rather rent a house or buy a house?

Let’s start with the easy ones: I’d rather buy a house, a car, and furniture. They’re physical things (needed in their specific physical forms) that I use regularly. Moreover, I like the idea of owning them and being able to modify them at will.

But what about media like books, music, and movies?


4 March 2005

I’m not a programmer, I just play one on TV

I’m not a programmer. I’ve never really written a piece of code that’s worth much. I do, however, know just enough to be dangerous. I can read other people’s code and alter it. But occasionally, not being a programmer, I run up against a wall and have to give up after a while.

So far in getting this blog going, I’ve been really impressed with PHP and WordPress, because I haven’t gotten stuck on anything I couldn’t get over.

Knowing HTML well enough, it’s no big thang to add a little PHP. A lot of the thanks here goes to the people who put out understandable, well-thought-out WordPress templates. Both PHP and WordPress are well-documented, which goes a long way too. I found the PHP documentation especially lucid.

The take-home lesson here is: Don’t be daunted. If you have the slightest bit of geek in you (and maybe even if you don’t), you can do this. And even if you aren’t the least bit geeky, I bet you know someone who is.