Digital library chugging along on Rails

I took a digital libraries course this past semester, in which we learned all sorts of things about usability, accessibility, interoperability, and all the other things that digital libraries and other web applications ought to have.

At the end of the semester, we were charged with the task of actually creating a digital library in groups. Since no programming knowledge was required for the course, we were expected to use software such as Greenstone or DSpace, both open-source packages designed specifically for creating digital libraries.

Most groups chose to use Greenstone, because it’s easy to install and use. The flipside of “easy to install and use” is that the software is largely a black box, difficult to customize to support the desirable features in a digital library we’d spent the whole semester learning about. (And since it’s written mostly in C++, you need pretty good programming chops to hack away at it.) One group chose DSpace, which has the advantage of being written in easier-to-penetrate Java, but it’s more difficult to install and set up than Greenstone.

Our Project

One of my group members was working at the library at Point Park University, which has a theatre conservatory, and we were interested in pulling together materials on plays and productions, from scripts to playbills to reviews. In thinking about the design, we took cues from other databases/digital libraries such as IMDb and the theatre databases available from Alexander Street Press.

As we sketched out the design of our digital library, we saw the potential for rich interconnections among the data (authors, plays, productions, theaters, directors, actors, etc.), and we saw that Greenstone and DSpace wouldn’t serve us well without hacking them into unrecognizable forms, which none of us has the appropriate skills for. (I’ve read Dorothea’s accounts of taming DSpace, and she’s using it for its intended purpose. I had no desire to get entangled in attempting to half-rewrite it in the course of a several-week project, and learn Java at the same time.)

Don’t get me wrong, Greenstone and DSpace are excellent pieces of software, and they certainly have their uses. But both are tied to a “bibliographic record + item” paradigm, in which there is some metadata (title, author, etc.) that describes a digital document. (DSpace’s primary purpose is actually for institutional repositories.) Our data just didn’t fit this paradigm. So what to do?

Well, the short answer is, we need a database-driven application. The long answer follows.

Selecting Software

There’s PhiloLogic from the University of Chicago, the software on which the Alexander Street Press databases mentioned above are built (as are a number of other databases, such as ARTFL, for which PhiloLogic was originally written). This is great stuff, but it relies on texts marked up in TEI, an XML scheme for literature and other purposes. In our project, we’re using public domain texts, some from places like Project Gutenberg and others we’re scanning from books. Marking all these up in TEI would have been awfully labor-intensive for this project.

So my group put all their faith in me as I turned to the so-called “full-stack web development frameworks”: Ruby on Rails, TurboGears, and others. I first heard about these by reading about TurboGears on dchud’s work log, and was later blown away by the incredible Ruby on Rails video.

The idea behind these frameworks is to make it easy for lots of people to create web applications. A bunch of really smart programmers got together, cooked up a framework that handles the whole thing from end to end (the database, the business logic, the display views, and all), and packaged it up so it’s much easier to use than trying to string all those pieces together by yourself.

Constructing the Application

I settled on Ruby on Rails (“Rails” for short), because Ruby, the programming language it uses, seemed similar to PHP, which I’ve had at least a little experience with from tweaking the templates to this blog.

Now, before I go into the nuts and bolts, let me just say, IANAP (I am not a programmer). I got a computer from Radio Shack when I was 8 and learned all about BASIC; I have an abstract understanding of logical structures from being a math major; and I am good with HTML, XML, and CSS. That’s it. I’ve never taken a programming class, written even an absurdly simple application on my own, nada. I am, however, a big subscriber to the “beat on it with a rock until it works” philosophy of computer programming (described wonderfully by Dorothea on Caveat Lector). And this is where the Rails framework is great: it has a feature called “scaffolding” that automatically sets up the basic structure of the application, including all the simple functions like viewing, adding, editing, and deleting records. No need to create something from scratch: have Rails create the scaffold, then beat with a rock until it’s the way you want it.

The basic steps were as follows:

  1. Create a database.

    I don’t really know much about SQL, and I didn’t really understand relational databases (being a hierarchical, XML kind of guy). Fortunately, it’s really easy to set up MySQL with the binary installers they’re now providing and GUI interfaces for administration (MySQL Administrator) and table creation and data entry (YourSQL). (YourSQL is for Mac, but similar things exist for Windows.)

    To create the database, you make tables for all the kinds of data you have, and name them with plurals (plays, authors, actors, etc.). Rails is smart enough to figure out that this means there are individual records for a play, an author, an actor, etc., and it creates the scaffolding for each kind of record based on the columns in the table (an author has a name, birth and death dates, etc.).

  2. Tell Rails about how the data is related.

    In the scaffolding, there’s a “model” for each of the types of records in the database. This is simply a file in which you tell Rails how the data are related. For example, here’s the model for a play:

    class Play < ActiveRecord::Base
        has_many :productions
        belongs_to :author
        belongs_to :genre
        has_many :characters
    end
    

    All I had to do was supply the has_many :productions-type lines, and include columns in the tables to contain the id of an associated piece of data. (For example, the productions table has a play_id column.)

  3. Enter the data into the database.

  4. Mark up templates.

    The scaffolding creates templates that use HTML, CSS, and some special Ruby markup that tell Rails where to drop in the data. Then you can hack away at these to get them to look and behave the way you’d like. Here’s an example (the Ruby commands are in the <% %> parts):

    <p><%= @play.description %></p>
    <p><b>By:</b> <%= link_to @play.author.name %></p>
    <p><b>Genre:</b> <%= link_to @play.genre.name %></p>
    

    This is pretty easy if you already know HTML. I caught on right away and found myself doing more and more complicated things pretty quickly, because it’s easy to experiment—just try it out and reload the browser.

    Also, I never really understood object-oriented programming until I saw how Rails treats the data. It uses a system called ActiveRecord (which you can see is being called on in the model above) to make the database look like objects, in the object-oriented programming sense. In the example above, @play is the current play, so @play.author finds the play’s author (because play belongs_to :author), and @play.author.name gets the author’s name from that column in the table. Rails even understands plurals, so @play.characters returns an array with all the characters associated with the play (because play has_many :characters). Cool!
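
To make the belongs_to / has_many idea a little more concrete, here is a toy sketch in plain Ruby that imitates the lookups using in-memory hashes. It is emphatically not how ActiveRecord is implemented, and it does not need Rails or MySQL to run. The Record class, the TABLES constant, the naive "add an s" pluralization, and the sample rows are all assumptions made up for illustration; they are not taken from the project's database.

```ruby
# Stand-in for the database: plural table names, each row has an id,
# and foreign-key columns (author_id, play_id) point at other rows.
TABLES = {
  authors:    [{ id: 1, name: "Eugene O'Neill" }],
  plays:      [{ id: 1, title: "Beyond the Horizon", author_id: 1 }],
  characters: [{ id: 1, name: "Robert Mayo", play_id: 1 },
               { id: 2, name: "Andrew Mayo", play_id: 1 }]
}

# Hypothetical base class that wraps one row of a table.
class Record
  def initialize(row)
    @row = row
  end

  # Column values act like reader methods, e.g. play.title
  def method_missing(name, *args)
    @row.key?(name) ? @row[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @row.key?(name) || super
  end

  # belongs_to :author -> find the row in "authors" whose id equals
  # this record's author_id column.
  def self.belongs_to(name)
    define_method(name) do
      table = TABLES.fetch(:"#{name}s") # naive pluralization
      row = table.find { |r| r[:id] == @row[:"#{name}_id"] }
      row && Record.new(row)
    end
  end

  # has_many :characters -> collect the rows in "characters" whose
  # play_id column points back at this record.
  def self.has_many(name)
    owner = self
    define_method(name) do
      foreign_key = :"#{owner.name.downcase}_id"
      TABLES.fetch(name)
            .select { |r| r[foreign_key] == @row[:id] }
            .map { |r| Record.new(r) }
    end
  end
end

class Play < Record
  belongs_to :author
  has_many :characters
end

play = Play.new(TABLES[:plays].first)
puts play.author.name
puts play.characters.map(&:name).join(", ")
```

The point of the sketch is only that each association boils down to matching an id column in one table against a foreign-key column in another; Rails does that matching with SQL queries behind the scenes.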

Okay, so I’ve simplified this a good deal. I’m not claiming just anybody could walk in off the street and write a web application using Ruby on Rails. It does take some mucking around on the command line (although that’s helped by the GUI packages for MySQL mentioned above) and it does take a little basic programming. But it does make it way more accessible than previously for non-programmers and amateurs to write web apps.

One more thing: this all needs a web server to make it go. Rails provides a lightweight web server of its own, but for Mac OS X, there’s something even easier. A package called Locomotive gives you a GUI for creating new Rails projects and running the webserver.

Bonuses

Ruby on Rails lets us do lots of things that are really hard with relatively opaque systems like Greenstone and DSpace.

The template system makes it highly extensible: adding support for interoperability standards is easy. Want a Dublin Core record for every item? Just make a template and have Rails fill in the appropriate information. Want to add OAI-PMH or COinS support or anything else? Just do it in the template.
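
As a rough illustration of the "just make a template" idea, here is a standalone sketch that uses ERB from Ruby's standard library (the same <%= %> syntax the Rails templates use) to fill in a simple Dublin Core record. The PlayInfo struct, the DublinCoreView class, the <record> wrapper element, and the choice of which fields map to which Dublin Core elements are all assumptions for this sketch; in the real application the template would be handed an object like @play instead.

```ruby
require "erb"

# Hypothetical stand-in for a play record from the database.
PlayInfo = Struct.new(:title, :author, :year)

# Template for a minimal Dublin Core record. The field-to-element
# mapping and the <record> wrapper are assumptions for this sketch.
DC_TEMPLATE = ERB.new(<<~XML)
  <record xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title><%= h(@play.title) %></dc:title>
    <dc:creator><%= h(@play.author) %></dc:creator>
    <dc:date><%= h(@play.year) %></dc:date>
  </record>
XML

class DublinCoreView
  include ERB::Util # provides h(), which escapes &, <, > and quotes

  def initialize(play)
    @play = play
  end

  # Evaluate the template with this object's instance variables in scope.
  def render
    DC_TEMPLATE.result(binding)
  end
end

play = PlayInfo.new("Beyond the Horizon", "Eugene O'Neill", 1920)
xml = DublinCoreView.new(play).render
puts xml
```

Escaping every value with h() matters here because titles and names can contain characters such as & or apostrophes that would otherwise produce invalid XML.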

It’s also easy to consume the web services of others. (Here comes the part that really wowed our classmates.) Part of our data was theater locations, and what better way to represent these than a map? Google Maps offers an API which is pretty easy to implement itself, but from Carol at Rawbrick’s airport map I found Google Maps EZ, which made my work even quicker.

The Demo

Okay, here are the goods: http://plays.dystmesis.com. Check it out. For now, you can only browse, because I never got around to making a search function work before this was due. Be sure to check out the map.

The collection is actually a selection of plays that opened in New York City in 1920. We chose these because there were lots of related public domain materials. This is just a sample collection to demonstrate the power of the architecture.

I’ve turned off write access to the database, but I’ve left the links to edit, add, and delete records exposed so you can check them out. I didn’t have a chance to improve on the scaffold forms for creating and editing records, but you can see that the scaffold already does a lot for you.

Conclusions

Open frameworks like Ruby on Rails and TurboGears are making web applications easier than ever, and they’re only likely to improve with time. As librarians who want to make materials available digitally, we should be aware of them and willing to roll up our sleeves and get our hands dirty. I highly recommend it.

There are lots of tools out there that we can make use of that don’t necessarily require too much programming knowledge. Take a look at Aaron’s Western Springs History Project using WordPress, which is designed to be blog software, but works pretty well as a content management system/database application/digital library too. Check out other content management systems like Drupal, Plone, Mambo, or PostNuke. (Ann Arbor Public Library has done spectacular things with Drupal, albeit with bona fide programmers on staff.) And don’t let me put you off Greenstone or DSpace either, because they’re good pieces of software if they’re the ones you need. (Greenstone developers are also working on a new Java-and-XML based version that promises to be more hackable.) Don’t be afraid to beat on things with rocks!

7 Responses to “Digital library chugging along on Rails”

  1. Comment by Dorothea Salo

    This is bloody awesome. I am deeply impressed.

    And amused that beating things with rocks is now a meme. But mostly impressed.

  2. Comment by Simon

    Very good post. And very impressive database as well. I agree with you about the limitations of Greenstone. I used it for a similar project, and it was OK - but clearly didn’t have the same capabilities as RoR.

  3. Pingback by dystmesis » Blog Archive » Horn-tooting, for fun and profit

    [...] However, I’m briefly popping my head out of the burrow to engage in a little shameless self-promotion. I’d like to call the attention of the reader to two things I’ve been working on lately. First, you can check out my article in this month’s Library Journal, “Shoestring Digital Library”. It’s based on my experiences building a prototype digital library using Ruby on Rails (detailed in an earlier post) and provides some ideas for building digital libraries using software from outside the usual pool of suspects. [...]

  4. Comment by washtublibrarian

    Great stuff! Thanks for the play-by-play…if my Digital Libraries class had happened across your post a few months ago, we might have picked up the ROR idea, too! Instead, we made the architecture in ASP.NET/VB.NET, which I have mixed feelings about. Wish I had developed the skills programming in a non-proprietary system (say, PHP or Ruby) but using ASP.NET really made the fast pace feasible to realize the goal…programming a digital library framework from the ground up in two weeks. Ugh. In any case, our DL is up at http://bfhsmuseum.bfn.org — we just finished it in May.

  5. Comment by Jonathan

    Thanks! I think your digital library looks great, too. I’m not familiar with the dark bowels of ASP.NET/VB.NET when it comes to developing web apps, but it certainly appears to be a viable alternative if you swing that way. ;) As a mostly-Mac user, I didn’t really consider it for our project.

  6. Comment by washtublibrarian

    I hear ya … I’m working on a Mac most of the time, too, but our department is very much about the Microsoft products. Which is fine…ASP was very flexible for what we needed. I had a real desire to make the database more relational (could only implement a flat file with lookup tables for controlled vocab terms), though, which wasn’t really possible in any of the documentation I read — did you use a flat file, lookup tables, or a relational model?

  7. Comment by Jonathan

    It’s a relational model. The Rails magic takes care of the ids and join tables and so forth; all you really have to do is tell it about the relationships (play belongs_to :author) and create the columns in the db.

    I’m sure ASP is also capable of more sophisticated database arrangements (probably with MS-SQL Server), but it’s probably also extra complicated. I’ve suffered through an MS-SQL install once, and found it painful.
