I took a digital libraries course this past semester, in which we learned all sorts of things about usability, accessibility, interoperability, and all the other things that digital libraries and other web applications ought to have.
At the end of the semester, we were charged with the task of actually creating a digital library in groups. Since no programming knowledge was required for the course, we were expected to use software such as Greenstone or DSpace, both open-source packages designed specifically for creating digital libraries.
Most groups chose to use Greenstone, because it’s easy to install and use. The flipside of “easy to install and use” is that the software is largely a black box, difficult to customize to support the desirable features in a digital library we’d spent the whole semester learning about. (And as it’s written mostly in C++, you need pretty good programming chops to hack away at it.) One group chose DSpace, which has the advantage of being written in easier-to-penetrate Java, but it’s more difficult to install and set up than Greenstone.
One of my group members was working at the library at Point Park University, which has a theatre conservatory, and we were interested in pulling together materials on plays and productions, from scripts to playbills to reviews. In thinking about the design, we took cues from other databases/digital libraries such as IMDb and the theatre databases available from Alexander Street Press.
As we sketched out the design of our digital library, we saw the potential for rich interconnections among the data (authors, plays, productions, theaters, directors, actors, etc.), and we saw that Greenstone and DSpace wouldn’t serve us well without hacking them into unrecognizable forms, which none of us had the appropriate skills for. (I’ve read Dorothea’s accounts of taming DSpace, and she’s using it for its intended purpose. I had no desire to get entangled in attempting to half-rewrite it in the course of a several-week project, and learn Java at the same time.)
Don’t get me wrong, Greenstone and DSpace are excellent pieces of software, and they certainly have their uses. But both are tied to a “bibliographic record + item” paradigm, in which there is some metadata (title, author, etc.) that describes a digital document. (DSpace’s primary purpose is actually for institutional repositories.) Our data just didn’t fit this paradigm. So what to do?
Well, the short answer is, we need a database-driven application. The long answer follows.
There’s PhiloLogic from the University of Chicago, the software on which the Alexander Street Press databases mentioned above are built (as are a number of other databases, such as ARTFL, which PhiloLogic was originally written for). This is great stuff, but it relies on texts marked up in TEI, an XML scheme for literature and other purposes. In our project, we’re using public domain texts, some from places like Project Gutenberg and others we’re scanning from books. Marking all these up in TEI would have been awfully labor-intensive for this project.
So my group put all their faith in me as I turned to the so-called “full-stack web development frameworks”: Ruby on Rails, TurboGears, and others. I first heard about these by reading about TurboGears on dchud’s work log, and was later blown away by the incredible Ruby on Rails video.
The idea behind these frameworks is to make it easy for lots of people to create web applications. A bunch of really smart programmers got together, cooked up a framework that handles the whole thing from end to end (the database, the business logic, the display views, and all), and packaged it up so it’s much easier to use than trying to string all those together by yourself.
Constructing the Application
I settled on Ruby on Rails (“Rails” for short), because Ruby, the programming language it uses, seemed similar to PHP, which I’ve had at least a little experience with from tweaking the templates to this blog.
Now, before I go into the nuts and bolts, let me just say, IANAP (I am not a programmer). I got a computer from Radio Shack when I was 8 and learned all about BASIC; I have an abstract understanding of logical structures from being a math major; and I am good with HTML, XML, and CSS. That’s it. I’ve never taken a programming class, written even an absurdly simple application on my own, nada. I am, however, a big subscriber to the “beat on it with a rock until it works” philosophy of computer programming (described wonderfully by Dorothea on Caveat Lector). And this is where the Rails framework is great: it has a feature called “scaffolding” that automatically sets up the basic structure of the application, including all the simple functions like viewing, adding, editing, and deleting records. No need to create something from scratch: have Rails create the scaffold, then beat with a rock until it’s the way you want it.
The basic steps were as follows:
Create a database.
I don’t really know much about SQL, and I didn’t really understand relational databases (being a hierarchical, XML kind of guy). Fortunately, it’s really easy to set up MySQL with the binary installers they’re now providing and GUI interfaces for administration (MySQL Administrator) and table creation and data entry (YourSQL). (YourSQL is for Mac, but similar things exist for Windows.)
To create the database, you make tables for all the kinds of data you have, and name them with plurals (plays, authors, actors, etc.). Rails is smart enough to figure out that this means there are individual records for a play, an author, an actor, etc., and it creates the scaffolding for each kind of record based on the columns in the table (an author has a name, birth and death dates, etc.).
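To make that naming convention a little more concrete, here is a rough sketch in plain Ruby (no Rails needed) of how a plural table name could map onto a singular kind of record whose attributes mirror the table's columns. The singularize and class_name_for helpers, the column list, and the sample row are all made up for illustration; Rails' real inflector handles far more cases than this toy version.

```ruby
# A toy stand-in for Rails' table-name convention; NOT the real inflector.

# Hypothetical helper: strip a plural ending to get the singular name.
def singularize(table_name)
  if table_name.end_with?("ies")
    table_name.sub(/ies\z/, "y")   # "libraries" -> "library"
  else
    table_name.sub(/s\z/, "")      # "plays" -> "play"
  end
end

# Hypothetical helper: turn a table name into a record class name.
def class_name_for(table_name)
  singularize(table_name).capitalize
end

# Assumed columns for an authors table (illustrative only).
AUTHOR_COLUMNS = [:id, :name, :birth_year, :death_year]

# Build a record type whose attributes mirror the table's columns.
Author = Struct.new(*AUTHOR_COLUMNS)

# Sample row, not taken from the actual collection.
author = Author.new(1, "Eugene O'Neill", 1888, 1953)

puts class_name_for("authors")   # prints "Author"
puts author.name                 # prints "Eugene O'Neill"
```

The point is only that one row in a plural table becomes one singular object, with a method per column.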
Tell Rails about how the data is related.
In the scaffolding, there’s a “model” for each of the types of records in the database. This is simply a file in which you tell Rails how the data are related. For example, here’s the model for a play:

class Play < ActiveRecord::Base
  has_many :productions
  belongs_to :author
  belongs_to :genre
  has_many :characters
end

All I had to do was supply the has_many :productions-type lines, and include columns in the tables to contain the id of an associated piece of data. (For example, the production table has a
Enter the data into the database.
Mark up templates.
The scaffolding creates templates that use HTML, CSS, and some special Ruby markup that tells Rails where to drop in the data. Then you can hack away at these to get them to look and behave the way you’d like. Here’s an example (the Ruby commands are inside the <%= ... %> tags):
<p><%= @play.description %></p>
<p><b>By:</b> <%= link_to @play.author.name %></p>
<p><b>Genre:</b> <%= link_to @play.genre.name %></p>
This is pretty easy if you already know HTML. I caught on right away and found myself doing more and more complicated things pretty quickly, because it’s easy to experiment—just try it out and reload the browser.
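If you want to try the fill-in-the-blanks idea without installing Rails, Ruby's standard library ships the same kind of template engine (ERB). The following is a minimal sketch, not the application's actual code: the Struct stand-ins, the PlayPage class, and the sample data are assumptions made up for the example, and it leaves out Rails helpers such as link_to.

```ruby
require "erb"

# Plain structs standing in for ActiveRecord objects (sample data only).
Author = Struct.new(:name)
Genre  = Struct.new(:name)
Play   = Struct.new(:description, :author, :genre)

# The template: literal HTML plus <%= %> tags holding Ruby expressions.
TEMPLATE = <<~HTML
  <p><%= @play.description %></p>
  <p><b>By:</b> <%= @play.author.name %></p>
  <p><b>Genre:</b> <%= @play.genre.name %></p>
HTML

# A tiny hypothetical view object: its instance variables are visible
# to the template because we hand ERB this object's binding.
class PlayPage
  def initialize(play)
    @play = play
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

play = Play.new("A sample description.", Author.new("Sample Author"), Genre.new("Drama"))
html = PlayPage.new(play).render(TEMPLATE)
puts html
```

Change the template text, run it again, and the output changes with it, which is the same try-and-reload loop described above. Note that this sketch does no HTML escaping of the values it drops in.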
Also, I never really understood object-oriented programming until I saw how Rails treats the data. It uses a system called ActiveRecord (which you can see is being called on in the model above) to make the database look like objects, in the object-oriented programming sense. In the example above, @play is the current play, so @play.author finds the play’s author (because play belongs_to :author), and @play.author.name gets the author’s name from that column in the table. Rails even understands plurals, so @play.characters returns an array with all the characters associated with the play (because play has_many :characters). Cool!
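For the curious, here is roughly what those two kinds of lookups amount to, sketched in plain Ruby with arrays standing in for database tables. This is not how ActiveRecord is implemented (it generates SQL queries); the in-memory tables, the sample rows, and the author_id and play_id column names are assumptions that follow Rails' default naming for id columns.

```ruby
# Toy in-memory "tables": arrays of hashes keyed by column name.
# All rows are made-up sample data.
AUTHORS    = [{ id: 1, name: "Sample Author" }]
PLAYS      = [{ id: 10, title: "Sample Play", author_id: 1 }]
CHARACTERS = [
  { id: 100, name: "First Character",  play_id: 10 },
  { id: 101, name: "Second Character", play_id: 10 }
]

class Author
  attr_reader :name

  def initialize(row)
    @name = row[:name]
  end
end

class Character
  attr_reader :name

  def initialize(row)
    @name = row[:name]
  end
end

class Play
  attr_reader :title

  def initialize(row)
    @row = row
    @title = row[:title]
  end

  # Like belongs_to :author: follow this row's author_id to one author row.
  def author
    Author.new(AUTHORS.find { |a| a[:id] == @row[:author_id] })
  end

  # Like has_many :characters: collect the rows whose play_id points back here.
  def characters
    CHARACTERS.select { |c| c[:play_id] == @row[:id] }
              .map { |c| Character.new(c) }
  end
end

play = Play.new(PLAYS.first)
puts play.author.name                      # prints "Sample Author"
puts play.characters.map(&:name).inspect   # prints ["First Character", "Second Character"]
```

The "belongs to" side holds the id column, and the "has many" side is found by searching for rows that point back at it.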
Okay, so I’ve simplified this a good deal. I’m not claiming just anybody could walk in off the street and write a web application using Ruby on Rails. It does take some mucking around on the command line (although that’s helped by the GUI packages for MySQL mentioned above) and it does take a little basic programming. But it does make it way more accessible than previously for non-programmers and amateurs to write web apps.
One more thing: this all needs a web server to make it go. Rails provides a lightweight web server of its own, but for Mac OS X, there’s something even easier. A package called Locomotive gives you a GUI for creating new Rails projects and running the webserver.
Ruby on Rails lets us do lots of things that are really hard with relatively opaque systems like Greenstone and DSpace.
The template system makes it highly extensible: adding support for interoperability standards is easy. Want a Dublin Core record for every item? Just make a template and have Rails fill in the appropriate information. Want to add OAI-PMH or COinS support or anything else? Just do it in the template.
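As an illustration of the Dublin Core idea, here is a small sketch using Ruby's standard ERB library rather than Rails itself. The Play struct, its field names, the dublin_core_for helper, the wrapping record element, and the choice of which fields map to which Dublin Core elements are all assumptions for the example, not the application's real schema or output.

```ruby
require "erb"

# Made-up sample record; the field names are assumptions, not the app's schema.
Play = Struct.new(:title, :author_name, :description)

# A template that maps record fields onto Dublin Core elements.
# ERB::Util.html_escape keeps characters like & and < from breaking the XML.
DC_TEMPLATE = <<~XML
  <record xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title><%= ERB::Util.html_escape(play.title) %></dc:title>
    <dc:creator><%= ERB::Util.html_escape(play.author_name) %></dc:creator>
    <dc:description><%= ERB::Util.html_escape(play.description) %></dc:description>
  </record>
XML

# Hypothetical helper: fill the template in for one play.
def dublin_core_for(play)
  ERB.new(DC_TEMPLATE).result(binding)
end

play = Play.new("Sample Play", "Sample Author", "Love & loss")
xml = dublin_core_for(play)
puts xml
```

The same record could feed any number of templates, one per format you want to expose.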
It’s also easy to consume the web services of others. (Here comes the part that really wowed our classmates.) Part of our data was theater locations, and what better way to represent these than a map? Google Maps offers an API which is pretty easy to implement itself, but from Carol at Rawbrick’s airport map I found Google Maps EZ, which made my work even quicker.
Okay, here are the goods: http://plays.dystmesis.com. Check it out. For now, you can only browse, because I never got around to making a search function work before this was due. Be sure to check out the map.
The collection is actually a selection of plays that opened in New York City in 1920. We chose these because there were lots of related public domain materials. This is just a sample collection to demonstrate the power of the architecture.
I’ve turned off write access to the database, but I’ve left the links to edit, add, and delete records exposed so you can check them out. I didn’t have a chance to improve on the scaffold forms for creating and editing records, but you can see that the scaffold already does a lot for you.
Open frameworks like Ruby on Rails and TurboGears are making web applications easier than ever, and they’re only likely to improve with time. As librarians who want to make materials available digitally, we should be aware of them and willing to roll up our sleeves and get our hands dirty. I highly recommend it.
There are lots of tools out there that we can make use of that don’t necessarily require too much programming knowledge. Take a look at Aaron’s Western Springs History Project using WordPress, which is designed to be blog software, but works pretty well as a content management system/database application/digital library too. Check out other content management systems like Drupal, Plone, Mambo, or PostNuke. (Ann Arbor Public Library has done spectacular things with Drupal, albeit with bona fide programmers on staff.) And don’t let me put you off Greenstone or DSpace either, because they’re good pieces of software if they’re the ones you need. (Greenstone developers are also working on a new Java-and-XML based version that promises to be more hackable.) Don’t be afraid to beat on things with rocks!