18 July 2007

Open Library architecture

You’ve no doubt already heard about the Open Library demo site from the Internet Archive, brainchild of Brewster Kahle and Aaron Swartz. I think it’s a really exciting project, and I’m sure I’ll have more to say about it soon.

One thing that struck me as interesting is a technical detail. On the “About the technology” page, there’s this tidbit:

We wanted a database that could hold tens of millions of records, that would allow random users to modify its entries and keep a full history of their changes, and that would hold arbitrary semi-structured data as users added it. Each of these problems had been solved on its own, but nobody had yet built a technology that solved all three together.

So we created ThingDB (tdb), a new database framework that gives us this flexibility. ThingDB stores a collection of objects, called “things”. For example, on the Open Library site, each page, book, author, and user is a thing in the database. Each thing then has a series of arbitrary key-value pairs as properties. [...] Each collection of key-value pairs is stored as a version, along with the time it was saved and the person who saved it. This allows us to store full semi-structured data, as well as travel back thru time to retrieve old versions of it.
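To make the versioning idea in that description concrete, here’s a minimal in-memory sketch. It is not ThingDB’s actual code or API, which the “About the technology” page doesn’t show. The names ThingStore, save, get and history, and the path-style thing names like "/b/othello", are all hypothetical. The one idea it tries to capture is the one quoted above: saving never overwrites anything; it appends a new version holding the full set of key-value pairs plus who saved it and when, so old versions stay retrievable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One saved snapshot of a thing: its properties plus who saved it and when."""
    number: int
    properties: Dict[str, Any]
    saved_at: datetime
    saved_by: str


@dataclass
class ThingStore:
    """Hypothetical toy store mapping each thing's name to its list of versions."""
    things: Dict[str, List[Version]] = field(default_factory=dict)

    def save(self, name: str, properties: Dict[str, Any], author: str) -> Version:
        # Never overwrite: append a new version holding a full copy of the pairs.
        versions = self.things.setdefault(name, [])
        version = Version(
            number=len(versions) + 1,
            properties=dict(properties),
            saved_at=datetime.now(timezone.utc),
            saved_by=author,
        )
        versions.append(version)
        return version

    def get(self, name: str, version: Optional[int] = None) -> Dict[str, Any]:
        # Latest version by default; pass a version number to go back in time.
        versions = self.things[name]
        if version is None:
            return dict(versions[-1].properties)
        if not 1 <= version <= len(versions):
            raise ValueError(f"{name} has no version {version}")
        return dict(versions[version - 1].properties)

    def history(self, name: str) -> List[Version]:
        # Every saved version, oldest first.
        return list(self.things.get(name, []))


# One user saves a book with a typo in the title; a second user fixes it.
store = ThingStore()
store.save("/b/othello", {"type": "book", "title": "Othelo"}, author="alice")
store.save("/b/othello", {"type": "book", "title": "Othello", "by": "/a/shakespeare"}, author="bob")
print(store.get("/b/othello")["title"])             # Othello
print(store.get("/b/othello", version=1)["title"])  # Othelo
```

Storing whole snapshots rather than diffs keeps the sketch simple and makes “give me version N” a plain lookup; a real system holding tens of millions of records would presumably need a database underneath, not Python lists.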

This sounds really interesting. It also reminds me very much of Maya’s u-forms (pdf), aside from the fact that the identifiers aren’t UUIDs. I’m not really database-savvy enough to know much about the underlying infrastructure that makes any of this happen, so my interest is something like an ape staring at a power drill. Still, I thought it worth noting.

8 July 2007

Karen Schneider, hip “old lady”

The biblioblogosphere is fluttering with talk about the fluffy librarian-image piece in the New York Times style section. On one hand, it’s one of those “Librarians: we’re cooler than you think we are” articles, and as those go, it’s not half bad. I mean, Jessamyn gets mentioned, so that’s one thing going for it right there.

But Karen Schneider calls out what’s lacking. It is, after all, the style section, and there’s a lot of concentration on cocktails, clothes, and tattoos. There’s also a glossing-over of some stereotyping that deserves examining, and a lack of attention to the things that truly make librarians “hip”. Karen writes,

Jessamyn is of the hippest of the hip not because she routinely uses instant messaging, but because she is such a tireless advocate for small libraries and poor communities — the unserved, often voiceless communities many of us (including me) forget about when we get hopped up about some new new thing.

Right on. And she goes on to say,

I am cool in my subversive old-lady tech-loving the-user-is-not-broken way, and getting cooler all the time, and I count among my friends and colleagues librarians of all ages, dress codes, and evening habits.

Karen, if you want to identify as “old lady”, I’ll support you in whatever you want to be. ;) But I have to say, you’re also one of the coolest librarians I know. Thanks for blogging.

10 April 2007

Too busy to blog?

My last post was an email I’d just written to a friend. As I was writing the email, I thought, “I should blog this”, largely because I’d read this just an hour before:

When people tell me they’re too busy to blog, I ask them to count up their output of keystrokes. How many of those keystrokes flow into email messages? Most. How many people receive those email messages? Few. How many people could usefully benefit from those messages, now or later? More than a few, maybe a lot more. (Jon Udell, “Too busy to blog? Count your keystrokes.”)

The whole post is definitely worth a read.

I’ve been pretty absent from blogging lately, but I think I’ll take this to heart, both for this blog and for work-related emails that might be better served by an internally-facing blog. My company is reasonably compact (certainly in comparison to Microsoft, anyway!) but we still have a lot of challenges spreading information around when people aren’t all in the office together and communicating face-to-face. We’ve reached a size now where, even when we are all in the office, some people are still out of the loop on X, Y, or Z because they weren’t present at the conversation in so-and-so’s office, or over the lunchroom table, or wherever.

And speaking of not blogging in a while, it reminds me that I also have a post on Yahoo Pipes and Dapp and many related issues that I started ages ago — like when Yahoo Pipes debuted — that I still haven’t finished… Too busy… ;)

Fuel Economy: Then, Now, and in the Future

A friend just emailed me this link. MSN Autos has an article showing that the highest fuel efficiency models in 1992 were more efficient than 2007’s most efficient (non-hybrid) models.

I wrote the following in response:

On its face, this looks really awful, and I do agree that cars could be a lot more fuel efficient if car companies wanted to make them that way and if people wanted to buy them that way.

But realistically, those high-efficiency cars from 1992? In general, they’re lawn mowers. They struggle going up hills. It’s no wonder no one wants to drive one. The fuel efficient cars from 2007? They’re real cars. The Yaris is totally suitable, and the Mini is downright sporty and fun.

Now, I’ll be the first one to agree with someone who says that gas is underpriced in this country, who says that we have an overwhelming bigger-is-better mentality that’s often not a good thing, that we are over-consumers and not concerned enough about conservation.

But, I don’t think that means everyone has to drive a small car. If you need a station wagon or an SUV or a pickup truck, that’s OK. And it certainly doesn’t mean that you have to drive a car that can’t make it up a hill.

What I really want is for the cost-to-benefit comparison to be more transparent, so that it becomes apparent to someone who’s driving a gigantic SUV what the tradeoffs between fuel efficiency and utility really are. Unfortunately, our current energy prices don’t account for a lot of the real costs.

26 January 2007

Fields are from Mars and Tags are from Venus: oh really?

When thinking about bibliographic data (for example) and social applications that use tagging, it’s pretty easy to think that the data (title, author, and so on) is highly structured and therefore very different from tags, which are freeform and all that jazz. In many ways, that’s true, and it’s especially important for the purposes of bibliographic control. But in social applications where users are contributing data, the line can get a lot fuzzier. LibraryThing is an example: users contribute various structured and unstructured data about books. Some of the data comes from libraries or Amazon, some is put in by hand, and some of the library- or publisher-supplied data is cleaned up by users, because it’s not always right. Users can enter structured information in fields: information about the item in general, like title and author, but also personal information, like ratings and the date it was read. They can also enter tags and search and sort books by those tags.

Flickr has just introduced “machine tags” (or “triple tags”). These build on existing geotags, which encode locations like this: geo:long=123.456. They’re three-part tags, with a namespace and a key-value pair, and you could use them to express all manner of things, for example dc:title=Othello. (There are also some semi-official uses of namespaces on tags in del.icio.us, like system:unfiled and filetype:mp3, and various users have used namespaces and triple tags on services like these without official support.) You might think of them as a kind of really lightweight RDF.
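Here’s a rough sketch of how a program might pull a triple tag apart into its namespace, key (Flickr calls it the predicate), and value. The validation rule is a simplifying assumption, not Flickr’s full specification: namespace and predicate are taken to be letters, digits and underscores starting with a letter, and a value wrapped in double quotes has its quotes dropped. The function name parse_machine_tag is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption, not Flickr's full spec): namespace and
# predicate are letters, digits and underscores starting with a letter,
# followed by "=" and any non-empty value.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Return the three parts of a triple tag, or None for any other tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Values containing spaces are often wrapped in double quotes; strip them.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return MachineTag(namespace, predicate, value)


for tag in ["geo:long=123.456", "dc:title=Othello", "system:unfiled", "shakespeare"]:
    print(tag, "->", parse_machine_tag(tag))
```

Note that two-part namespaced tags like the del.icio.us system:unfiled come back as None here: they have a namespace but no key-value pair, so they aren’t triple tags in this sense.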

Triple tags really blow away the distinction between structured fields and freeform tags. This is important, because it’s a step along a road in which it’s easier for Joe and Jane User to make sense of complicated sets of data by sorting and filtering. Once you’ve become comfortable searching and sorting your tags, it’s not too much of a stretch to apply the same tools to more structured data. Sure, maybe it’s the same data that’s always been there, but now maybe Jane User could be better at manipulating it because she doesn’t have to understand “databases”, she just has to grok “tags”, along with a little lightweight syntax. It’s just a different way of looking at the data, one that might prove more friendly. I know not all the tools are there yet, and I’m certainly not saying that everybody’s grandma is going to be putting machine tags on Flickr tomorrow, but I think this is a step in the right direction.
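To illustrate the “same tools” point, here’s a small sketch that filters and sorts a handful of items using the same tag lists for both freeform tags and triple tags. Everything in it is hypothetical: the sample items, the my:rating predicate, and the helper names triples, filter_items and sort_items. Values are compared as plain strings, which happens to sort correctly for these single-digit ratings but wouldn’t for numbers in general.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data: item name -> tags, mixing freeform and triple tags.
ITEMS: Dict[str, List[str]] = {
    "photo1": ["theatre", "dc:title=Othello", "my:rating=5"],
    "photo2": ["dc:title=Hamlet", "my:rating=3", "favorite"],
    "photo3": ["theatre", "dc:title=Macbeth", "my:rating=4"],
}


def triples(tags: List[str]) -> Dict[Tuple[str, str], str]:
    """Map (namespace, predicate) -> value for tags shaped like ns:pred=value."""
    result = {}
    for tag in tags:
        head, eq, value = tag.partition("=")
        namespace, colon, predicate = head.partition(":")
        if eq and colon and namespace and predicate:
            result[(namespace, predicate)] = value
    return result


def filter_items(items: Dict[str, List[str]], namespace: str, predicate: str,
                 value: Optional[str] = None) -> List[str]:
    """Names of items tagged namespace:predicate, optionally with a given value."""
    matches = []
    for name, tags in items.items():
        found = triples(tags).get((namespace, predicate))
        if found is not None and (value is None or found == value):
            matches.append(name)
    return matches


def sort_items(items: Dict[str, List[str]], namespace: str, predicate: str) -> List[str]:
    """Item names ordered by their namespace:predicate value (string comparison)."""
    keyed = []
    for name, tags in items.items():
        found = triples(tags).get((namespace, predicate))
        if found is not None:
            keyed.append((found, name))
    return [name for _, name in sorted(keyed)]


print([name for name, tags in ITEMS.items() if "theatre" in tags])  # ['photo1', 'photo3']
print(filter_items(ITEMS, "dc", "title", "Hamlet"))                  # ['photo2']
print(sort_items(ITEMS, "my", "rating"))                             # ['photo2', 'photo3', 'photo1']
```

The freeform filter and the triple-tag filter read almost the same, which is roughly the point: the structured data is just tags with a little extra syntax.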

11 January 2007

On clever solutions…

When people come to the library, that’s a good thing. But, sometimes lots of people at the library can mean the library gets noisy with people working together or just chatting. People who’ve come to the library for some peace and quiet to get work done can be disturbed.

The usual solutions to this problem are quiet study rooms that can be closed off, and/or group spaces, formally designated or subtly designed, where it’s OK to talk a little, kept separate from the quiet spaces. Today, I saw a pretty clever additional idea from my undergraduate alma mater: noise-canceling headphones you can check out to use while you’re in the library. Cool!

3 January 2007

What is venture capital, and is library automation getting any?

I’ve seen the phrase “venture capital” bandied about in reference to Vista Equity Partners’ recent acquisition of SirsiDynix (pdf), and the earlier acquisition of Ex Libris/Endeavor by Francisco Partners. Venture capital is a somewhat nebulous term, meaning different things to different people. Since the rise and fall of the dot-com era, however, it’s most often applied to capital offered to start-ups, anticipating large returns for the relatively high risk of investment. It provides an infusion of cash to a new or small company, enabling innovation. Sometimes “venture capital” is also applied to an investment in a beleaguered company in order to turn it around, which can be similarly high risk/high reward.

Though I can appreciate the hopes of library automation customers that the recent acquisitions may signal an infusion of cash that will fuel innovation, that’s not exactly what’s going on here. These are buyouts by private equity firms of large, established companies. Although we sometimes talk about the state of library automation software in terms that might be described as “beleaguered”, I’m not really sure that describes these companies’ financial situations.

It may indeed be the case that Vista Equity Partners and Francisco Partners intend to invest resources into these companies to make them better and more profitable, and if so, I think that’s great. (I, for one, welcome our new private equity firm overlords.) On the other hand, these acquisitions could be an example of what’s known as a leveraged buyout, a strategy by which private equity firms acquire companies by borrowing against the assets they acquire. Often this involves paying themselves a big cash dividend, then doing just enough to keep the company afloat under the sometimes excessive debt burdens they have inflicted during the acquisition, and attempting to sell it off again in a year or two.

I’m not saying I know which will happen, or even which is more likely. I didn’t do much research about Vista and Francisco’s previous acquisitions and what’s happened to them. I just wanted to make the point that we ought not look at acquisition of large companies like these the same way we look at VCs financing a startup. In these sorts of deals, there’s often a lot of fancy accounting going on that obscures the motives.

29 December 2006

The stupid 5 things

Jessamyn tagged me with the (seemingly totally unavoidable) “5 things you don’t know about me” meme. I really hate bloggy, email-y, chain-letter-y things… but who can say no to Jessamyn? So, with no further delay:

  1. I am familiar with more than half the seasons of the Real World, though I’m not watching The Real World: Denver (this season) because I don’t have MTV any more. I have also had other guilty addictions to reality TV, although currently there’s only Wife Swap. (Watch it. It’s about developing understanding across classes and cultures. Or something. Really.)
  2. Someday, I want to design and build my own house. Other things I would like to build: a boat. A big wooden one, with sails.
  3. I have never broken a bone. (None of mine, and no one else’s either. Except turkey wishbones at Thanksgiving.)
  4. I sometimes burst into song when I’m home alone. Loudly, and with feeling. This time of year, mostly Bing Crosby tunes. The cat thinks I’m a little nuts, but I don’t mind.
  5. If I had been a girl, my parents were going to name me Amanda.

I hate to pass this on, but I’m sure if I don’t a tree will fall on my car or I will be forced to surrender my firstborn child to a mysterious dwarf. Or something. So, in honor of Time magazine’s person of the year, I tag… YOU. Enjoy.

15 December 2006

Horn-tooting

I wrote an article in the current Library Journal on the development of the open-source Evergreen ILS. It makes an interesting case study for the development of a large and complicated piece of software from within a library consortium, and the resulting ILS and OPAC are pretty exciting! (Disclaimer: I was an intern on the project while I was in library school, so I’m biased.)

You can also find the article in the print issue.

14 December 2006

Google Patent Search

Google is beta-ing a patent search. Cool.

They’ve used the same technologies as Google Book Search on the historical database of patents, so you get full-text searching all the way back to the first US patents in 1790. The USPTO database has offered the images for some time, but only has full-text searching to 1976.