9 June 2008

LibraryThing API

While I’m getting back on the blogging horse…

I realize this is old news now, but LibraryThing announced an API for work data. This is great. But what’s really awesome? This little tidbit from the post:

Scope. This is an API to work information. Once I’ve worked through the kinks here, I plan to release a member API, allowing members to do clever things with their data. For example, members will be able to make their own widgets, not just rely on ours.

I will squeal with glee the day there is a member API. I’ve been harping on the issue for ages, because I really want a way to make a “to-read” list that mashes up LibraryThing with my local library data. I can’t wait.

24 September 2007

Where I was last week and thoughts on federation

I spent Monday–Wednesday of last week in Las Vegas at the Gartner Summit on Portals, Content, and Collaboration. The highlight for me was a talk by Jakob Nielsen on usability in intranets. I even got to ask him a question pertaining to his eye-tracking research (users don’t look at ads) and about design pitfalls to avoid even on ad-free sites like intranets.

Unfortunately I didn’t get to stay for the Web 2.0 and Open Source Summits, which were taking place Wednesday–Friday. I’m especially bummed because David Weinberger was speaking, and I’m a total DW fanboy. (I squealed with joy when I won his book in the LibraryThing contest giving them away.)

Much of the conference dealt with employing Web 2.0 technologies in the enterprise and with the fact that the work/life line is blurry and users’ expectations for corporate portals, intranets, etc. are set by their dealings with the Web at large (an idea known as “consumerization”). I can certainly relate: I know I expect (sometimes impatiently) that applications at work, from email right on up the chain, work as well as those I’m used to at home.

I found much of the conference confirmatory of trends I already recognize, but there was one thing in particular that got me thinking. Gartner analyst David Gootzit gave a presentation about the future of the portal market. He argued that consumerization will lead to the development of a “portal fabric” for the aggregation of experiences across the portals people use (e.g., iGoogle, your bank portal, your work portal, etc.) — the “Follow Me Portal” or “MyPortal”. The emergence of this portal fabric requires standardization of a number of different functions, such as identity management, personalization and preferences, portlets, and metadata. (Gootzit also argues that this trend is likely to result in enterprise portals being decomposed into component services, something that we’re already beginning to see to some degree with, e.g. search.)

It certainly would be cool if someday My Yahoo! or iGoogle or something else could be your real, honest-to-goodness personal homepage that aggregated all the things you were interested in. Not just your horoscope and the weather and some RSS feeds, but also your bank balance, what’s going on at work, your home automation portal, and so on. (Now, I certainly know there are privacy/trust issues with, e.g. letting Yahoo! or Google access your bank balance, but let’s assume the portal provider is an entity you trust.)

I want to skip over, for the time being, the question of what sort of software the “Follow Me Portal” actually is — whether it’s from a major web provider like Yahoo or Google, or whether it’s built on enterprise portal frameworks within businesses, or by Web 2.0 startups, or even as plugins or customizations to desktop software such as browser extensions or something like Flock. Instead, I want to look at the idea of the “portal fabric” that would be needed to support it. What standards currently exist for federating the functions of portals and where are there gaps? Here’s my still-processing-the-thoughts list…

Identity management

For authentication, we have OpenID. Although it’s not entirely clear yet if OpenID is the winner here, it’s looking better all the time. Big services like AOL and LiveJournal are OpenID providers (and a third party provides OpenIDs for Yahoo! accounts using Yahoo!’s API), meaning there are about 120 million OpenIDs out there already, whether they’re being used yet or not. Fewer sites accept OpenID for authentication, but the number appears to be steadily growing: I’m using OpenID to sign into 37signals applications, 43folders just announced they’ll be supporting it, and the other day when I got a trial account at myExperiment.org it asked me to sign up with an OpenID. (You can find more sites accepting OpenIDs at myopenid.net.)

For other information about identity, there’s XFN and FOAF. This is especially timely given Six Apart’s David Recordon’s announcement of tools for “opening the social graph”, which is not only about managing your own identity, but also your relationships to others.

There’s also vCard/hCard for directory-listing type info about people. (UPDATE: And duh, I forgot about LDAP.)

Portlets (or widgets, or gadgets, or what have you)

Well, there’s JSR168 and WSRP (and forthcoming updates in JSR286, and WSRPv2) but those are really only adopted by commercial enterprise portal frameworks. Google, Yahoo, etc., aren’t supporting them. Maybe they should, or maybe there’s something else. Certainly RSS and Atom represent really lightweight ways of passing data to a portlet/widget/gadget, but they’re not nearly as broadly encompassing as JSR168 or WSRP are, and can’t fully encapsulate the definition of a portlet to make it portable across these portals. How great would it be if your Yahoo widgets, Google gadgets, Apple Dashboard widgets, etc. were all interoperable and you didn’t have to worry about which platform any particular widget was made for?

I don’t really know a ton about the details of JSR168 or WSRP. I’m not sure whether they represent a viable way forward, or whether a new, more flexible standard is needed in this category.

Personalization and Preferences

This category is possibly the most tricky to deal with, which is probably why there are few existing standards in this realm. I think there’s also a great deal of value to be gained here, however.

One standard that does come to mind is P3P for privacy preferences. It’s been around for some time but hasn’t really gotten a great deal of traction, although there are some browser plugins and so forth.

Search

Of course, there’s Z39.50, but I don’t know of anyone in their right mind who’s running a Z39.50 interface on anything other than a library catalog. (UPDATE: I forgot to mention SRU and CQL, which are based on Z39.50 but updated for the Web. I think I used to know more about these, but now remember approximately nil. I have to read up again…)

OpenSearch is a more modern, digestible alternative. It hasn’t been around very long, but it’s gaining support from search engines, wikis, blogs, and other tools (as providers) and from browsers (as consumers). And there are extensions to handle geographic searches and other more complicated queries.
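To make the provider/consumer split a bit more concrete, here’s a rough Python sketch of what a consumer does with an OpenSearch URL template: fill in the parameters it knows and blank out the optional ones it doesn’t. The example.com template and the fill_template helper are made up for illustration; only the {searchTerms} and {startPage?} placeholder style comes from the OpenSearch 1.1 draft.

```python
import re
from urllib.parse import quote

# Hypothetical template, as it might appear in a site's OpenSearch description document.
TEMPLATE = "http://example.com/search?q={searchTerms}&page={startPage?}&format=atom"


def fill_template(template, **params):
    """Substitute {name} and {name?} placeholders in an OpenSearch-style URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters (trailing "?") may be left empty
        raise KeyError(f"required parameter missing: {name}")

    return re.sub(r"\{([A-Za-z][A-Za-z0-9:]*)(\??)\}", replace, template)


print(fill_template(TEMPLATE, searchTerms="folksonomy tags"))
print(fill_template(TEMPLATE, searchTerms="Othello", startPage=2))
```

The nice part is that the consumer never needs to know anything about the site’s own query syntax, just the template.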

Also related: metadata standards. There are lots of these for specialized purposes, but the simplest are the likeliest to be useful for syndication and aggregation purposes. Specifically I’m thinking of Dublin Core here. Certainly DC isn’t complex enough to handle most metadata needs for even mildly complex cases, but what would be interesting is if metadata schemes had a defined reduction algorithm to simple Dublin Core, so that a standard set of metadata could be used by, e.g. OpenSearch. (Search APIs could still support native metadata schemas as well, but I think there’s value in a standard interface for straightforward parameterized searches based on things like dates and authors.)
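Here’s roughly what I mean by a “reduction algorithm”, as a small Python sketch. The native schema and its field names are entirely invented, and the crosswalk table is my own guess at a sensible mapping, not any published one; the Dublin Core element names (title, creator, contributor, date, subject) are the real ones.

```python
# Hypothetical crosswalk from a made-up "native" schema to simple Dublin Core.
# The native field names are invented for illustration; the DC element names are real.
CROSSWALK = {
    "main_title": "title",
    "subtitle": "title",
    "author": "creator",
    "illustrator": "contributor",
    "publication_year": "date",
    "topic": "subject",
}


def reduce_to_dc(record):
    """Collapse a native record into repeatable simple-DC elements, dropping unmapped fields."""
    dc = {}
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no DC equivalent: this detail is lost in the reduction
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(str(v) for v in values)
    return dc


native = {
    "main_title": "Othello",
    "author": "Shakespeare, William",
    "publication_year": 1622,
    "topic": ["Jealousy", "Tragedy"],
    "shelf_location": "Stacks, 3rd floor",  # has no place in simple DC
}
print(reduce_to_dc(native))
```

The reduction is lossy on purpose: the native record keeps all its detail, and the simple Dublin Core view is just the common denominator a generic search interface could count on.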

Publishing

For pushing content elsewhere, there’s the in-development Atom Publishing Protocol (APP), which is mostly thought of as an API for posting to blogs. But wouldn’t it be cool if you could also use it to comment on someone else’s blog, or participate in a discussion forum, or change a wiki article — or even, say, post a link to a bookmarking service?

OK, maybe that last one can stick with specialized APIs. I’m not sure there’s a strong case for, e.g. del.icio.us, to support APP. But for blog posts, comments, discussions, wikis — which are all essentially similar things, just updating some content — it would be killer to use the same interface for them all, right?
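As a sketch of the “same interface for them all” idea: in APP, adding something is a POST of an Atom entry to a collection URL, so a blog comment and a forum reply could differ only in where you send them. The Python below builds the entry and prepares the requests without sending anything. The collection URLs and the build_entry/build_post helpers are hypothetical, and a strictly valid Atom entry would also carry id and updated elements, which I’ve left out for brevity.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(title, body, author):
    """Build a minimal Atom entry document as bytes (id and updated omitted for brevity)."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="text")
    content.text = body
    return ET.tostring(entry, encoding="utf-8")


def build_post(collection_url, entry_bytes):
    """Prepare (but do not send) the POST that would add an entry to a collection."""
    return urllib.request.Request(
        collection_url,
        data=entry_bytes,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


entry = build_entry("Re: Fields and tags", "Nice post!", "Jane User")
# Hypothetical collection URLs: the same entry could be posted to either one.
for url in ("http://example.com/blog/comments", "http://example.com/forum/thread/42"):
    request = build_post(url, entry)
    print(request.get_method(), request.full_url)
```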

Closing thoughts

These thoughts aren’t fully formed at this point, and this is something I’m going to continue thinking about. If you think there are existing standards that I’m overlooking in any of these areas, or if you think there are any areas to be standardized that I missed completely, I’d be interested in hearing about them.

19 July 2007

Yahoo Pipes, Google Mashups, etc.

Is anyone out there using Yahoo Pipes, Google Mashups, or something like Dapper or Coghead on a library website or for library services? If so, I want to talk to you! I’m writing an article. Email me at jonathanweber@mac.com.

18 July 2007

Open Library architecture

You’ve no doubt already heard about the Open Library demo site from the Internet Archive, brainchild of Brewster Kahle and Aaron Swartz. I think it’s a really exciting project, and I’m sure I’ll have more to say about it soon.

One thing that struck me as interesting is a technical detail. On the “About the technology” page, there’s this tidbit:

We wanted a database that could hold tens of millions of records, that would allow random users to modify its entries and keep a full history of their changes, and that would hold arbitrary semi-structured data as users added it. Each of these problems had been solved on its own, but nobody had yet built a technology that solved all three together.

So we created ThingDB (tdb), a new database framework that gives us this flexibility. ThingDB stores a collection of objects, called “things”. For example, on the Open Library site, each page, book, author, and user is a thing in the database. Each thing then has a series of arbitrary key-value pairs as properties. […] Each collection of key-value pairs is stored as a version, along with the time it was saved and the person who saved it. This allows us to store full semi-structured data, as well as travel back thru time to retrieve old versions of it.

This sounds really interesting. It also reminds me very much of Maya’s u-forms (pdf), aside from the fact that the identifiers aren’t UUIDs. I’m not really database-savvy enough to know much about the underlying infrastructure that makes any of this happen, so my interest is something like an ape staring at a power drill, but I still thought it worth noting.
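For my own benefit, here’s a toy Python model of the idea as I understand it from that description: a “thing” holds a list of versions, and each version is a full set of key-value pairs plus who saved it and when. To be clear, this is not ThingDB’s actual API or storage design (I haven’t seen either); the class and method names are all mine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    properties: dict
    saved_at: datetime
    saved_by: str


@dataclass
class Thing:
    """One object (a page, book, author, or user) with a full history of versions."""
    name: str
    versions: list = field(default_factory=list)

    def save(self, properties, user):
        # Each save stores a complete copy of the key-value pairs; nothing is overwritten.
        self.versions.append(Version(dict(properties), datetime.now(timezone.utc), user))

    def latest(self):
        return self.versions[-1].properties

    def at_version(self, number):
        """Travel back in time; version numbers start at 1."""
        return self.versions[number - 1].properties


book = Thing("othello")
book.save({"title": "Othelo", "author": "Shakespeare"}, user="alice")
book.save({"title": "Othello", "author": "Shakespeare", "language": "eng"}, user="bob")
print(book.latest()["title"])       # the corrected record
print(book.at_version(1)["title"])  # the original, typo and all
```

Note that the second save adds a “language” key the first version never had, which is the “arbitrary semi-structured data” part.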

26 January 2007

Fields are from Mars and Tags are from Venus: oh really?

When thinking about bibliographic data (for example) and social applications using taggings, it’s pretty easy to think that the data (title, author, and so on) is highly structured and therefore very different from tags, which are freeform and all that jazz. In many ways, that’s true, and it’s especially important for the purposes of bibliographic control. But in social applications where users are contributing data, the line can get a lot fuzzier. LibraryThing is an example: users contribute various structured and unstructured data about books. Some of the data comes from libraries or Amazon, some is put in by hand, and some of the library- or publisher-supplied data is cleaned up by users, because it’s not always right. Users can enter structured information in fields—information about the item in general like title and author, but also personal information, like ratings and the date it was read. They can also enter tags and search and sort books by those tags.

Flickr has just introduced “machine tags” (or “triple tags”). These build on existing geotags, which encode locations like this: geo:long=123.456. They’re three-part tags, with a namespace and a key-value pair, and you could use them to express all manner of things—like, for example dc:title=Othello. (There are also some semi-official uses of namespaces on tags in del.icio.us, like system:unfiled and filetype:mp3, and various users have used namespaces and triple tags on services like these without official support.) You might think of them as a kind of really lightweight RDF.
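To show how little syntax is involved, here’s a quick Python sketch that splits three-part tags into namespace, key, and value, and leaves ordinary tags alone. The regular expression is my own assumption about what a reasonable tag looks like, not Flickr’s official grammar, and parse_tag is a made-up helper.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    key: str
    value: str


# namespace:key=value, e.g. geo:long=123.456 or dc:title=Othello
# (an assumed pattern for illustration, not Flickr's exact rules)
_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_tag(tag: str) -> Optional[MachineTag]:
    """Return a MachineTag for three-part tags, or None for ordinary freeform tags."""
    match = _PATTERN.match(tag)
    return MachineTag(*match.groups()) if match else None


tags = ["shakespeare", "dc:title=Othello", "geo:long=123.456", "to-read"]
for tag in tags:
    print(tag, "->", parse_tag(tag))

# Pull the "dc" namespace out as if it were a set of structured fields.
parsed = [parse_tag(tag) for tag in tags]
dc_fields = {t.key: t.value for t in parsed if t is not None and t.namespace == "dc"}
print(dc_fields)
```

Once the tags are parsed, filtering by namespace gives you something that looks a lot like a record with fields.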

Triple tags really blow away the distinction between structured fields and freeform tags. This is important, because it’s a step along a road on which it’s easier for Joe and Jane User to make sense of complicated sets of data by sorting and filtering. Once you’ve become comfortable searching and sorting your tags, it’s not too much of a stretch to apply the same tools to more structured data. Sure, maybe it’s the same data that’s always been there, but now maybe Jane User could be better at manipulating it because she doesn’t have to understand “databases”, she just has to grok “tags”, along with a little lightweight syntax. It’s just a different way of looking at the data, one that might prove more friendly. I know not all the tools are there yet, and I’m certainly not saying that everybody’s grandma is going to be putting machine tags on Flickr tomorrow, but I think this is a step in the right direction.

11 January 2007

On clever solutions…

When people come to the library, that’s a good thing. But, sometimes lots of people at the library can mean the library gets noisy with people working together or just chatting. People who’ve come to the library for some peace and quiet to get work done can be disturbed.

The usual solutions to this problem are to have quiet study rooms that can be closed off, and/or to formally designate or subtly design group spaces, where it’s OK to talk a little bit, separate from quiet spaces. Today, I saw a pretty clever additional idea from my undergraduate alma mater: noise-canceling headphones you can check out to use while you’re in the library. Cool!

15 December 2006

Horn-tooting

I wrote an article in the current Library Journal on the development of the open-source Evergreen ILS. It makes an interesting case study for the development of a large and complicated piece of software from within a library consortium, and the resulting ILS and OPAC are pretty exciting! (Disclaimer: I was an intern on the project while I was in library school, so I’m biased.)

You can also find the article in the print issue.

17 November 2006

Tags and Subject Headings in LibraryThing

When I finished library school in August, I put all of my papers in some boxes and haven’t looked at them since, because I really needed to recover. Happily, yesterday’s post about folksonomy finally forced me to dredge them out and bring to light a paper I did for my indexing and abstracting class on the use of a folksonomy alongside a controlled vocabulary in LibraryThing.

The first part is a sort of “literature review” on folksonomy (such as it is) and an overview of the concepts involved. The second part takes a look at LibraryThing and compares tags and subject headings.

The full text of the paper follows, or you can download a pdf for printing (warning: it’s in ugly, formatted-to-turn-in-for-class format; one of these days I’ll get it prettied up). Some of the discussion of LibraryThing’s features is slightly out of date (Tim & co. move fast!), and the statistics about popular tags and subject headings certainly are, but I think the main points are still relevant. I’d like to do some more in-depth analysis, especially of the statistical data, at some point in the future.

6 September 2006

Evergreen is live!

Just wanted to give a big hurrah and congratulations to the guys at PINES for launching their new ILS. It’s an open-source ILS and I think it’s rocking the library automation world. You can check out the slick OPAC, Evergreen.

I’m a little biased, of course. I was an intern on the project this summer during library school.

I’ve been slow to post lately—you know, vacation, organizing all my school stuff now that I’m finished, getting back to work, etc. More is coming soon though, including a long post about the final paper I did focusing on subject headings, tags, and LibraryThing.

14 August 2006

Philosophy and predicting the future

I thought I’d share a final essay I wrote for a course on “Organizing Information”, connecting Walter Ong on the eras of information culture with Bruce Sterling on the eras of technoculture, via Suzanne Briet considering objects as documents.

That was a mouthful. It goes something like this: Ong talked about the shifts in culture that occurred in the move from oral (spoken) transmission of information to literate (written) society. Sterling talks about the shifts in culture that occurred due to the way that things (material objects) are produced and consumed, from handmade to mass-produced to “smart”.

Suzanne Briet was a librarian and documentalist in France in the early 20th century. Her work has come into the light in recent years largely thanks to Michael Buckland and an article in JASIST titled “What is a Document?”, in which Buckland explores a variety of perspectives on what constitutes a “document”. Briet (now, rather famously, in library circles at least) asserted that, although an antelope in the wild was not a document, an antelope that was captured, put in a zoo, cataloged, and considered an object of study could be considered a document just as much as text printed on paper.

So, using Briet’s ideas about objects as documents, Ong’s cultures and Sterling’s begin to converge into a conglomerate in which it is (or will be) no longer easy to distinguish between the two. This is especially the case in an “Internet of Things”, in which objects are increasingly retrievable and record information about themselves.

You can get a copy of my essay (pdf) if you’re interested.