Tuesday, 21 December 2010

Keywords (aka "tags") ARE NOT Structure

And I understand that's the whole point. But sometimes, you're dealing with a concept or with data that is most naturally and properly organised in a structured hierarchy. While keywords[1] are a very convenient tool for filtering ad hoc queries[2], they're a poor stand-in for truly structured information.

One (possibly trivial) case in point: managing your browser bookmarks. What I want to be able to do is to organise bookmarks[3] and view them in the same hierarchy and order that they will have in the browser's "Bookmarks" menu. That way, regardless of which file format[4] is being imported from or exported to, while you're using the bookmark manager, you have an accurate view of how the ("folder"-based) organisation of the bookmarks will appear in your browser.
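As a rough illustration of that kind of model, here is a minimal Python sketch. The names Folder, Bookmark, and render are hypothetical, made up for this example and not taken from any existing bookmark manager. The point is only that the folder tree carries the hierarchy and the order, while keywords ride along as optional metadata.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Bookmark:
    # URL and page title are the minimum; keywords are optional metadata.
    url: str
    title: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    # The order of this list is the order shown in the browser menu.
    children: List[Union["Folder", Bookmark]] = field(default_factory=list)


def render(node, depth=0):
    """Return the menu as indented lines, in stored order."""
    indent = "  " * depth
    if isinstance(node, Bookmark):
        return [f"{indent}{node.title} <{node.url}>"]
    lines = [f"{indent}{node.name}/"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative data only; the URLs are placeholders.
menu = Folder("Bookmarks", [
    Folder("Development", [
        Folder("Web Development", [
            Bookmark("https://example.com/php", "PHP notes", ["reference"]),
        ]),
    ]),
    Bookmark("https://example.org/", "Example home"),
])

print("\n".join(render(menu)))
```

Whatever import or export format is in play, a view built this way shows the folders exactly as they are stored, because the structure is data in its own right rather than something inferred from keywords.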

So why not, as several managers do, use keywords to imply structure? Let's take a look at that idea, with a scenario. Let's say you have 4,000-5,000 bookmarks in a fairly deep and wide hierarchy of about 800 folders, some nested five and six levels from the root. While that may sound like a 1996-era Yahoo!, that's roughly what I deal with, and I know quite a few long-time Web users who have significantly more.

OK, so you're going to do your internal organisation using keywords. You're a Web developer, so you might have a few bookmarks in a folder with the path Development/Web Development/PHP/Frameworks/Agavi. Each bookmark's entry in the bookmark-manager software would carry the keywords Development, Web Development, PHP, and so on. That clutters up the keyword list, and it necessarily complicates the display and editing of those bookmarks within the software. It also means that more data is stored for each and every bookmark. Disk drives are cheaper than they have ever been, but they're not free.

And, what's worse, if you accidentally (or otherwise) delete one of the keywords from one of the bookmarks, when you export your list again, that bookmark will be in a brand-new folder somewhere else in the hierarchy. Finding and fixing that, if you have a reasonably complex bookmark list, can be difficult. If you don't find and fix it, of course, then it won't be there when you go looking in what should be the right place.
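To make that failure concrete, here is a small Python sketch. It assumes a hypothetical exporter (export_path is an invented name, not any real manager's API) that builds the folder path by joining a bookmark's keywords in order, which is one plausible way a keyword-based manager might imply structure.

```python
def export_path(keywords):
    """Hypothetical exporter: the folder path is just the keywords, joined in order."""
    return "/".join(keywords)


keywords = ["Development", "Web Development", "PHP", "Frameworks", "Agavi"]
before = export_path(keywords)

# Accidentally delete one keyword from this one bookmark...
keywords.remove("PHP")
after = export_path(keywords)

print(before)  # Development/Web Development/PHP/Frameworks/Agavi
print(after)   # Development/Web Development/Frameworks/Agavi
```

Nothing in the data says the second path is wrong. It is simply a different folder, and the exporter will happily create it.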

And finally, you're twisting the whole structure of the way people (and browsers) view bookmarks; you're distorting the semantics of the system. This may seem a fine point in today's "get-it-done-yesterday!" way of doing everything. It's not. By not having a clean, self-consistent, efficient conceptual organisation for your software, you make it more difficult to learn, more difficult to use, more complex to develop, more expensive to maintain, and more likely to have more severe problems over the lifetime of the software[5].

One final mental exercise. Think of that CD collection I alluded to at the beginning of this post, that's now stored on your computer using whatever music player you like. If I delete the Wakeup Music keyword from the track info for "Come Fly With Me," and then I play the album, nothing has changed. That's metadata. But if I change the bit-rate of the recording, or move the track into the folder for the album "My Way," that's different, isn't it? That's a change to the data, or to the organisation of the data. One is a matter of choice; the other is a matter of history.

By conflating data and metadata (literally, "data about data,") we do a disservice to both.

And now, if either of you happen to know of a (preferably not browser-based) bookmark manager that does things The Right Way™ — or reasonably close to it — I'm all ears.

Thanks for reading, and replying.


1. Though they're popularly called "tags" by all the Kool Kids™ these days, and "Labels" here on Blogger, I'm going to refer to the data items as "keywords" because that's what the library- and information-sciences worlds call them, as well as any programmer who started his Craft before, oh, about 1995.

2. "List all my Frank Sinatra album CD titles that I've tagged as Wakeup Music."

3. Each "bookmark" should store URL and page title at an absolute minimum; keywords are good to have; additional metadata optional.

4. XBEL, OPML, LSMFTML, or whatever the new "standard" is next month. (Format support would be a good candidate for some sort of plugin architecture.)

5. Look at practically any Windows program for examples of this, or, better, any three Windows programs together. It's not that they don't work, but after a few hours of use you can think of several ways that each could be more consistent with itself and with others.