Monday, 23 November 2009

Reuse, Renew, Recycle - Data Structures Edition

Everything gets recycled these days, including acronyms...

Whether you're developing Web pages in your language of choice, writing a new game to take on the world, or (insert your project here), at some point, you're going to deal with structured data.

Sooner or later that data is likely to be fairly complex and nested: configuration, the saved state of objects within your software, or whatever. In recent years, "complex, nested structured data" has often been coded up in XML. This can save your coding bacon; there are quite a few nice tools out there for working with XML, insulating you from having to deal with the "raw" <tag attribute="value">data<nested_tag>more data</nested_tag></tag> soup.

But eventually you may well end up having to hand-edit the thing: you may want to pull something out for reuse, or you suspect that one of the tools you're using is messing up in some way... it happens. And even during that bright, wonderful time when you're building a POC (proof of concept/possibly overly c***tastic; see, I said we could reuse), you wind up humping around a lot of "extraneous" data that really just wraps a few small things. Ugh.

Even with compression, you're still taking a hit; you've just moved it from the pipe to the CPU. Twice.
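To put rough numbers on that, here's a minimal sketch using only the Python standard library. The `settings` data and its section names are made up for illustration; the point is just to compare the bytes of markup against the bytes of actual values, and to show that compression means work on both ends of the pipe.

```python
import zlib
import xml.etree.ElementTree as ET

# Hypothetical sample data: a small nested configuration (all values are strings).
settings = {
    "site": {"name": "example", "debug": "false"},
    "database": {"host": "localhost", "port": "5432", "user": "app"},
}

# Build the XML document: one element per section, one child element per value.
root = ET.Element("settings")
for section, values in settings.items():
    node = ET.SubElement(root, section)
    for key, value in values.items():
        ET.SubElement(node, key).text = value

xml_bytes = ET.tostring(root)

# Characters that are actual values, versus everything else (tags and brackets).
payload_chars = sum(len(v) for values in settings.values() for v in values.values())
markup_overhead = len(xml_bytes) - payload_chars

# Compression shrinks what goes down the pipe, but costs CPU twice:
# once for the sender to compress, once for the receiver to decompress.
packed = zlib.compress(xml_bytes)
restored = zlib.decompress(packed)

print("total bytes:    ", len(xml_bytes))
print("value bytes:    ", payload_chars)
print("markup overhead:", markup_overhead)
print("compressed:     ", len(packed))
```

On data this small the wrapping easily outweighs what it wraps, and compression has very little to work with; real documents will behave differently, so measure your own.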

I got handed a client project a little over a year ago that used this nifty notation called YAML. While YAML is often associated with Ruby apps, it was designed to be language-independent; there are now support tools/libraries for everything from PHP to Visual Basic to Haskell. "YAML Ain't Markup Language," but it can express very complex data structures; nesting is defined by indentation, and item types by small decorations (a leading "- " for list items, "key: value" for mapping entries). (See the spec or this PHP implementation for more detail.)
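Here's a small sketch of what that looks like in practice. It's in Python rather than PHP, and it assumes the third-party PyYAML package is installed (its `yaml.safe_load` function does the parsing); the document contents and the `YAML_TEXT` and `EXPECTED` names are invented for illustration.

```python
# Assumption: PyYAML (a third-party package) is available. If it isn't,
# the sketch still shows the YAML text next to the structure it describes.
try:
    import yaml
except ImportError:
    yaml = None

# Nesting comes from indentation; "- " marks list items; "key: value" marks mappings.
YAML_TEXT = """\
site:
  name: example
  debug: false
database:
  host: localhost
  port: 5432
  admins:
    - alice
    - bob
"""

# The native structure the YAML above is meant to describe.
EXPECTED = {
    "site": {"name": "example", "debug": False},
    "database": {
        "host": "localhost",
        "port": 5432,
        "admins": ["alice", "bob"],
    },
}

if yaml is not None:
    parsed = yaml.safe_load(YAML_TEXT)
    print("matches expected structure:", parsed == EXPECTED)
else:
    print("PyYAML not installed; nothing parsed")
```

Note that there isn't a closing tag in sight, and that the parser hands back real booleans and integers rather than strings you have to convert yourself.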

For projects which have been using binary data storage for non-BLOB data, simply because XML was seen as overkill, this might well simplify things. Likewise, if you're pushing data around between different apps/platforms, thinking "I maybe oughta look at JSON", this is for you.

I've been using it primarily in PHP to encode various data and content models, and it has allowed me to dramatically simplify my code in several places - reducing complexity, inefficiency, and possible bugs. The Ruby and Python crowds have been using this for a while; PHP folk have tended to look at it suspiciously, since it came from Over in Those Languages.

Good tools work in any language.

Reuse? Wasn't I talking about recycled acronyms earlier?

At about the same time I found YAML the non-markup-language markup language, I also came across a nifty CSS framework with the highly original name of YAML. This code, which makes it pretty easy to style sites using Yet Another Multicolumn Layout, is a set of CSS (2.x) stylesheets, with minimal graphics and JavaScript thrown into the box. What it gives you is a nice, consistent layout on just about any browser (including hideously-broken variants of MS Internet Exploder) which you can slice and dice pretty much at will. It even degrades sensibly for non-CSS, non-JS browsers — like the Googlebot. If you know how to read CSS reasonably well and this takes you more than an hour to grok, you're working too hard. Browse the docs, the samples, the tutorials, or the online community rumblings, and you'll be productive in no time.

And yes, you can use both together on the same Web dev project. Just make sure you keep your tools straight. After all, YAML is not at all the same thing as YAML.
