Monday 23 November 2009

Reuse, Renew, Recycle - Data Structures Edition

Everything gets recycled these days, including acronyms...

Whether you're developing Web pages in your language of choice, writing a new game to take on the world, or (insert your project here), at some point, you're going to deal with structured data.

You're quite likely, at some point, to deal with fairly complex, nested structured data - for configuration, to preserve the state of objects within your software, or whatever. In recent years, "complex, nested structured data" has often been coded up in XML. This can save your coding bacon; there are quite a few nice tools out there to work with XML, insulating you from having to deal with the "raw" <tag attribute="value" ...>data<nested_tag>more data</nested_tag></tag> soup.

But at some point you may well end up having to hand-edit the thing — you may want to pull something out for reuse, or you suspect that one of the tools you're using is messing up in some way... it happens. And even during that bright, wonderful time when you're building a POC (proof of concept / possibly overly c***tastic; see, I said we could reuse), you wind up humping around a lot of "extraneous" data that really just wraps a few small things. Ugh.

Even with compression, you're still taking a hit; you've just moved it from the pipe to the CPU. Twice.

I got handed a client project a little over a year ago that used this nifty notation called YAML. While YAML got a lot of its early traction in the Ruby world, it was designed to be language-independent, and it's spread; there are now support tools/libraries for everything from PHP to Visual Basic to Haskell. "YAML Ain't Markup Language," but it can express very complex data structures; nesting is defined by indentation, item types by decorations. (See the spec or this PHP implementation for more detail.)
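
To make that a bit more concrete, here's a minimal sketch, reading "decorations" loosely as the punctuation that marks what kind of item you're looking at. The configuration keys and values are invented purely for illustration, and the parsing step assumes the optional PECL yaml extension (which provides yaml_parse()); if it isn't loaded, the sketch just skips that part. Any other YAML library would do as well.

```php
<?php
// Hypothetical configuration, invented for illustration only.
// Nesting comes from indentation; the "decorations" are the punctuation:
// "key: value" is a mapping entry, "- item" is a sequence entry,
// and quotes force a value to be read as a string.
$yamlText = <<<YAML
site:
  name: "Example Blog"
  port: 8080
  debug: false
  authors:
    - alice
    - bob
YAML;

// The same structure written out by hand as a nested PHP array.
$expected = array(
    'site' => array(
        'name'    => 'Example Blog',
        'port'    => 8080,
        'debug'   => false,
        'authors' => array('alice', 'bob'),
    ),
);

// ASSUMPTION: yaml_parse() comes from the optional PECL "yaml" extension;
// this sketch only calls it when that extension happens to be loaded.
$parsed = function_exists('yaml_parse') ? yaml_parse($yamlText) : null;

// For comparison, the same data as JSON (json_encode() ships with PHP 5.2+).
$jsonText = json_encode($expected);
echo $jsonText, "\n";
```

Seven short lines of YAML carry the whole structure, with no closing tags and no braces to balance; that's most of the appeal when you do end up hand-editing.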

For projects which have been using binary data storage for non-BLOB data, simply because XML was seen as overkill, this might well simplify things. Likewise, if you're pushing data around between different apps/platforms, thinking "I maybe oughta look at JSON", this is for you.

I've been using it primarily in PHP to encode various data and content models, and it has allowed me to dramatically simplify my code in several places - reducing complexity, inefficiency, and possible bugs. The Ruby and Python crowds have been using this for a while; PHP folk have tended to look at it suspiciously, since it came from Over in Those Languages.


Good tools work in any language.


Reuse? I was talking about recycled acronyms earlier?

At about the same time I found YAML the non-markup-language markup language, I also came across a nifty CSS framework with the highly original name of YAML. This code, which makes it pretty easy to style sites using Yet Another Multicolumn Layout, is a set of CSS (2.x) stylesheets, with minimal graphics and JavaScript thrown into the box. What it gives you is a nice, consistent layout on just about any browser (including hideously-broken variants of MS Internet Exploder) which you can slice and dice pretty much at will. It even degrades sensibly for non-CSS, non-JS browsers — like the Googlebot. If you know how to read CSS reasonably well and this takes you more than an hour to grok, you're working too hard. Browse the docs, the samples, the tutorials, or the online community rumblings, and you'll be productive in no time.

And yes, you can use both together on the same Web dev project. Just make sure you keep your tools straight. After all, YAML is not at all the same thing as YAML.

Wednesday 18 November 2009

Two steps forward, three steps back

Alternate title: ARRRRRGGGGGHHHHHHH!!!!!

Both of you Gentle Readers may have noticed that I've been away from the blog for a while, and that a few posts that were previously published have gone missing. I've been busy fighting some other fires for a while, and my current network access has lacked the stability and efficiency that local propaganda would have you expect.

This evening (Wednesday 18th) I came across a piece of nifty-looking software, MacJournal by Mariner Software. It looks great — software that would let me compose/revise blog posts offline, in a native Mac app with nice organizational features and so on, at an attractive price, and with a 15-day evaluation period thrown in, just so you can try before you buy.

"Cool," I thought; "I'll be able to multitask on my shiny new MBP that's coming Any Day Now™."

Downloading and installing the eval copy went just as you'd expect; drag an icon into a folder, wait for the "Copying" progress dialog to go away, and it's done. Standard Mac user experience; nothing to see here, folks — unless you were expecting a Windows-style "Twenty Questions" installation.

I decided to do a really simple, trivial first exercise: select the five posts I'd written so far in a tutorial series; add the keyword ("label" in Blogger.com parlance) "tutorial" to them; save them back to Blogger. No animals were harmed in the performance of this experiment, and very explicitly, no content was directly, intentionally edited. (Note the qualifiers; they're important.)

(insert "train wreck" sound effects here.)

The first two parts were (relatively) unmolested; they didn't have any code blocks in them. The latter three did, however, and those were completely deformed. Numerous span elements were added, particularly around links (MacJournal seems to think links shouldn't be underlined, ever). Other formatting was changed; in particular, code tags were replaced by spans that set the font size to 13 points.

WTF?

It's going to take me a bit of noodling around in the software to figure out how to change the defaults to something that makes sense (at least for me), and until then, I'm back to editing in the browser. If the evaluation period expires before I'm happy with the configuration, then I'll comply with the license and blow it off my system. I'd really rather not do that; the feature list looks good, the interface is clean, and best of all, I don't have this "Could not contact blogger.com" line underneath my editing area as I type.

I understand that MacJournal, like most apps, has default ways of laying things out and working with things. I'm well aware of the difference between an "import" and a "copy" of something. But... I believe very strongly that the first rule of software, as of medicine, should be "First, do no harm" — and that includes "don't mess with my formatting without even putting up a confirmation dialog asking my permission!" I really don't think that's too much to ask, or too hard to implement — and doing so would a) make for a much more positive initial user experience by b) showing that you've thought things through well enough that c) your still-potential user isn't looking at an hour or two of careful, detailed work just to get back to where he was before he touched your product — or, rather, it touched his work.

Like most Mac users, I've gotten spoiled by how well most software on this platform is thought through to the tiniest details. Like most, I get annoyed when I have to deal with Windows or Linux apps that simply aren't thought through at all, apparently. (Spend a week with Microsoft Office or, even better, Apple iWork on the Mac; I dare you to go back to Office on Windows and be happy with it.) To run into a Mac app that fails such a simple use case so spectacularly (granted, in its default configuration) simply beggars explanation.

You want to start a tutorial; well, you know...

Not as catchy as the Beatles' Revolution, even if the meter works.... oh well....

Continuing from the first post in this tutorial. What do I think is important when starting to demonstrate some code? As with most writing, it depends on the audience. For the purpose of this series of posts, I'm assuming that you fit comfortably in or near the following:

  • You're comfortable with HTML, and XML doesn't make you run screaming from the room;
  • You have a basic understanding of databases; you've run across SQL before and understand the basic concepts;
  • You understand PHP; you've written some code before;
  • You understand the concepts of "object-oriented development", "patterns", "best practices" and ideally "test-driven development" (usually abbreviated as "TDD"), even though you may not have loads of experience (yet) with them; and crucially
  • You want to improve your ability to write code that you can refine and possibly reuse over time.

The assumption that you know or at least are interested in PHP is a given, since that's the language we'll be using here.

What will you need to have installed and available to follow along?

  1. Access to a system with PHP 5.2 or higher, available both from the command line and the Web server (via a module or CGI);
  2. The PHPUnit and MDB2_Driver_mysql PEAR packages installed and available;
  3. A text editor of your choice;
  4. The ability to create PHP scripts and HTML files and have those accessible from the Web server as well as the command line.
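
If you'd like a quick sanity check of that list, something along these lines might help. It's only a sketch: find_on_include_path() and check_prerequisites() are names I've made up for this example, and the two file paths are an assumption based on PEAR's usual Class_Name to Class/Name.php layout, so adjust them if your installation differs.

```php
<?php
// Look for a file on PHP's include_path without actually loading it.
// Written as a plain loop so it does not rely on anything newer than PHP 5.2.
function find_on_include_path($relativeFile)
{
    foreach (explode(PATH_SEPARATOR, get_include_path()) as $dir) {
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $relativeFile;
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return false;
}

// Hypothetical helper: returns a pass/fail flag per prerequisite.
function check_prerequisites()
{
    return array(
        'PHP 5.2 or higher' => version_compare(PHP_VERSION, '5.2.0', '>='),
        // ASSUMPTION: paths follow PEAR's Class_Name -> Class/Name.php layout.
        'PHPUnit'           => find_on_include_path('PHPUnit/Framework.php') !== false,
        'MDB2_Driver_mysql' => find_on_include_path('MDB2/Driver/mysql.php') !== false,
    );
}

// PHP_SAPI is "cli" on the command line and something else under a Web server.
echo 'Running under: ', PHP_SAPI, "\n";
foreach (check_prerequisites() as $label => $ok) {
    echo str_pad($label, 20), $ok ? 'ok' : 'MISSING', "\n";
}
```

Run it once from the command line and once through the Web server; the two can have different include paths (and even different PHP versions), which is exactly the kind of surprise worth finding before the tutorial gets going.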

These should all be pretty obvious to more experienced PHP developers, but making sure that we're both operating from the same set of assumptions — and no others — greatly reduces the likelihood of confusion and breakage along the way. Many of you haven't yet dealt much with unit tests using PHPUnit or similar systems; that's going to be a starting point for us.