Showing posts with label best practices. Show all posts

Wednesday, 14 May 2014

Why is "doing things right" a "thing"? Whither professionalism?

Over on the Reddit r/programming subreddit, /u/jfalvarez posted a link to Tom Stuart's great presentation at the Scottish Ruby Conference 2014, entitled "The DHH Problem".

If you're involved with Ruby or Web development in any way, it's well worth three and a half minutes of your time.

On the Reddit post where Mr Alvarez posted the link to the video, another Redditor, dventimi, asked

Why is this a thing?

I disagree with the later poster who snarked "People like gossip." It's much, much more important than that.

What follows is the content of my reply to "why is this a thing?" I'd love to have a discussion with you about it, either here or on the original post.


Because people are finally waking up to the fact that the coding style and philosophy espoused by DHH are disastrous in non-toy applications maintained over any significant period of time.

This should have been obvious to all once Twitter, having conflated Rails with Ruby as a whole, abandoned Ruby entirely because they couldn't make Rails do what they needed no matter how many times DHH came onsite and insulted their intelligence. That it wasn't discussed as such at the time speaks eloquently to how broken things are.

Most master software developers I've met and worked with have been mediocre-at-best promoters, including self-promoters. DHH has the opposite problem: his initial extraction of Rails from Basecamp's codebase "worked", for apps very much like Basecamp at the time. His fixed (what some have called "trolling") attitudes, and the personality they describe, prevent him from seeing things from any other perspective or from growing his skills beyond a certain point. The echo chamber of acolytes and zealots that his sublime self-promotional talents have allowed him to accrete around himself "ensures" that he never has to.

That's a problem for the rest of us, who have deadlines, work to finish within those deadlines, and paycheques we'd like to be able to cash that we won't get unless we meet those deadlines with that work. Especially for those of us who wish to improve our craft and, in whatever way we can, nudge ourselves and others towards helping the craft of software development more closely approximate a true engineering discipline.

Paying excessive heed to the opinions-masquerading-as-principles brayed by DHH, and twisting ourselves to fit inside The Rails Way, actively impede all of the above tasks. Tom Stuart was absolutely right to thank DHH for producing Rails and promoting it sufficiently that the rest of us get to write Ruby for a living…just before tearing DHH a new one, in the polite, discreet way only a well-educated Brit can, for getting so much of Rails and how software should be written so disastrously wrong.

Thank you, Tom Stuart. An entire industry may well be in your debt.

Thursday, 20 February 2014

Trust, Safety, Sandi Metz and A Proverb Walked Into A Bar...

Rereading POODR (Sandi Metz's Practical Object-Oriented Design in Ruby). I love the bit in Chapter 6 where she talks about using a post_initialize method to decouple subclasses from having to remember to call super in an initialize override. That's the sort of thing I remember making work in UCSD Pascal back when it wanted to be the Next Hot Language™, and in a few languages since.

But one question I never found an acceptable answer to was, how do I prevent myself from forgetting to call super (or its equivalent) in a hierarchy with multiple levels? Sure, the base class implements a do-nothing post_initialize-type method, so that every single subclass anywhere in the hierarchy can "kick it upstairs" and things will Just Work.
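
The pattern, roughly as Chapter 6 presents it (a minimal sketch, reproduced from memory, so treat the details as illustrative):

```ruby
class Bicycle
  attr_reader :size

  def initialize(args = {})
    @size = args[:size]
    post_initialize(args) # the base class owns the chain of initialisation...
  end

  def post_initialize(_args); end # ...and supplies the do-nothing default
end

class RoadBike < Bicycle
  attr_reader :tape_color

  # Subclasses override the hook: no initialize override, no super to forget.
  def post_initialize(args)
    @tape_color = args[:tape_color]
  end
end

RoadBike.new(size: 'M', tape_color: 'red').tape_color # => "red"
```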

But at some point, you have to trust yourself, your team, and your successors who grapple with your code long after you are gone. Comprehensive tests will catch the fallout from a forgotten upstairs kick and give you a lovely red bar, or will show you that the parent's chained behaviour you thought you needed was either frivolous or incorrectly verified. (Or both. Shtuff happens; nobody ever went broke underestimating the degree to which we developers can fool ourselves.) Whatever twisty little maze of keystrokes leads you to that point, you've discovered a refactoring opportunity. Enjoy your Homer Simpson Moment, make the appropriate changes, verify that the tests still pass, and move on.

Доверяй, но проверяй. (Trust, but verify.)

Wednesday, 1 January 2014

Clean Code. "I do not think that word means what you think it means."

From my copy of Clean Code: A Handbook of Agile Software Craftsmanship (the paper copy of which (barely) predates my work in Rails). The highlighted paragraph below is what I'd highlighted in the paper copy:

Active Record

Active Records are special forms of DTOs. They are data structures with public (or bean-accessed) variables; but they typically have navigational methods like save and find. Typically these Active Records are direct translations from database tables, or other data sources.

Unfortunately we often find that developers try to treat these data structures as though they were objects by putting business rule methods in them. This is awkward because it creates a hybrid between a data structure and an object.

The solution, of course, is to treat the Active Record as a data structure and to create separate objects that contain the business rules and that hide their internal data (which are probably just instances of the Active Record).

I've been increasingly uncomfortable with the way Rails encourages people to conflate domain model objects with persistence in Rails' core component, ActiveRecord. Running across this bit again makes me think of Inigo Montoya from The Princess Bride: You keep using that word. I do not think it means what you think it means. Judging from my own projects and those I've participated in and observed on GitHub, the fact that ActiveRecord does not properly implement or encourage the Active Record pattern is a primary root cause of many bugs and much confusion and heartburn.
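
The separation the book describes can be sketched in plain Ruby (the class names and business rules here are hypothetical, not from the book): the record stays a dumb data structure, and a separate object owns the rules and hides the record.

```ruby
require 'date'

# The "Active Record": a data structure with accessors and (in real life)
# navigational methods like save and find, but no business rules.
ArticleRecord = Struct.new(:title, :body, :published_on)

# A separate domain object holds the business rules and hides its data,
# which happen to be an instance of the record.
class Article
  def initialize(record)
    @record = record
  end

  # A (made-up) business rule lives here, not on the data structure.
  def publishable?
    !@record.title.to_s.empty? && @record.body.to_s.length >= 100
  end

  def published?
    !@record.published_on.nil? && @record.published_on <= Date.today
  end
end

record = ArticleRecord.new('Hello', 'x' * 150, Date.today)
Article.new(record).publishable? # => true
```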

Happy New Year.

Sunday, 3 November 2013

No, You Can't Always Get What You Want...

…but as Mick Jagger and Keith Richards remind us, if you try "sometimes you get what you need". Any software developer lives with that on a daily basis; the implications of that are one of the major things that separate our craft from the profession of engineering. I had several of those "implications" blow up in my face this week as I was working on (what else?) a Rails app.

Or, more properly, an app that uses Rails. The majority of my time for the last few weeks has been spent not in Rails, or even Ruby, but in getting working the front-end features that have to be implemented in JavaScript (or CoffeeScript, or another language that compiles to JavaScript; generically here, Script), because they depend on user-agent and DOM events. The last time I went down this road, I wound up with an app that had five lines of CoffeeScript for every two of Ruby; this one is well along that path. That's relevant; I suspect there are far more non-trivial apps that use Rails in production than there are Rails apps. For anything beyond a simple pseudo-blog or a relatively straightforward database front end, you sometimes wonder what the early Rails core team was thinking when they established some of these "conventions" that trump "configuration". Granted, there are a lot of people working in that "sweet spot" of Rails app development, and more power to them, but I'd bet heavily that at least as many started out trying to build an app that uses Rails and fairly quickly found themselves working around or against things as much as they work with them.

A prime example of this is front-end work. Rails has support for *Script and the asset pipeline as ballyhooed features. Once you start writing more than a few dozen lines of Script code in three or four classes or modules, you quickly come to the conclusion that the people who designed and held court over how Rails works really didn't have much more Script experience than using snippets of jQuery or Prototype.js to "jazz up" a form. Heck, the "unobtrusive JavaScript" feature is essentially useless outside forms. What I'm working with at the moment is an app that intelligently reacts to a user selecting content with the mouse, based on the position of the content, the permissions and authorisation of the user, and other factors. Having a few Script classes (and associated specs) to deal with that introduces issues that may not be readily apparent in the beginning, but will bite your team over time.

The JavaScript community have developed several module loaders to work around the lack of such a feature in the core language (for now, at least); there seems to be something of a transition underway from CommonJS to RequireJS, with several alternatives providing variations on the theme. This is important for several obvious reasons, including the ability programmers have enjoyed in various languages since at least the 1960s to develop, test and maintain cooperating code modules (objects, classes, etc.) relatively independently. By having each such module load, and know about, only the other modules it uses, those modules (should) remain unaffected by the rest of the system, so long as their indirect dependencies continue to work as their dependents expect. Programming 101 stuff; nothing new for anyone reading this.

But, and it's a very large but, the Web prizes a site's ability to perform as few network accesses as practical to get its resources loaded and start doing useful work. To this end, Rails since version 3.1 has included the asset pipeline, which has the effect (broadly) of concatenating and compressing Script files, CSS stylesheets, and other resources so that they can all be loaded in a single request. This is done using an add-in Gem (component) called Sprockets. This works quite well for its intended purpose and truly is one of the nicest things about working in Rails — until you start writing modular Script code.

If you know that the asset pipeline, used by default, means that everything is loaded by the time any of your Script front-end code starts running, but you want to break your code into modules, the "obvious" choice is to create a lot of "global" objects for your classes and shared data and such. If you're an experienced developer, this feels wrong, almost dirty. You might then try only having a single global that functions as a top-level "namespace" for your application, with artefacts spreading out in a sensibly-organised tree below that. This path leads to names like @com.hudsucker.proxy.activation.foo.bar.baz.joebob.and.his.third.cousin. Maintaining such, or even keeping it straight in your head (and your teammates') quickly becomes an "interesting challenge". You quickly realise that there must be a better way, simply because there have been other developers before you who have remained (apparently) sane.

But to use loaders such as RequireJS from Rails, you need components like requirejs-rails, which is wildly, justifiably "popular". Sane Script dependency management should be in Rails, you start thinking. And I'd agree with you. However…

The Might-As-Well-Be-Official Rails "Solution" to the problem is: don't do it that way. Write your front-end code in a traditional Script framework like Dojo or Node or Ember or JoeBobBriggsAmazingFramework or any of no doubt numerous other excellent choices. Use Rails "merely" as the back-end API server for persistence and auth and so on. And yes, with planning and preparation and an adequately-staffed and -trained team starting well before your deadline, that's a splendid choice. You may even realise that Rails is overkill for your needs and move to something like Sinatra or Padrino or, again, numerous others.

But…what if you don't have that time or that team, but you do have a mass of rigidly-organised, fragile Script code that you just promised would be part of a working app in two weeks' time? Assuming that you agree that resignation and suicide are suboptimal strategies, you could try what I'm in the middle of retrofitting our code to:

  1. Create a single gateway object that you use to get all your other objects. Let's call it a Gateway object. Give it a global location at @ourapp_gateway. Give it two methods:
    1. a register method, that associates a string name with an arbitrary object/class/other collection of bits;
    2. a use method, that takes a string parameter and returns the item associated with that name.
  2. Remove the Sprockets #= require modulename lines from each of your existing Script files, and replace them with immediate calls to @ourapp_gateway.use corresponding to each of the objects you need from Your Outside App, as well as calls to @ourapp_gateway.register to register the object(s) declared in that source file with the Gateway;
  3. Add Sprockets #= require directives to your application.js, so that Sprockets actually loads the suckers;
  4. Assign the return values from @ourapp_gateway.use to variables as would be done in Node or whatever.
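
The steps above, sketched in plain JavaScript rather than CoffeeScript (`ourappGateway` and the artefact names are placeholders; in the browser, CoffeeScript's @ourapp_gateway would live on window):

```javascript
// A bare-bones registry: one global, two methods, no module loader required.
var ourappGateway = (function () {
  var registry = {};
  return {
    // Associate a string name with an arbitrary object/class/collection of bits.
    register: function (name, artefact) {
      registry[name] = artefact;
      return artefact;
    },
    // Look a registered artefact back up by name.
    use: function (name) {
      if (!(name in registry)) {
        throw new Error('Gateway: nothing registered as "' + name + '"');
      }
      return registry[name];
    }
  };
})();

// In one source file (pulled in by application.js in any order):
ourappGateway.register('selectionTracker', {
  current: null,
  track: function (selection) { this.current = selection; }
});

// In another file, instead of a per-file Sprockets directive:
var tracker = ourappGateway.use('selectionTracker');
tracker.track('paragraph 3');
```

Load order still matters: a file must register before another can use. But that is exactly the dependency the Gateway makes explicit, and fail-fast, instead of implicit.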

Why bother, I hear you thinking. Because, even without a proper module loader, this still delivers at least one major benefit: it abstracts away the relationship between your artefacts, the DOM, and the asset pipeline. It gives you reasonable flexibility in moving things around within the DOM and with respect to source files by having the Gateway instance as an intermediary. And, if you're thoughtful, diligent and very, very lucky, it gives you a leg up when you do get around to using a proper module loader, by letting you slip that in behind your Gateway abstraction.

And if you think I'm getting somewhat jaded/disillusioned with my continuing Rails experience and integrating a Script-using front end that's not a complete app in itself…congratulations; you nailed it.

Anybody have any better ideas that they've actually made work?

Friday, 26 October 2012

Take Only Pictures; Leave Only Footprints

If you've ever visited a National Park or many state parks in the United States, or been involved with a nature-oriented community group, you've likely heard that saying many times. For those who haven't, the meaning should be obvious: leave the shared place that you're moving through in at least as good condition as you found it, so that the next people to travel that way can enjoy it as much as you did.

That applies to environments other than the great outdoors, of course. The phrase popped into my mind earlier today as I was poking at a jQuery plugin to add a context menu to a Web app I'm developing.

It's great that there's so many free software tools out there; I've been using them for more than 25 years, and I've developed more than a few. Sometimes, however, one is reminded of the wisdom of Oliver Wendell Holmes when he said, "Learn from the mistakes of others… you can't live long enough to make them all yourself." But in the grand software tradition, we often give it our best effort.

This is a reasonably well-known plugin, one of several that offer to solve a common problem in more-or-less similar fashion. And if you use it precisely in the way that the author expected, it does the job. But even then, if you're using it in a Web app or site that customers are paying money for, you might hope that they never hit the "View Source" menu item in their browser. And, to be fair, this is not intended as a slam against Matt Kruse or his code; most of the other plugins I've looked at in the last couple of days have at least as many "quirks" and assumptions.

"It's just a context menu," I hear you say. "What could possibly go wrong?…go wrong?…go wrong?…(sound of tape snapping)" (with apologies to Westworld, a movie with many lessons on the art and craft of software development.)

If you're adding the menu to your page when it's first loaded, and you want a static menu, with the options the same for all uses of the context menu, fine. If you want a more dynamic menu, and don't mind attaching functions to the menu that know enough of the detailed state of your page/app that they can generate the properly-adjusted menu items each time the menu is displayed, that works, too. But the main assumptions are that the plugin's main method will be called once and only once on a page, and that you really don't care about the markup being added to the end of your page. The first assumption, particularly in a rich front end, is likely to be naïve; the second is, bluntly, an assault on your professionalism.

What's so bad? Here's the plugin's output for a very simple menu, reformatted for clarity:

<table cellspacing="0" cellpadding="0" style="display: none;">
  <tbody>
    <tr>
      <td>
        <div class="context-menu context-menu-theme-vista">
          <div class="context-menu-item " title="">
            <div class="context-menu-item-inner" style="">One</div>
          </div>
          <div class="context-menu-item " title="">
            <div class="context-menu-item-inner" style="">Two</div>
          </div>
          <div class="context-menu-item " title="">
            <div class="context-menu-item-inner" style="">Three</div>
          </div>
        </div>
      </td>
    </tr>
  </tbody>
</table>

<div class="context-menu-shadow" style="display: none; position: absolute; z-index: 9998; opacity: 0.5; background-color: black;"></div>

If you read HTML and CSS, you're probably nodding your head, thinking "that looks simple enough; hey, hang on a minute…" That "hang on" is where you start seeing any of several issues:

  1. We have a table, containing a (rarely-used) tbody, containing a single tr, containing a single td, containing a series of nested divs which contain the menu items. If he'd implemented the entire menu as a table, he might have been able to claim that he was trying to support 1997-era browsers. (Would jQuery 1.4 work in Netscape Navigator 4?) Failing that, it's a mess. Nesting divs (or more semantically appropriate elements) has been a Solved Problem for a decade or so.

  2. Even though this is implicitly intended to be fire-and-forget markup added to the DOM at page-creation time, it's disturbing that there are no identifiers for any of the table elements. If you wanted to, say, remove the table later, you'd have to use code like

    $('.context-menu').parent().parent().parent().parent().remove()
    Ick. And then you'd have to go clean up the shadow div separately.

  3. Speaking of the shadow div being separate from the menu-enclosing table, why not wrap both in an (identified) div, so that you (and your clients' code) can treat the menu as a single entity?

  4. Doing that would also let you dynamically delete and replace the menu, based on changes in the state of your app, without needing to define menu-item handler functions that violate the Principle of Least Knowledge.

  5. Another benefit of that would be if your context menu was generated more than once (possibly because the content needed changing or items needed disabling), you'd never have more than one copy of the menu in the DOM. As it is now, you can have an arbitrarily large number, with only the most recently-added being the "active" one. Ugh. This would be more likely to happen on a dev box running a test suite, but still. Ugh.

  6. Particularly in a behaviour-driven or test-driven development (BDD or TDD) environment, being able to test/validate the markup and the item-handler logic separately as well as together is important. Doing so with this plugin (and again, to be fair, most of the others) eliminates the normal-use workflow from consideration.

  7. One feature of this particular plugin is that it supports having you define your own HTML for the menu and pass it in. But the example given is too simple to be useful as a menu. Browsing the plugin source code seems to indicate that there is no event-handler support for menus defined in this manner; you'd have to iterate through your context menu items, assigning event handlers for at least the click event. If you're going to do all that work, why bother with this plugin?

  8. A table and divs. Enclosed and in parallel. (Careful wiping that green sludge off your monitor; that's your brain that just exploded.)

Menus are "important" enough that HTML 5 has its own set of elements dedicated just to marking up menus. However, sensible, reasonably semantic standard patterns for use with HTML 4 and XHTML 1.x have evolved that address the issues I've mentioned (among many others). The markup for the example menu above could have been written as:

  <div class="context-menu-container" id="someId">
    <ul class="context-menu context-menu-theme-vista">
      <li class="context-menu-item">
        <span class="context-menu-item-inner">One</span>
      </li>
      <li class="context-menu-item">
        <span class="context-menu-item-inner">Two</span>
      </li>
      <li class="context-menu-item">
        <span class="context-menu-item-inner">Three</span>
      </li>
    </ul>
    <div class="context-menu-shadow"></div>
  </div>

One outermost div, with an id so your plugin doesn't get confused if you have multiple menus on a page, but one div so you can work with it as a single unit. An unordered list containing list items corresponding to the menu items, since that's the closest HTML4/XHTML 1 comes to HTML5's menu semantics. The Script that builds the thing can adapt the styles and attributes at the class+id CSS level, eliminating the need for hard-coded monsterpieces such as that style attribute for the table.
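
For what it's worth, a hypothetical helper (mine, not the plugin's) that generates that markup from data shows how little code the cleaner structure needs:

```javascript
// Build the cleaner menu markup from an id and a list of item labels.
function buildContextMenu(id, items) {
  var lis = items.map(function (label) {
    return '    <li class="context-menu-item">\n' +
           '      <span class="context-menu-item-inner">' + label + '</span>\n' +
           '    </li>';
  }).join('\n');
  return '<div class="context-menu-container" id="' + id + '">\n' +
         '  <ul class="context-menu context-menu-theme-vista">\n' +
         lis + '\n' +
         '  </ul>\n' +
         '  <div class="context-menu-shadow"></div>\n' +
         '</div>';
}

// One call yields the whole menu as a single, identified unit:
var menuHtml = buildContextMenu('someId', ['One', 'Two', 'Three']);
```

In real plugin code you'd build jQuery objects rather than strings, but the point stands: one identified container, one call, one unit to insert, replace or remove.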

If you're going to write a jQuery plugin, live in jQuery. You can dynamically modify styles, attributes, content, even whole sections of how your page is rendered in the browser, without touching the basic markup. You can put the style bits that you intend to remain constant into a CSS file and link those styles to individual DOM fragments via classes and ids. That also keeps your HTML and CSS clean and cacheable.

If we don't write clean code, then having a lightning-fast browser on a petabit-Ethernet connection won't matter; users and reviewers will still complain that "your app is slow". Think before, as, and after you code, and remember:

First, do no harm.

Sunday, 14 October 2012

It's Not Just "JavaScript" Anymore

Late to the party again, as usual, but…

I've been reading up on ECMAScript 5, the standardised language historically and popularly known as "JavaScript". Even though I'm doing most of my work in CoffeeScript now, ECMAScript is relevant because that's what CoffeeScript compiles down to. (At least, to a 'safe', 'good bits' subset thereof). So here I've gone and mentioned three different names for "very similar" languages in the same paragraph. When I mean one in particular from here out, I'll name it; otherwise Script refers to all three to the best of my knowledge. Anyway...

One of the things I find myself doing, and reading in just about everybody else's code in most languages, is a prudent sanity-check validation before doing something "dangerous" that profoundly changes the state of the system based on the state of various objects within it. A decidedly non-trivial part of that is usually simple validation of individual property values: is the someFutureDate value actually greater than the value for Now? Should we check that every time we're about to make an important decision based on the value of someFutureDate? What a pain… especially since the one time you forget that check often turns out to be the root cause of a critical bug.

ECMAScript 5 has completely rethought how properties on objects are implemented. They can be made read-only, non-enumerable, non-modifiable once set. Perhaps more interestingly, they can now be defined either using a simple value as before, or by defining getters and setters, as is done in several other languages. What this buys you, at the cost of a bit of relocated complexity, is the ability to validate assigned values without the assigning code doing anything other than assigning a value to the property. The domain logic relevant to a specific property of an object can now be coupled more tightly to the property and the details less visible or relevant to outside code, in effect creating a conceptual "mini-class" around a property and its getter/setter logic that is transparent to outside code.
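
A sketch of what that looks like (Appointment and its validation rule are invented for illustration, echoing the someFutureDate example above):

```javascript
function Appointment() {
  var when = null; // backing store, invisible outside the closure

  Object.defineProperty(this, 'someFutureDate', {
    enumerable: true,
    get: function () { return when; },
    set: function (value) {
      // The domain logic lives with the property; callers just assign.
      if (!(value instanceof Date) || value.getTime() <= Date.now()) {
        throw new RangeError('someFutureDate must be a Date later than now');
      }
      when = value;
    }
  });
}

var appt = new Appointment();
appt.someFutureDate = new Date(Date.now() + 86400000); // plain assignment, validated
// appt.someFutureDate = new Date(0); // would throw RangeError
```

The assigning code neither knows nor cares that validation happens; that's the "mini-class" around the property.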

This and several other features of ECMAScript 5 now make it easier to write nice, fine-grained, SOLID code in ECMAScript 5 than was previously possible in any Script dialect. Huge win all around.

Which brings up questions. How could support for property descriptors and the like be added to CoffeeScript? Should they be added? Would support for these underlying ECMAScript features require changes to the CoffeeScript compiler itself, or could it be achieved less intrusively?

Saturday, 8 September 2012

Stubs Aren't Mocks; BDD Isn't TDD; Which Side(s) Are You On?

I just finished re-reading Martin Fowler's Mocks Aren't Stubs from 2007. I wasn't as experienced then in the various forms of agile development as I am now, so couldn't quite appreciate his perspective until somebody (and I'm sorry I can't find whom) brought up the paper again in a tweet a month or two ago. (Yes, that's how far behind I am; how current can you stay when you're working 15- to 18-hour days, 6 or 7 days a week, for 6 months?)

In particular, the distinctions he draws between "classical" and "mockist" test-driven development (TDD), and then between mockist TDD and behaviour-driven development (BDD) are particularly useful given the successes and challenges of the last dozen or so projects I've been involved with. I wouldn't quite say that many teams are doing it wrong. They/we have been, however, operating on intuition, local folklore and nebulously-understood principles gained through trial-and-error experience. Having a systematic, non-evangelistic, nuts-and-bolts differentiation and exploration of various techniques and processes is (and should be) a basic building block in any practitioner's understanding of his craft.

Put (perhaps too simply), the major distinction between classic and mockist TDD is that one focuses on state while the other focuses on specific, per-entity function; projects that mix the two too freely often come to grief. I believe that projects, especially midsize, greenfield development projects by small or inexperienced teams should pick one approach (classic or mockist TDD, or BDD) and stick with it throughout a single major-release cycle. You may credibly say "we made the wrong choice for this product" after getting an initial, complete version out the door, and you should be able to switch the next full release cycle to a different approach. But if you don't know why you're doing what you're doing, and what the coarse- and fine-grained alternatives are to your current approach, you can't benefit from having made a conscious, rational decision and your project thus can't benefit from that choice.
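
To make the distinction concrete, here's a framework-free Ruby sketch along the lines of the warehouse/order example in Fowler's article (names illustrative):

```ruby
class Order
  attr_reader :filled

  def initialize(item, quantity)
    @item, @quantity, @filled = item, quantity, false
  end

  def fill(warehouse)
    return unless warehouse.has_inventory?(@item, @quantity)
    warehouse.remove(@item, @quantity)
    @filled = true
  end
end

# Classic TDD: a stub feeds canned answers; the test asserts on resulting STATE.
class StubWarehouse
  def has_inventory?(_item, _qty); true; end
  def remove(_item, _qty); end
end

# Mockist TDD: a mock records calls; the test asserts on the INTERACTION.
class MockWarehouse
  attr_reader :calls
  def initialize; @calls = []; end
  def has_inventory?(item, qty); @calls << [:has_inventory?, item, qty]; true; end
  def remove(item, qty); @calls << [:remove, item, qty]; end
end

order = Order.new(:widget, 5)
order.fill(StubWarehouse.new)
order.filled # => true (state verification)

mock = MockWarehouse.new
Order.new(:widget, 5).fill(mock)
mock.calls.include?([:remove, :widget, 5]) # => true (behaviour verification)
```

Same production code, two very different test philosophies; mixing them ad hoc within one suite is where the grief tends to start.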

Anything that gives your team better understanding of what you're doing, why and how will enhance the likelihood of successfully delivering your project and delighting, or at least satisfying, your customers. Even on a hobby project where your customer is…you yourself. Because, after all, your time is worth something to you, isn't it?

Sunday, 6 May 2012

Rules can be broken, but there are Consequences. Beware.

If you're in an agile shop (of whatever persuasion), you're quite familiar with a basic, easily justifiable rule:

No line of production code shall be created, modified or deleted in the absence of a failing test case, for which the change is required to make the test case pass.

Various people add extra conditionals to the dictum (my personal favourite is "…to make the test case pass with the minimal, most domain-consistent code changes currently practicable") but, if your shop is (striving towards being) agile, it's very hard to imagine that you're not honouring that basic dictum. Usually in the breach at first, but everybody starts in perfect ignorance, yes?

I (very) recently was working on a bit of code that, for nearly its entire existence over several major iterations, had 100% test coverage (technically, C0 or line coverage) and no known defects in implemented code. It then underwent a short (less than one man-week) burst of "rush" coding aimed at demonstrating a new feature, without the supporting tests having been done beforehand. It then was to be used as the basis for implementing a related set of new features, that would affect and be affected by the state of several software components.

That induced some serious second-guessing. Do we continue mad-hatter hacking, trusting experience and heroic effort to somehow save the day? Do we go back and backfill test coverage, to prove that we understand exactly what we're dealing with and that it works as intended before jumping off into the new features (with or without proper specs/tests up front)? Or do we try to take a middle route, marking the missing test coverage as tech debt that will have to be paid off sometime in The Glorious Future To Come™? The most masochistic of cultists (or perhaps the most serenely confident of infinite schedule and other resources) would pick the first; the "agile" cargo-cultist with an irate manager breathing fire down his neck the second; but the third is the only pragmatic hope for a way forward… as long as development has and trusts in assurances that the debt will be paid in full immediately after the current project-delivery arc (at which time increased revenue should be coming in to pay for the developer time).

The moral of the story is well-known, and has been summed up as "Murphy" (of Murphy's Law fame) "always gets his cut", "payback's a b*tch" and other, more "colourful" metaphors. I prefer the dictum that started this post, perhaps alternately summed up as

Don't point firearms at bits of anatomy that you (or their owner) would mind losing. And, especially, never, ever do it more than once.

Because while even the most dilettante of managers is likely to have heard Fred Brooks' famous "adding people to a late project only makes it later", or his rephrasing that "nine women cannot have a baby in one month", too few know of Steve McConnell's observation (from 1995!) that overly aggressive schedules are at least equally delay-inducing. If your technical people say that, with the scope currently defined, something will take N man-weeks, pushing for it in N/4 is an absolutely fantastic way to turn delivering in even N*4 into a major challenge.

Remember, "close" only counts in horseshoes and hand grenades and, even then, only if it's close enough to have the intended effect. Software, like the computers it runs on, is a binary affair; either it works or it doesn't.

Wednesday, 18 April 2012

'Least Knowledge' May Be More than You Think

I was chatting on IRC the other day and got into a discussion that stuck in the back of my mind until now. This guy was a fresh convert to the ideas of SOLID and to the Law of Demeter (or Principle of Least Knowledge). Now, these are principles I hold at least as strongly as the average experienced developer, and we eventually agreed to disagree over the propriety of code such as a line I just wrote (which in fact prompted this post).

Consider the Ruby code:

    data[:original_content] = article.content.lines.first.rstrip

This says:

  1. "take whatever your article is;
  2. get whatever its content attribute is or method returns;
  3. (assuming that is a String or something that can be treated as one,) get its lines enumeration;
  4. get the first entry in that enumeration;
  5. (again, assuming it's a String or workalike,) strip any trailing whitespace from that string; and
  6. stuff it into the collection data, indexed by the symbol :original_content."

It's not hard to argue that this is a bad code smell, being a long list of assumed return types and so on, except for one thing: in my view, the LoD does not cover sequencing a series of standard library calls.

Look at the code again. The bit that references my code is the fragment

  data[:original_content] = article.content

That fragment is clean; article is an object and content is an attribute of or method on that object. This code assumes that the value of article.content is a standard String. As a standard class, it's virtually guaranteed not to change in backwards-incompatible ways over the useful life of the code. What then? Standard library calls!

  1. String#lines returns a standard Enumerator instance;
  2. Enumerable#first (via the earlier Enumerator) returns a standard String instance again; and finally
  3. String#rstrip cleans off any trailing whitespace.
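To make that chain concrete, here is a minimal, runnable sketch. `Article` is a hypothetical stand-in for the real model class, and the content string is invented for illustration:

```ruby
# Hypothetical stand-in for the real model; only `content` matters here.
Article = Struct.new(:content)

article = Article.new("First line of the piece.  \nSecond line.\n")
data = {}

# The chain under discussion: String -> Enumerator -> String -> String.
data[:original_content] = article.content.lines.first.rstrip

puts data[:original_content]  # => "First line of the piece."
```

Note that the trailing spaces and newline on the first line are gone, while the rest of the content is untouched.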

No internal APIs that could change are used anywhere. Breaking that one line apart into four would produce repetitive, non-semantic, anti-Ruby-best-practices code of the style pandemic on large Java projects. We're in Ruby (and Rails) for a reason: part of that reason is that the language idioms encourage an almost literary expressiveness, making explicit the reality that a program is a conversation between the humans developing and maintaining it that just happens to be executable by a computer. Literate, semantic programming is like behaviour-driven development; once you've wrapped your head around it and seen how much better a programmer it makes you, you really don't want to go back to BDUF jousting in Java or PHP.

On further review...

Now, if you argue that the LoD is an aid in reducing coupling between classes, which reduces per-statement complexity, I can see that. But, in the example above, we're back to the fact that there is only one dot before you start jamming together standard library calls. Had I instead written:

    content = article.content
    lines = content.lines
    first_line = lines.first
    data[:original_content] = first_line.rstrip

then I'd argue that I've introduced three unnecessary local variables when only one (the second) could foreseeably change. If I later modified the return value from article.content to something other than a Stringlike object, then probably the assignment to lines would have to change. But only that one line — the same level of change that would be required in my current code.

Comments?

Tuesday, 7 February 2012

If At First You Don't Succeed, Try, Try Again

In a very real sense, that's a big part of what we do in test-driven or behaviour-driven development, isn't it?

It doesn't only apply to classes and projects. Teams work that way, too. You can have someone who, on paper, seems just great to help you out with the cat-herding problem that's a large part of any significant project. The interview is fantastic; he's 110% buzzword-compliant with what (you think) you need The New Hand to be able to pick up and run with. There might be a few minor quirks, niggles; a voice in the back of your mind standing up and clearing its throat obnoxiously, and when that doesn't get your attention, starts screaming loudly enough that surrounding folk reach for ear protection. When that doesn't get your attention — because, after all, you're talking to such a fine candidate — the voice might well be forgiven for giving up in disgust and seeking the farthest corner of your brain to sulk in. And give a good kick to, every now and again.

And then Something happens, usually a string of increasingly serious, increasingly rapidly-occurring Somethings, and you realise that this really isn't working out so well after all. The candidate-turned-colleague may well be a very nice person, but you just can't communicate effectively. Or the "incremental" test that she writes requires changes to a dozen different classes that, in turn, break 15 existing tests. If that doesn't make you go 'hmmm...', then maybe your other team members are concerned about his professionalism: consistently showing up late to meetings; giving a series of increasingly outlandish reasons for not coming into the office and working from home despite having previously promised not to, and so on. Seemingly before you know it, the project is at a virtual standstill, team dysfunction has reached soap-operatic heights, and at least one vital member of the team starts running around and shouting "off with her head!" All of which does absolutely wonderful things to your velocity and project schedule. Or maybe not so wonderful.

And the funny thing is that, at about that time, you remember that voice in the back of your mind that was trying to point out that something was rotten in this particular virtual state of Denmark from the beginning; you were just too optimistic, and likely too desperately stubborn, to recognise it at the time. If you're going to go about hiring someone, have a good rapport with your instincts. Know when to walk away; know when to run.

Long story short, we are looking for an awesome new team member. As you might have gathered from the last few blog posts, we're a Ruby on Rails shop, so you'd imagine that heavy Rails experience would be tops on our list, right? Not so fast, friend. I/we subscribe to the philosophy that source code (and the accompanying artefacts) are part of the rich communication that must continuously thrive between team members, past, present and future. These artefacts are unusual in that they are in fact executable (directly or indirectly) by a computer, but they're primarily a means of communication. As such, ability to communicate effectively in English trumps wizard-level Ruby skills. If we can't communicate effectively, it doesn't matter how good you are, does it?

What we're looking for at the moment can be summarised thusly:

1) Demonstrated professionalism. This does not mean "a long string of successful professional engagements", though obviously that doesn't hurt, especially if they individually lasted more than the sensible minimum. Do you generally keep your word? (Shtuff happens to everybody; nobody's perfect, but overall trends do matter.) Are colleagues secure in the knowledge that you routinely exceed expectations, or are they continuously making contingency plans?

2) Fluent, literate standard English at a level capable of understanding and conveying nuance and detail effectively and efficiently. This would be first on the list, were it not for the series of unfortunate events that led to this post being written at all. It is an absolutely necessary skill for each member of a truly successful team, and a skill that is becoming depressingly rare.

3) Customer-facing development experience, ideally in a high-availability, high-reliability environment. We're building a service that, if we're successful, people are going to "just assume" Works whenever they want it to, like flicking on the lights when you walk into a room at night. Those of us who've been around for a while remember the growing pains that many popular services have gone through; we have no plans whatever to implement a "fail whale".

4) Ruby exposure, with Rails 3 a strong preference. You can fill in along the edges as we go along, but if we can't have an informed discussion of best practices and how to apply them to our project, you're (and we're) hurting. If you're unclear on the concepts of RSpec or REST or Haml or _why, you're going to have to work really, really hard to convince us that we're not all wasting time we can't afford to. And, finally,

5) Deep OO experience in at least two different languages. If you can explain and contrast how you've used different systems' approaches to design by contract, encapsulation, metaprogramming, and so on, and have good ideas on how we can apply that to making our code even more spectacular, rock on. "Why isn't this #1 or #2 on your list," you might ask. If you can convince me why it should be, and why your skills are so obviously better than anybody else I'm likely to have walk in the virtual door... then you've already taken care of the others, anyway.

We're a startup, so we're not made of money. But since we're a startup, money isn't all we have to offer the right person. Know who that might be? Drop me a line at jeff.dickey at theprolog dot NOSPAM com. Those who leave LinkedIn-style "please hire me" comments to this post will be flamed, ridiculed, and disqualified.

Tuesday, 8 November 2011

Eloquence Is "Obsolete". We're Hurting. That's Redundant.

Code is meant to be read, understood, maintained and reused by humans, and incidentally to be executed by a computer. Doing the second correctly is far, far less difficult than doing the first well. Innovation is utterly meaningless without effective communication, and that is at least as true within a team as between it and a larger organisation, or between a company and its (current and potential) customers.

The degree to which a class framework, or any other tool, helps make communication more effective with less effort and error should be the main determinant of its success. It isn't, for at least two reasons. One, of course, is marketing; we as a society have been conditioned not to contest the assertion that the better-marketed product is in fact superior. In so doing, we abdicate a large degree of our affirmative participation in the evolution of, and the control over society at the small (team/company), mid-level (industry) and wider levels. We, as individuals or organisations, devolve from customers (participants in a conversation, known as a 'market', in which we have choices) into consumers (gullets whose purpose is to gulp endless products and crap cash).

More worrying is that effective, literate communication has gone completely out of fashion. Whether or not you blame that on the systematic laying waste of the educational system over the last thirty years, it's increasingly nonsensical to argue with the effect. People are less able to build understanding and consensus because they do not have the language skills to communicate effectively, and have been conditioned not to view that as a critical problem urgently requiring remediation. Oh, you'll hear politicians bloviating about how "the workforce needs to improve" or "education must be 'reformed' for the new era", but that's too often a device used to mollify public opinion, make it appear as though the politicians are Doing Something effective, and especially to preempt any truly effective public discussion leading to consensus that might effect real socioeconomic improvement rather than the "Hope and Change"™ genuine imitation snake oil that's been peddled for far too long.

Back on the subject of developers and tools, I would thus argue that what tools you use are a secondary concern; if you don't understand code that's been written, by others or (especially) by you, then a) that code can't be trusted to do anything in particular because b) someone didn't do their job.

Your job, as a developer, is to communicate your intent and understanding of the solution to a specifically-defined problem in such a way that the solution, and the problem, can be readily understood, used, and built upon by any competent, literate individual or team following you. (Again, explicitly including yourself; how many times have you picked up code from a year or a decade before, that you have some degree of pride in, only to be horrified at how opaque, convoluted or incomprehensible it is?) Some computer languages make that effective communication easier and more reliable than others; some choose to limit their broad generality to focus on addressing a narrower range of applications more effectively and expressively.

That has, of course, now progressed to the logical extreme of domain-specific languages. General-purpose languages such as C, Ruby or Python try to be everything but the kitchen sink, and too often succeed; this makes accomplishing any specific task effectively and eloquently incrementally more difficult. DSLs are definitions of how a particular type of problem (or even an individual problem, singular) can be conceptualised and implemented; a program written in a proper DSL should concisely, eloquently and provably solve the problem for which it was written. This has sparked something of a continuing revolution in the craft and industry of software development; general-purpose languages arose, in part, because writing languages that are both precise enough to be understood by computers and expressive enough to be used by humans is hard; it still is. DSLs take advantage of the greatly-evolved capabilities of tools which exist to create other tools, compared with their predecessors of just a few years ago.
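The flavour of the idea can be shown with a toy internal DSL in Ruby; `Recipe` and its vocabulary are invented purely for illustration, not drawn from any real library:

```ruby
# A toy "recipe" DSL: the vocabulary (`step`) matches the problem
# domain, and a recipe definition reads almost like prose.
class Recipe
  attr_reader :steps

  def initialize(&block)
    @steps = []
    instance_eval(&block)  # evaluate the block in this instance's context
  end

  def step(description)
    @steps << description
  end
end

pancakes = Recipe.new do
  step "whisk flour, milk and eggs"
  step "rest the batter for ten minutes"
  step "fry in a hot, buttered pan"
end

puts pancakes.steps
```

The `instance_eval` trick is what lets the block use the DSL's vocabulary without any receiver; it is the same basic mechanism behind many real Ruby DSLs.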

But the language you use to develop a program is completely irrelevant if you can't communicate with other people about your program and the ecosystem surrounding and supporting it. If half the industry reads and writes on a fifth-grade level, then we're literally unable to improve as we should.

To paraphrase the television-show title, it doesn't matter if we're smarter than a fifth-grader if that's the level at which we communicate. Improving that requires urgent, sustained and concerted attention — not only to make us better software developers, but to make the larger world in which we live a better place. Let's start by at least being able to communicate and discuss what "better" means. That in itself would be an epochal improvement, saving entire societies from becoming obsolete.

Sunday, 30 October 2011

Once more into the breach, dear colleagues, once more…

…or How I Learned to Stop Worrying and Love Serving Aboard Kobayashi Maru; a history lesson.

Once again, I've had an interesting couple of months. Between modern Singapore's regular effect on my health, some insane work and the opportunity to get even more insane work if my two best references weren't indefinitely unavailable (but hey, Thailand is usually lovely this time of year…or anytime, actually).

Ahem.

I am rediscovering a love for developing in Ruby after losing touch with it some ten years ago. In the Ruby 1.5 days, things like class variables and the hook system were either new and shiny, or had finally been thrashed into something both usable and beautiful. If programming was what you did to earn a living, then programming in Ruby was something that you thanked $DEITY for every day because, after all, how many people in this world get to use beautiful tools that make you measurably better at your craft while blowing your mind on a regular basis, and get paid for the privilege?

The Fall

Then Ruby on Rails came along, with the clueless-business-media hype and cult of personality built around G-d Himself, or at least the three-initial version of same as anointed by said media, and Things Changed:

New acquaintance: So what do you do for a living?

Me: I write computer software, mostly using Ruby, or when I have to, Delphi or C++.

N.A.: Oh, you're a Ruby on Rails programmer.

So I left Ruby behind. Delphi lasted for a while, until its competitor's lock on the default system combined with some spectacularly bad corporate timing to relegate it to an "oh yeah, I heard of that once" niche. And then came this gawky new kid called PHP.

PHP wasn't exactly "new" when I first got into it; version 3.x had been out for some time, and the in-beta version 4 had class-based object-oriented programming. To a Porsche driver it was more primitive than an 1898 Duryea automobile, but you could see that the basic principles were at least in sight. And so, as the dot-com bubble was starting to really inflate, I jumped into PHP with a vengeance. It wasn't a product of one of the existing tech-industry corporate titans, and it wasn't tied to a single operating system or Web server (though it did seem to work best with the then-early Linux OS and Apache server).

The great thing about PHP is that just about anybody can poke at it for a while, and at the end have something that (seems to) work. The pathologically execrable thing about PHP is that the barrier to entry is microscopically low, lower even than for Visual Basic 6 back in the day. And so, the inevitable result was (and is) that you have literally hundreds of thousands, if not millions, who get jobs by saying "Yeah, I know PHP; I've done (this little site) and (that steaming mess with a lot of Flashy bling on it)." In reality, there are thousands, at most, of good PHP developers out there, with a few more thousands actively working at improving their craft.

Purgatory

So PHP 4 had a crappy object model that anybody could poke at or ignore at will. PHP 5, from mid-2004, started to get its head on straight with respect to both OOP and what it really took to do PHP well, but lots of damage had already been done. Innumerable client projects had either failed or become incredible maintenance/performance nightmares due to the shoddy code that PHP, and the community that grew up around it, encouraged mediocre/inexperienced/inattentive developers to write.

And then, in mid-2009, PHP 5.3 came into the world, and it was a glorious golden statue with legs of brown, wet mud. Many of us who wanted to see PHP evolve into a "properly" object-oriented language, along the path that it had been following, found much to rejoice in. Support for closures. Better internationalisation support. Far better garbage collection. A rearrangement and winnowing of the extension and application repository system that was the closest thing PHP has to Python's eggs or Ruby's gems.

PHP 5.3 also introduced what had to have been the most-requested new feature for years: namespaces. Namespaces are one solution to the problem of allowing the development team to organise collections and hierarchies of classes to both make them easier to work with conceptually, and of mitigating possible naming conflicts (class Foo in namespace Bar is "obviously" distinct from class Foo in namespace Barney.)

However, this is also where the legs of the "golden statue" were transformed into wet mud: the way in which the PHP namespace features work is so semantically and visually jarring, with so many inconsistencies visible to both the experienced pre-5.3 PHP developer and the experienced developer of object-oriented software in other languages, that it quickly became a laughingstock and a millstone. A necessary millstone, but one of which many writers of both code and prose waxed eloquent in their righteous, intricately-justified derision.

A New Hope

And this eventually served as a wake-up call to a number of those who view PHP as just another tool in their toolkit, as opposed to a cash cow to be milked, a semi-religious icon to be polished and cared for in the precise fashion that the High Priests of Zend decree, or something they simply never learned enough about to care. Seven to ten years is a long time to spend in any one language for an experienced developer, and quite a number of highly-visible PHP community stalwarts have been publicly participating in and contributing to other communities: various JVM languages like Groovy, Scala and Clojure; Objective-C; C#; and Ruby.

A few short months ago, I had urgent need to re-immerse myself in Ruby, learn modern Rails, and make myself ready in all respects to participate in Ruby on Rails projects at a senior or leading level. This gave rise to a series of fortunate events, to paraphrase Lemony Snicket. I discovered that Ruby 1.9 is now a very advanced, mature language with solid experience-based best practices that are sensible and self-consistent. I discovered that Rails 3.1 is an incredibly productive way to get a Web site or application up and running, and that the ways in which it encourages you to code and think do not have the same propensity to inflict mortal wounds that, say, PHP has on the overly-trusting journeyman developer. Rails itself has grown into a much larger community that no longer revolves around a single individual, or even a small group of individuals, as it did seven years ago or as too many other languages do now. And, importantly, many of the things that make Rails great would not be practical, or even possible, were it built on and in any other language than Ruby.

Above all, Ruby gives hope to those few who, mindful of Knuth's saying that "programming is a literary act", understand that computer programs are written to be read and modified by humans, with computer execution almost a secondary concern over the life of a project. If you care about thinking creatively, if you enjoy having your mind regularly blown in ways that challenge you to actively and continuously improve your mastery of the craft of software development, you are going to love Ruby, and Rails.

An Evangelist Repurposed

And so, to any who are pondering a new Web project, who see how overwhelming the mindshare of PHP is and how obviously negligent (even to non-technical eyes) is far too much PHP code and, by extension, those who made such code available, I would ask you to strongly consider Ruby, and Rails, even if your team has little or no experience in them.

After all, quality and intelligence should, in any world worth living in, be major competitive advantages — or, rather, by rights, their lack must needs be mortal.

Thursday, 18 August 2011

The Yak Shavery; J. Dickey, Proprietor

Starting a new project for a new client, coming up to speed on a bit of kit almost like something I poked around with a ways back, and a ferociously Singaporean flu do not make for a productive week. I'm almost back to where I expected to be by noon on Monday. Since it's noon on Thursday…

What I'm trying to learn and leverage is Chef, an automated-configuration tool for computer systems (mostly servers). If you want to be able to reliably, repeatably set up a server (or a server farm), Chef or one of several available alternatives, will make your life much easier. Once you get your tool of choice working, that is.

Step One is, as various Chef amateur tutorials suggest, to start out with a plain-vanilla installation of your OS (Linux, in this case) of choice. Rather obviously, getting from bare metal (or bare VM disk platters, if you'd rather) to the basic working system should be push-button automatic as well. "Fine," you say, "nearly every major distro has its own automation system; surely I should be able to just pick up, say, a Red Hat Enterprise Linux clone like CentOS or Scientific Linux and use a Kickstart file, and it should Just Work. Right?"

Close; but then, "close" only counts for scoring purposes in horseshoes and (arguably) the use of hand grenades. This is neither, though it does have the ability to blow up in our faces with unpleasant consequences.

Putting together a basic Kickstart file to set up a base system (on which we can use Chef to complete installation and configuration; the whole point, remember) is itself pretty straightforward. Except for one "little" thing:

[Image: a rather scary dialog (not my actual screenshot; this one courtesy of Máirín Duffy).]

Hitting the "Reinitialize All" button will allow the install to finish as expected, but it's still a manual action in a process that's trying to eliminate such.

A bit of Google-fu and some questions on the #centos channel of irc.freenode.net (thanks, Wolfie!) pointed me to the correct option to set when clearing the disk partitions, and all should have been peachy-keen and wonderful from that point on. Except it wasn't, of course.
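For the curious, the fix boils down to a two-line Kickstart fragment. This is a sketch from memory, on the assumption that `zerombr` plus the `--initlabel` flag to `clearpart` are the directives in question; check the Kickstart documentation for your distribution's release before relying on it:

```
# Initialise unrecognised disk labels and clear all partitions without
# prompting, so the "Reinitialize All" dialog never appears
zerombr
clearpart --all --initlabel
```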

Tests conducted with two different versions each of three RHEL clones on my VMware Fusion 3.1.3 system all failed with a mysterious "cannot read repodata" fatal error being thrown by the installer.

Except, of course, that mounting the DVDs in question and browsing them showed the /repodata directory and contents exactly as they should be. Fast-forward through a day of flailing and fuzzing different options "just to see what happens", and you have a classic yak shave.

So, knowing that I'll have to get the RHEL versions working eventually, and having been previously warned against using Chef-on-Ubuntu as a learning exercise, I'm now spinning up AutoYAST on openSUSE, which is (experientially) what I should have started with from the beginning.

The wall, where my head has banged against it repeatedly, can be repaired. The calendar? Time is the ultimate non-renewable resource.

(Anybody who has any suggestions for Making This Work with RHEL clones, ideally Scientific 6.1 and/or CentOS 6: enlightenment would be Greatly Appreciated.)

Tuesday, 5 October 2010

Redundancy is Repetitive, Inefficient and Counterproductive

But you already know that.

I've long been a supporter of free and open source software, even contributing to a few projects. However, my enthusiasm has cooled a bit in the last several years, as the time I've been able to devote to such projects has both dwindled and been divided among more projects.

I'm continually flabbergasted by the number of open-source projects that cover the same ground, ad infinitum, often with little to no apparent differentiation between them. By continually "reinventing the same wheel," these small projects usually manage to put out a small number of alpha- or beta-quality releases, and then cease to make visible progress. This could be because the original developer(s) became frustrated or distracted; because the project succumbed to internal politics (a leading cause of FLOSS project death); or merely because the developer(s) realised that they were deep in the rut of a path that has been travelled many, many times before.

This flash of obviousness came about as I've been researching and evaluating PHP frameworks for Web development. I've been developing and maintaining sites using a couple of these, notably Kohana 2.3, but for various reasons have been reluctant to make the switch to the now-current 3.0 release. I decided to do a relatively formal, structured evaluation of several of the frameworks out there, rather than just pounce on the first I came across that looked "reasonable." (Silly me, but oh well.) Of course, as my list of frameworks to look at grew from six, to twenty, to considerably more, I became uncomfortable; torn between the reality that I'm doing this all myself (and devoting significantly less than 100% time to the endeavour), but at the same time wanting to remain open to pleasant surprises. After all, "pleasant surprises" were how I stumbled upon Kohana in the first place, and though I am strongly considering using another tool in its place (again, for non-technical reasons), contributing to a megalithic monoculture is not my idea of "adding value."

Today I did something that I should have done a week or two ago: I went out to SourceForge and got a list of all the PHP frameworks registered on the site.

Oh, my effing $DEITY! "Searching gives 1316 results."

Obviously, not all of those are appropriate (there are numerous reinventions of the CMS wheel, for instance); many can be filtered simply because they haven't published updates recently enough to meet my criteria, and so on, but that's all rather beside my current point.

Open-source projects are (in)famously both insular and prolific, leading to the kind of nonsense I ran into on SourceForge. Commercial products have a built-in inhibitor of this particular pathology: if a market gets divided among so many competitors that nobody can cover their costs, then in very short order the existing products will either consolidate into fewer, more sustainable alternatives or die altogether. Open-source projects, particularly those started by people inexperienced in such projects, fail to realise that they too are competing for resources just as much as their commercial brethren. In the FLOSS case, the finite, competed-for resource is time, both of (potential) contributors and of passive users (what the commercial world calls "customers").

Some open-source projects recognise this. In the Linux and Unix/BSD desktop worlds, the "market" for desktop managers (that provide the windows, icons, menus, and so on that are the UI to the user) has been dominated by two projects, KDE and GNOME. Though there are other choices with solid offerings and active communities, a developer can choose to develop for either KDE or GNOME with the confidence that virtually anyone in his target audience will be able to run his software on their system of choice. Consequently, most projects that have to make a choice, choose KDE or GNOME.

The Web is a different beast entirely, and while standards are for the most part very well-defined, the "barrier to entry" for those seeking to do things (allegedly) differently is very low. Some in the PHP Web development community advise new developers to tackle creating their own framework, as a learning tool. I suspect that that's how many of the SourceForge listings originated. I also happen to think that the advice itself is naïve, akin to a new auto mechanic being advised to build his own engine to gain a better understanding of those from Toyota or Ford. What I would recommend instead is that a new developer learn one or two of the more widely-used frameworks, browse around for another two or three that seem interesting, and start contributing to the project(s) of her choice. Virtually all such projects recognise their need for more help, as well as the fact that "we were all newbies once."

Consolidation in the PHP-framework field, while maintaining the low barrier to entry, should be a Good Thing™: to a greater degree than commercial products, FLOSS projects should succeed on their merits, technical and otherwise. As more developers join a well-run project community, the project's software should continue to both improve and find wider user/customer use. The ability to "do your own thing" should be kept, if only as a check against the use of marketing hype over technical merit.

But I believe that relatively small, informal "A- and B-List" groups of selections would be good for projects and developers (and developers' clients); a lot of what I've come across in the last few days would be on my "P-List," if not "ZZZZ."

Oh yeah, by the way: this sort of maturation and shared effort would do wonders for our Craft's progress towards true professional status. Any of us who've had our schedules and budgets tinkered with by people who admit they know nothing of what we do should understand the desirability of that goal. Given the degree to which software permeates public life and policy, that "desirability" has become a dire necessity; it's just more expedient for people to pretend that isn't really the case. Stuxnet, anyone?

Tuesday, 14 September 2010

Saving Effort and Time is Hard Work

As both my regular readers well know, I've been using a couple of Macs[1] as my main systems for some time now. As many (if not most, these days) Web developers do, I run different systems (Windows and a raft of Linuxes) using VMware Fusion so that I can do various testing activities.

Many Linux distributions come with some form of automation support for installation and updates[2]. Several of these use the (generally excellent) Red Hat Anaconda installer and its automation scheme, Kickstart. Red Hat and the user community offer copious, free documentation to help the new-to-automation system administrator get started.

If you're doing a relatively plain-vanilla installation, this is trivially easy. After installing a new Fedora system (image), for example, there is a file named anaconda-ks.cfg in the root user's home directory, which can be used to either replay the just-completed installation or as a basis for further customisation. To reuse, save the file to a USB key or to a suitable location on your local network, and you can replay the installation at will.
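Replaying is then a matter of pointing the installer's boot line at the saved file. A hypothetical example, where the `sdb1` device name is an assumption about where the USB key lands:

```
linux ks=hd:sdb1:/anaconda-ks.cfg
```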

Customising the installation further, naturally, takes significant additional effort — almost as "significant" as the effort required to do the same level of customisation manually during installation. The saving grace, of course, is that this only needs to be done once for a given version of a given distribution. Some relatively minor tinkering will be needed to move from one version to another (say, Fedora 13 to 14), and an unknowable-in-advance amount of effort needed to adapt the Kickstart configuration to a new, different distribution (such as Mandriva), since packages on offer as well as package names themselves can vary between distros[3].

It's almost enough to make me pull both remaining hairs out. For several years, I have had a manual installation procedure for Fedora documented on my internal Wiki. That process, however, leaves something to be desired, mainly because it is an intermixture of manual steps and small shell scripts that install and configure various bits and pieces. Having a fire-and-forget system like Kickstart (that could then be adapted to other distributions as well), is an extremely seductive idea.

It doesn't help that the Kickstart Configurator on Fedora 13, which provides a (relatively) easy-to-use GUI for configuring and specifying the contents of a Kickstart configuration file, works inconsistently. Using the default GNOME desktop environment, one of my two Fedora VMs fails to display the application menu, which is used for tasks such as loading and saving configuration files. Under the alternate (for Fedora) KDE desktop, the menu appears and functions correctly.

One of the things I might get around to eventually is to write an alternative automated-installation-configuration utility. Being able to install a common set of packages across RPM-based (Red Hat-style) Linuxes such as Fedora, Mandriva and CentOS as well as Debian and its derivatives (like Ubuntu), and maybe one or two others besides, would be a Very Handy Thing to Have™.

That is, once I scavenge enough pet-project time to do it, of course. For now, it's back to the nuances of Kickstart.

Footnotes:

1. an iMac and a MacBook Pro. (Return)

2. Microsoft offer this for Windows, of course, but they only support Vista SP1, Windows 7 and Windows Server 2008. No XP. Oops. (Return)

3. Jargon. The term distro is commonly used by Linux professionals and enthusiasts as an abbreviation for "distribution": a collection of GNU and other tools and applications built on top of the Linux kernel. (Return)

Wednesday, 28 July 2010

Automating So You Don't Forget

This is a bit of an introductory-level post/rant/tutorial, but I've been peppered by enough "why on earth would you do this?" questions by various (seemingly experienced) project team members and on various mailing lists that I thought I'd just write my own take on this and point people to it when useful.


I'm pulling my (semihemidemiexistent) hair out on four different PHP projects at the moment. Not because they take all my time (they don't, unfortunately), but because three of them are in "maintenance" mode and express that in different ways. Take version control: one project uses Mercurial (my favourite DVCS); another, Subversion (once "subversive", now the "safe" non-DVCS choice); and a third, tragically, git. Each project has different coding standards. One of those is widely enough used that PHP CodeSniffer supports it right out of the tin; the others are relatively easy to write "sniffs" for. (Do remember that none of the three "maintenance" projects were using CodeSniffer or an equivalent, and all three make only sporadic use of their main VCS repositories.)

Wait a minute...now I've got to remember which standards go with which projects? And oh, yeah, it would be Really Nice™ to have any changes automatically saved in version control... if they're worthy.

What do I mean by "worthy?" Well, before I worry overmuch about how code is formatted, I should be able to prove that it works properly. After all, the most beautifully-formatted code that doesn't work is still (essentially) useless. This, of course, is where a tool like PHPUnit comes in; once you have sufficient coverage of your code with automatable tests, especially if you write the tests before you write (new) code, you can make changes confidently and quickly, because a) your tests prove that the code works as expected, and b) you're making sensible use of a (D)VCS, so that when your wonderful new code goes south and doesn't come back, you can follow your virtual-breadcrumb trail back up the face of the cliff. Only after PHPUnit blesses the code should CodeSniffer get a crack at it.

The new folks are scribbling away: "first test everything, then comply with standards, and then update version control." The rest of you are saying "hang on a minute; that problem's been sorted any of several different ways."

Precisely. If you're developing in the Java world, you're spoilt for choice: you can do perfectly reasonable build/test/deploy automation using Ant, or if you want to keep a large number of people (allegedly) gainfully employed managing a J2EE-on-steroids project, you can go for Maven.

In the PHP world, we've got a nice "little" analogue to Ant called Phing. It will quickly become "dead-finger" technology; you'll wonder how (or why) you ever did a reasonably "serious" project without it. And yet, most of the open-source PHP projects I've seen (on Sourceforge and elsewhere) don't use such a tool; they rely on error-prone, manual steps. This manual process, with steps easily forgotten or mangled, is the source of many bugs in released software — in any language.

Enter Phing (or equivalent). You set up the moral equivalent of a makefile with the steps you want to have performed the same way in the same order, every time. Phing supports properties, which can be stored separately from the "master" build file that references them. This allows you to set up consistent process and policy (defined in the build file) and plug in the values for a specific project using the separate properties file.
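As a hedged sketch of that separation, here is roughly how a shared Phing build file might pull project-specific values from a properties file. The file names and property keys are invented for illustration; only the general `<property file="..."/>` and `${...}` mechanism is Phing's own.

```xml
<!-- build.properties sits next to the build file, one per project.
     Illustrative contents:
       project.name=myapp
       src.dir=src
       reports.dir=build/reports
-->
<project name="shared-process" default="test">
  <!-- Load the per-project values; the policy defined in the
       targets below stays identical across projects. -->
  <property file="build.properties"/>

  <target name="test">
    <!-- ${project.name} and ${src.dir} come from build.properties -->
    <echo message="Testing ${project.name} sources in ${src.dir}"/>
  </target>
</project>
```

Moving to a new project then means writing a new small properties file, not a new build process.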

So how much difference does all this make? Let's take an example set of steps, some variation of which I follow in my build files:

  1. First, clean out all the files created by steps that come later (like test reports);
  2. Then, run unit tests, displaying the output as they run. If tests fail, stop;
  3. Verify compliance with your chosen coding standards; if a problem is found, stop. Either fix the problem if it's in a file you've touched or add the file to the ignore list if it's a legacy file;
  4. I like to run PHPDocumentor to automatically generate developer documentation, from comments left in the code. CodeSniffer will check these, too, so by the time phpdoc gets its grubby virtual paws on your code, it shouldn't find any problems;
  5. If all is well, then it's on to version control. I have Phing show a "diff" report of what's changed since the last checkin, and then prompt me for a checkin comment. If I want to run the whole process but not check in to VCS (maybe I'm coming back to a project after a while away and just want to see the earlier steps run), I can hit the Return key, and my build file will skip the VCS checkin because I've supplied an empty comment (which it checks for).
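Under stated assumptions, the five steps above might look roughly like the following Phing build file. Target names are my own invention, the `exec` commands assume phpunit, phpcs, phpdoc and hg are on the PATH, and the paths and standard name are placeholders; this is a sketch of the shape of such a file, not a drop-in build.

```xml
<project name="example" default="build">
  <!-- 1. Clean out generated artifacts such as test reports -->
  <target name="clean">
    <delete dir="build/reports"/>
    <mkdir dir="build/reports"/>
  </target>

  <!-- 2. Run unit tests; checkreturn stops the build on failure -->
  <target name="test" depends="clean">
    <exec command="phpunit --log-junit build/reports/phpunit.xml tests"
          checkreturn="true" passthru="true"/>
  </target>

  <!-- 3. Verify coding-standards compliance -->
  <target name="sniff" depends="test">
    <exec command="phpcs --standard=PEAR src"
          checkreturn="true" passthru="true"/>
  </target>

  <!-- 4. Generate developer documentation from code comments -->
  <target name="doc" depends="sniff">
    <exec command="phpdoc -d src -t build/docs" passthru="true"/>
  </target>

  <!-- 5. Show a diff, prompt for a comment, commit only if non-empty -->
  <target name="build" depends="doc">
    <exec command="hg diff" passthru="true"/>
    <input propertyName="commit.comment"
           message="Checkin comment (empty to skip):"/>
    <if>
      <not><equals arg1="${commit.comment}" arg2=""/></not>
      <then>
        <exec command="hg commit -m '${commit.comment}'" passthru="true"/>
      </then>
    </if>
  </target>
</project>
```

The `depends` attributes are what enforce the ordering: asking for the default target pulls in the whole chain, every time, in the same order.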

Great, so (since I've followed a few conventions), all I need to do is type "phing" at the command line and it's off to the races. Trivially easy to use and, much more importantly, proof against a very high level of idiocy.

What's that? You in the back... I'm putting the cart before the horse, you say? I shouldn't do a process that drives VCS checkin, but a VCS checkin "hook" that does the validation and so on instead?

To some degree, that's a matter of taste. From a very practical perspective, though, having your build-and-test automation drive VCS instead of the other way 'round means that you can use any VCS operable from a command line, with minimal pain moving between projects. Not every VCS implements a pre-commit hook in the same way; some apparently don't implement them at all. (Yes, we know they're toys, but they're "enterprisey" big-ticket toys. Some managers will buy anything.) So, by having a single-command process execution/enforcement tool, you'll generally find that the internal and external quality of your project improves considerably and quickly; you'll also find that the risk involved with sweeping changes or audacious new features drops to a more comfortably survivable level.

And that's why my answer to the question "What tools should I be using for my PHP development?" always includes at least:

  • Your project's version control tool of choice (again, I recommend Mercurial);
  • Phing;
  • PHPUnit;
  • PHP CodeSniffer; and
  • PHPDocumentor.

Once we get people used to a core set of tools and practices, we can then go on to the thorny religious issues like, "which PHP framework should I use?"

Next question?

Saturday, 8 May 2010

She's Putting Me Through Changes...

...they're even likely to turn out to be good ones.

As you may recall, I've been using and recommending the Kohana PHP application framework for some time. Kohana now offer two versions of their framework:

  • the 2.x series is an MVC framework, with the upcoming 2.4 release to be the last in that series; and

  • the 3.0 series, which is an HMVC framework.

Until quite recently, the difference between the two has been positioned as largely structural/philosophical. If you wish to develop with the 'traditional' model-view-controller architecture, then 2.x (currently 2.3.4) is what you're after; with great documentation and tutorials, any reasonably decent PHP developer should be able to get Real Work™ done quickly and efficiently. On the other hand, the 3.0 offering (now 3.0.4.2) is a hierarchical MVC framework. While HMVC via 3.0 offers some tantalising capabilities, especially in large-scale or extended sequential development, there remains an enthusiastic, solid community built around the 2.3 releases.

One of the long-time problems with 2.3 has been how to do unit testing. Although vestigial support for both a home-grown testing system and the standard PHPUnit framework exists in the 2.3 code, neither is officially documented or supported. What this leads to is a separation between testing non-UI classes, which are mocked appropriately and exercised from the 'traditional' PHPUnit command line, and UI testing using tools like FitNesse. This encourages the developer to create as thin a UI layer as practical over the standalone (and more readily testable) PHP classes which that UI layer makes use of. While this is (generally) a desirable development pattern, encouraging and enabling wider reuse of the underlying components, it's quite a chore to get an automated testing/CI rig built around it.

Then I came across a couple of pages like this one on LinkedIn (free membership required). The thread started out asking how to integrate PHPUnit with Kohana 2.3.4, and then described the move to 3.0 as:

I grabbed Kohana 3, plugged in PHPUnit, tested it, works a treat! So we're biting the bullet and moving to K3! :)

I've done a half-dozen sites in Kohana 2.3, as I'd alluded to earlier. I've just downloaded KO3 and started poking at it, with the expectation of moving my own site over shortly and, in all probability, moving 3.0 to the top of my "recommended tools" list for PHP.

Like the original poster, Mark Rowntree, I would be interested to know if and how anybody got PHPUnit working properly in 2.3.4.

Thanks for reading.

Tuesday, 27 April 2010

Let's Do The Time Warp Agai-i-i-i-n!! (Please, $DEITY, no...)

For those who may somehow not be aware of it, LinkedIn is a (generally quite good) professionally-oriented social-networking site. This is not Facebook, fortunately. It's not geared towards teenagers raving about the latest corporate boy band du jour. It often can be, however, a great place to network with people from a variety of vocational, industry and/or functional backgrounds to get in contact with people, share information, and so on.

One of the essential features of LinkedIn is its groups, which are primarily used for discussions and job postings. In the venerable Usenet tradition, these discussions can have varying levels of insightful back-and-forth, or they can degenerate into a high-fidelity emulation of the "Animal House" food fight. As with Usenet, they can often give the appearance of doing both at the same time. Unlike Usenet, one has to be a member of LinkedIn to participate.

One of the (several) groups I follow is LinkedPHPers, which bills itself as "The Largest PHP Group" on LinkedIn. Discussions generally fall into at least one of a very few categories:

  • How do I write code to solve "this" problem? (the 'professional' version of "Help me do my homework");

  • What do people know/think about "this" practice or concept?

  • I'm looking for work, or people to do work; does anybody have any leads?

As veterans of this sort of discussion would expect, the second type of discussion can lead to long and passionate exchanges with varying levels of useful content (what became known on Usenet as a "flame war"). The likelihood of such devolution seems inversely proportional to the topic's specificity, and directly proportional to the degree to which the concept in question is disregarded, unfamiliar or unknown to those with an arguable grasp of their Craft.

It should thus be no surprise that a discussion on the LinkedPHPers group of "Procedural vs Object Oriented PHP Programming" would start a flame war for both of the above reasons. With 58 responses over the past month as I write this, there are informational gems of crystal clarity buried in the thick, gruesome muck of proud ignorance. As Abraham Lincoln is reported to have said, "Better to remain silent and be thought a fool than to speak out and remove all doubt."

What's my beef here? Simply that this discussion thread is re-fighting a war that was fought and settled over a quarter-century ago by programming in general. The reality is that any language that has a reasonable implementation of OOP (with encapsulation/access control, polymorphism and inheritance, in that order by my reckoning) should be used in that way.

Several of the posts trot out the old canard about a performance 'penalty' when using OOP. In practice, that's true of only the sharpest edge cases: simple, tiny, standalone classes that should never have been written that way, because they don't provide a useful abstraction of a concept within the solution space. Such classes are generally the work of developers who are not professionally knowledgeable of the concepts involved, and quite often of those copying and pasting code they don't understand into projects they also don't understand. That bunch sharply limited the potential evolution and adoption of C++ in the '80s and '90s, and many of their ideological brethren have made their home in Web development using PHP.

Yes, I know that "real" OOP in PHP is a set of tacked-on features, late to the party; first seriously attempted in PHP 4, with successively evolving implementations in 5.0, 5.2 and 5.3, with the semi-mythological future PHP 6 adding many new features. I know that some language features are horribly unwieldy (which is why I won't use PHP namespaces in my own code; proven idea, poor implementation). But taken as a whole, it's increasingly hard to take the Other Side ("we don' need no steeeenkin' objects") at all seriously.

The main argument for ignoring the "ignore OOP" crowd is simply this: competent, thoughtful design using OOP gives you the ability to know and prove that your code works as expected, and data is accessed or modified only in the places and ways that are intended. OOP makes "software-as-building-blocks" practical, a term that first gained currency with the Simula language in the mid-1960s. OOP enables modern software proto-engineering practices such as iterative development, continuous integration and other "best practices" that have been proven in the field to increase quality and decrease risk, cost and complexity.

The 'ignore OOP in PHP' crowd like to point to popular software that was done in a non-OOP style, such as Drupal, a popular open-source Web CMS. But Drupal is a very mature project, by PHP standards; the open-source project seems to have originated in mid-2000, and it was apparently derived from code written for a project earlier still. So the Drupal code significantly predates PHP 5, if not PHP 4 (remember, the first real whack at OOP in PHP). Perusing the Drupal sources reveals an architecture initially developed by some highly experienced structured-programming developers (a precursor discipline to OOP); their code essentially builds a series of objects by convention, not depending on support in the underlying language. It is a wonder as it stands – but I would bet heavily that the original development team, if tasked with re-implementing a Web CMS in PHP from a blank screen, would use modern OO principles and the underlying language features which support them.

And why would such "underlying language features" exist and evolve, especially in an open-source project like PHP, if there was not a real, demonstrable need for them? Saying you're not going to do OOP when using PHP is metaphorically akin to saying you intend to win a Formula One race without using any gear higher than second in the race.

Good luck with that. You might want to take a good, hard look at what your (more successful) colleagues are doing, adopt what works, and help innovate your Craft further. If you don't, you'll continue to be a drag on progress, a dilettante intent upon somehow using a buggy whip to accelerate your car.

It doesn't work that way anymore.

Wednesday, 14 April 2010

Process: Still 'garbage in, garbage out', but...

...you can protect yourself and your team. Even if we're talking about topics that everybody's rehashed since the Pleistocene (or at least since the UNIVAC I).

Traditional, command-and-control, bureaucratic/structured/waterfall development process managed to get (quite?) a few things right (especially given the circumstances). One of these was code review.

Done right, a formal code review process can help the team improve a software project more quickly and effectively than ad-hoc "exploration and discovery" by individual team members. Many projects, including essentially all continuing open-source projects that I've seen, use review as a tool to (among other things) help new teammates get up to speed with the project. While it can certainly be argued that pair programming provides a more effective means to that particular end, it (and honestly, most agile processes) tends to focus on the immediate, detail-level view of a project. Good reviews (including but not limited to group code reviews) can identify and evaluate issues that are not as visibly obvious "down on the ground." (Cédric Beust, of TestNG and Android fame, has a nice discussion on his blog about why code reviews are good for you.)

Done wrong, and 'wrong' here often means "as a means of control by non-technical managers, either so that they can honour arbitrary standards in the breach or so that they can call out and publicly humiliate selected victims," code reviews are nearly Pure Evil™, good mostly for causing incalculable harm and driving sentient developers in search of more humane tools – which tend (nowadays) to be identified with agile development. Many individuals prominent in developer punditry regularly badmouth reviews altogether, declaring that if you adopt the currently-trendy process, you won't ever have to do those eeeeeeeeevil code reviews ever again. Honest. Well, unless.... (check the fine print carefully, friends!)

Which brings us to the point of why I'm bloviating today:

  1. Code reviews, done right, are quite useful;

  2. Traditional, "camp-out-in-the-conference-room" code reviews are impractical in today's distributed, virtual-team environment (as well as being spectacularly inefficient), and

  3. That latter problem has been sorted, in several different ways.

This topic came up after some tortuous spelunking following an essentially unrelated tweet, eventually leading me to Marc Hedlund's Code Review Redux... post on O'Reilly Radar (and then to his earlier review of Review Board, and to numerous other similar projects).

The thinking goes something like: hey, we've got all these "dashboards" for CRM, ERP, LSMFT and the like; why not build a workflow around one that's actually useful to project teams? And these tools fit the bill, helping teams integrate a managed approach to (any of several different flavours of) code review into their development workflow. The review step generally gets placed either immediately before or immediately after a new, or newly-modified, project artifact is checked into the project's SCM. Many people, including Beust in the link above, prefer to review code after it's been checked in; others, including me, prefer reviews to take place before checkin, so as to not risk breaking any builds that pull directly from the SCM.

We've been using collaborative tools like Wikis for enough years now that any self-respecting project has one. They've proven very useful for capturing and organising collective knowledge, but they are not at their best for tracking changes to external resources, like files in an SCM. (Trac mostly finesses this, by blurring the lines between a wiki, an SCM and an issue tracker.) So, a consensus seems to be forming, across several different projects, that argues for

  • a "review dashboard," showing a drillable snapshot of the project's code, including completed, in-process and pending reviews;

  • a discussion system, supporting topics related to individual reviews, groups of reviews based on topics such as features, or the project as a whole; these discussions can be searched and referenced/linked to; and

  • integration support for widely-used SCM and issue-tracking systems like Subversion and Mantis.

Effective use of such a tool, whatever your process, will help you create better software by tying reviews into the collaborative process. The Web-based versions in particular remove physical location as a condition for review. Having such a tool that works together with your existing (you do have these, yes?) source-code management and issue-tracking systems makes it much harder to have code in an unknown, unknowable state in your project. In an agile development group, this will be one of the first places you look for insight into the cause of problems discovered during automated build or testing, along with your SCM history.

And if you're in a shop that doesn't use these processes, why not?


On a personal note, this represents my return to blogging after far, far too long buried under Other Stuff™. The spectacularly imminent crises have now (mostly, hopefully) been put out of our misery; you should see me posting more regularly here for a while. As always, your comments are most welcome; this should be a discussion, not a broadcast!