Tuesday 27 April 2010

Let's Do The Time Warp Agai-i-i-i-n!! (Please, $DEITY, no...)

For those who may somehow not be aware of it, LinkedIn is a (generally quite good) professionally-oriented social-networking site. This is not Facebook, fortunately. It's not geared towards teenagers raving about the latest corporate boy band du jour. It can, however, be a great place to network with people from a variety of vocational, industry and functional backgrounds: to make contacts, share information, and so on.

One of the essential features of LinkedIn is its groups, which are primarily used for discussions and job postings. In the venerable Usenet tradition, these discussions can have varying levels of insightful back-and-forth, or they can degenerate into a high-fidelity emulation of the "Animal House" food fight. As with Usenet, they can often give the appearance of doing both at the same time. Unlike Usenet, one has to be a member of LinkedIn to participate.

One of the (several) groups I follow is LinkedPHPers, which bills itself as "The Largest PHP Group" on LinkedIn. Discussions generally fall into at least one of a very few categories:

  • How do I write code to solve "this" problem? (the 'professional' version of "Help me do my homework");

  • What do people know/think about "this" practice or concept?

  • I'm looking for work, or people to do work; does anybody have any leads?

As veterans of this sort of discussion would expect, the second type of discussion can lead to long and passionate exchanges with varying levels of useful content (what became known on Usenet as a "flame war"). The likelihood of such devolution seems to be inversely proportional to the specificity of the topic, and directly proportional to the degree to which the concept in question is disregarded by, unfamiliar to, or simply unknown to those with an arguable grasp of their Craft.

It should thus be no surprise that a discussion on the LinkedPHPers group of "Procedural vs Object Oriented PHP Programming" would start a flame war for both of the above reasons. With 58 responses over the past month as I write this, there are informational gems of crystal clarity buried in the thick, gruesome muck of proud ignorance. As Abraham Lincoln is reported to have said, "Better to remain silent and be thought a fool than to speak out and remove all doubt."

What's my beef here? Simply that this discussion thread is re-fighting a war that the programming community at large fought and settled over a quarter-century ago. The reality is that any language with a reasonable implementation of OOP (with encapsulation/access control, polymorphism and inheritance, in that order by my reckoning) should be used that way.

Several of the posts trot out the old canard about a performance 'penalty' when using OOP. In practice, that penalty shows up only in the sharpest edge cases: simple, tiny, standalone classes that should never have been written as classes in the first place, because they don't provide a useful abstraction of a concept within the solution space. Such classes are generally produced by developers who are not professionally knowledgeable of the concepts involved, quite often by copying and pasting code they don't understand into projects they also don't understand. That bunch sharply limited the potential evolution and adoption of C++ in the '80s and '90s, and many of their ideological brethren have made their home in Web development using PHP.
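If you suspect method-call overhead actually matters in your application, the honest move is to measure it. Here's a minimal micro-benchmark sketch of my own (the class, function and iteration count are purely illustrative, and the numbers will vary by PHP version, opcode cache and workload):

    <?php
    // Rough micro-benchmark sketch: plain function call vs. method call.
    // Treat the output as a sanity check, not as proof of anything.

    function add_plain($a, $b) {
        return $a + $b;
    }

    class Adder {
        public function add($a, $b) {
            return $a + $b;
        }
    }

    $iterations = 1000000;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        add_plain($i, $i);
    }
    $procedural = microtime(true) - $start;

    $adder = new Adder();
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $adder->add($i, $i);
    }
    $oop = microtime(true) - $start;

    printf("procedural: %.4fs, OOP: %.4fs\n", $procedural, $oop);

In every real-world codebase I've profiled, the difference this measures is noise next to database access, I/O and poor algorithms.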

Yes, I know that "real" OOP in PHP is a set of tacked-on features, late to the party: it was first seriously attempted in PHP 4, with successively evolving implementations in 5.0, 5.2 and 5.3, and the semi-mythological future PHP 6 promising many new features. I know that some language features are horribly unwieldy (which is why I won't use PHP namespaces in my own code; proven idea, poor implementation). But taken as a whole, it's increasingly hard to take the Other Side ("we don' need no steeeenkin' objects") at all seriously.

The main argument for ignoring the "ignore OOP" crowd is simply this: competent, thoughtful design using OOP gives you the ability to know and prove that your code works as expected, and that data is accessed or modified only in the places and ways that are intended. OOP makes "software as building blocks" practical, an idea that first gained currency with the Simula language in the mid-1960s. OOP also enables modern software proto-engineering practices such as iterative development, continuous integration and other "best practices" that have been proven in the field to increase quality and decrease risk, cost and complexity.
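To make the encapsulation point concrete, here's a deliberately tiny sketch of my own (the class and its names are hypothetical, purely for illustration): the internal state can only change through methods that enforce the intended rules.

    <?php
    // Hypothetical example of encapsulation/access control in PHP 5:
    // the internal state can only be modified through methods that
    // enforce the intended rules.

    class DonationLedger {
        private $balance = 0.0;     // not reachable from outside the class

        public function record($amount) {
            if (!is_numeric($amount) || $amount <= 0) {
                throw new InvalidArgumentException('Donation must be a positive amount');
            }
            $this->balance += $amount;
        }

        public function getBalance() {
            return $this->balance;
        }
    }

    $ledger = new DonationLedger();
    $ledger->record(25.00);
    echo $ledger->getBalance(), "\n";   // 25
    // $ledger->balance = -1000;        // fatal error: private state is off-limits

Try guaranteeing that last property in a purely procedural codebase where any include file can reach into any array it pleases.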

The 'ignore OOP in PHP' crowd like to point to popular software that was done in a non-OOP style, such as Drupal, a popular open-source Web CMS. But Drupal is a very mature project, by PHP standards; the open-source project seems to have originated in mid-2000, and it was apparently derived from code written for a project earlier still. So the Drupal code significantly predates PHP 5, if not PHP 4 (remember, the first real whack at OOP in PHP). Perusing the Drupal sources reveals an architecture initially developed by some highly experienced structured-programming developers (a precursor discipline to OOP); their code essentially builds a series of objects by convention, not depending on support in the underlying language. It is a wonder as it stands – but I would bet heavily that the original development team, if tasked with re-implementing a Web CMS in PHP from a blank screen, would use modern OO principles and the underlying language features which support them.
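For those unfamiliar with what "objects by convention" looks like in procedural PHP, here's a rough sketch in that spirit; it is emphatically not actual Drupal code, just an illustration of the pattern:

    <?php
    // Hypothetical illustration of "objects by convention" in procedural PHP:
    // the state lives in an associative array, and a naming convention
    // ("node_*") groups the functions that operate on it.

    function node_create($title, $body) {
        return array(
            'title'   => $title,
            'body'    => $body,
            'created' => time(),
        );
    }

    function node_title($node) {
        return $node['title'];
    }

    function node_render($node) {
        return '<h2>' . htmlspecialchars(node_title($node)) . '</h2>'
             . '<p>'  . htmlspecialchars($node['body']) . '</p>';
    }

    $node = node_create('Hello', 'Convention, not language support, holds this together.');
    echo node_render($node);

Nothing stops a careless caller from scribbling over $node['created'] directly; the discipline lives entirely in the developers' heads, which is exactly the burden language-level OOP support removes.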

And why would such "underlying language features" exist and evolve, especially in an open-source project like PHP, if there were not a real, demonstrable need for them? Saying you're not going to do OOP when using PHP is metaphorically akin to saying you intend to win a Formula One race without ever shifting above second gear.

Good luck with that. You might want to take a good, hard look at what your (more successful) colleagues are doing, adopt what works, and help innovate your Craft further. If you don't, you'll continue to be a drag on progress, a dilettante intent upon somehow using a buggy whip to accelerate your car.

It doesn't work that way anymore.

Friday 16 April 2010

A Slight Detour: Musing on Open Data Standards as applied to Social Entrepreneurship and Philanthropy

This started out as a conversation on Twitter with @cdegger, @ehrenfoss, @p2173 and other folks following the #opendata, #socent or #10swf hash tags. Twitter is (in)famous for being limited to 140 characters per “tweet”; with the extra hash tags and all, that's reduced to 96. I wrote a reply and used a text editor to break it into "tweets"; by the time I got to “(part 8/nn),” I knew it was crazy to try and tweet an intelligible response.

So, folks, here's what I think; I hope it's more intelligible this way. Comments appreciated, here or on Twitter.


What I see #opendata doing for #socent is allowing individuals or groups to build on and share information about opportunities, needs, donors and so on. This collaboration would use open data formats and tools to iteratively improve the effectiveness of philanthropy.

Think of how a wiki enables text collaboration, shared learning and discovery, or how instant messaging allows both realtime and time-shifted conversation. Now expand that idea to a sort of "social database" that can be run like a less elitist Wikipedia mated with an RSS feed. Anybody can browse or search the information in the database (needs and offers). They can also get their own local copies of some/all data and have it be updated from the "upstream" source automatically. A smaller group of vetted people can update the "master" data which then gets pushed out to all viewers or subscribers.
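To sketch what the subscriber side of such a "social database" might look like (everything here is an assumption of mine: the feed URL, the cache file, even the choice of RSS as the transport), a local copy could be refreshed from the upstream source on a schedule along these lines:

    <?php
    // Hypothetical sketch of a subscriber refreshing a local copy of the
    // "social database" from an upstream feed. The URL, cache file and RSS
    // format are illustrative assumptions only.

    $upstream  = 'http://example.org/socent/needs.rss';   // hypothetical upstream source
    $cacheFile = '/tmp/socent-needs.xml';                  // local copy

    $feed = @simplexml_load_file($upstream);
    if ($feed === false) {
        fwrite(STDERR, "Could not fetch upstream feed; keeping existing local copy.\n");
        exit(1);
    }

    // Keep a verbatim local copy so it can be browsed and searched offline.
    file_put_contents($cacheFile, $feed->asXML());

    // Show what has changed upstream.
    foreach ($feed->channel->item as $item) {
        echo $item->pubDate, ': ', $item->title, "\n";
    }

The point is not the particular format; it's that anyone can subscribe and mirror, while only the curated upstream can publish.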

This curation is needed to maintain data integrity and to (hopefully) eliminate attacks on or disruptions to the "social database" itself. The sad reality is that any public information on the Internet must have some sort of protection, or it will be damaged or destroyed. I see this as being especially true of a pioneering social-entrepreneurial system like this; there are enough people out there who have a vested interest in this sort of thing not working that security (authentication and validation) must be built in from the start. Otherwise, we will wind up with a situation akin to "spam" and "phishing" with email. Email standards were set in the early days, when the Internet was a primarily academic/scientific resource where all users could legitimately trust each other by default; the current state of the Net is far different. Any open data standards and protocols developed for the "social database" must take this into account.

These open data and protocol standards should be designed with the understanding that they are likely to change over time as the needs of users become better defined and as new opportunities to support those needs present themselves. The first version of a new system (like this) is almost never the simplest, nor will it be the most effective for its purpose. Lessons will be learned that should be folded back into revisions of the standards, in much the same way that later versions of standards like HTML built upon experience gained with earlier versions.

When evolving these data formats and protocols, it is vital that the process be fully transparent, with a balance between making progress and listening to the needs and concerns of the widest possible audience. It is entirely possible that no one standard in a given area will suit all stakeholders. In those instances, some sort of federation built on the interchange of a common subset or intermediate format may help bridge the gap. That should be seen as a fallback rather than a goal, however, since it limits the ability of casual or new users to make effective use of the entire distributed system.

The development, maintenance and ownership of standards developed for this project (including necessary legal protection such as copyrights) must be under the auspices of an organization with the visibility and stature to maintain control of the standards, lest they devolve into a balkanized mess that would be as unhelpful to the intended mission as not having any such standards at all. I would expect this organization to be a non-profit organization. Not only will this remove the fiduciary responsibility for monetizing investments made in the technology from the officers of the organization, but other non-profits/NGOs can be expected to cooperate more fully with the parent organization in developing, deploying and maintaining the standards – allowing them to remain open and unencumbered.

Finally, I think it's important to state that I don't see any one type of format as necessarily superior for developing this. I'm aware that a lot of work has been done with various XML-based systems as part of the #socent build-out to date. After working with XML for approximately ten years now, I have seen some magnificent things done with it, and some absolutely misbegotten things as well. I can think of several issues here, particularly around the authentication and validation concerns I mentioned earlier, and around the sheer bulk and relative inefficiency of a large-scale XML data store. They're solvable, and they're by no means unique to XML, but they do need to be thought about.

EDIT Sunday 18 April: I feel really strongly that one of the things our (distributed) information architecture is going to have to nail from the very beginning is authentication/verification: does a particular bit of information (what I'd been calling "opportunities, needs, [and] donors" earlier) actually come from whom it claims to come from? Otherwise we're just laying ourselves open to various black-hat cracking attacks as well as scams, for instance of the "Nigerian 419" variety. I think it's pretty obvious we're going to need some sort of vetting for posters and participants; this in turn implies some (loose or otherwise) organization with a necessary minimum of administrative/curative overhead to maintain public credibility and apparent control over our own resources.

Anybody would be allowed to read basic information, but I think we can all agree on the need for some sort of access control and/or obfuscation of data like individual contact information or some types of legal/financial information that gets tacked into the "social database." This could be pretty straightforward. One hypothetical approach might be as basic as having those who wish to publish information go through a simple registration process that issues them some piece of private data (possibly even the open-standard OAuth authentication that Twitter and others use for applications hooking into their infrastructure).

Either alternatively or in conjunction, a public-key cryptography system such as GNU Privacy Guard could be used to prove that data came from whom it claims to. For instance, the data to be published could be enclosed in a ZIP file or other archive, along with a "signature" and the identification of a registered publisher. (There's no way that I'm aware of to actually embed the 'signature' itself into the data file: the signature depends on the exact content of the data, and by adding the signature, the content is changed.)
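To illustrate the detached-signature idea, here's a rough sketch; the file names and key ID are hypothetical, and it simply shells out to the standard gpg command-line tool rather than proposing any particular PHP binding:

    <?php
    // Hypothetical sketch of detached GPG signatures for published data.
    // File names and the key ID are placeholders.

    $dataFile = 'needs-2010-04.xml';          // the data being published
    $sigFile  = $dataFile . '.asc';           // detached, ASCII-armoured signature
    $keyId    = 'publisher@example.org';      // the registered publisher's key

    // Publisher side: create a detached signature alongside the data file.
    // (The signature cannot live *inside* the file, since signing changes the content.)
    system(sprintf(
        'gpg --armor --local-user %s --output %s --detach-sign %s',
        escapeshellarg($keyId),
        escapeshellarg($sigFile),
        escapeshellarg($dataFile)
    ));

    // Subscriber side: verify that the data really came from the claimed publisher.
    system(sprintf(
        'gpg --verify %s %s',
        escapeshellarg($sigFile),
        escapeshellarg($dataFile)
    ), $status);

    echo $status === 0 ? "Signature verified.\n" : "WARNING: signature did NOT verify.\n";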

To the non-technical user, the effects of such a system should be:

  • The 'Foundation' (for want of a better term) can use already-proven, open standards to enhance member and public confidence in the accuracy and transparency of the content of the "social database". Attempting to "reinvent the wheel" in this area is an oft-proven Bad Idea™;

  • The Foundation will be able to develop, deploy, manage and maintain a (potentially) widely-distributed philanthropic/social database architecture that can support a wide variety of organizational/use models;

  • Having this sort of authentication and validation will aid in the evolution of the technical and "business" architectures of the system; new services can be layered on top of existing ones by different users as needed.

For instance, if a particular member has announced that they will publish information in a specific version of the schema for the "social database" (say, during registration), any later information purportedly from that member published in an older format should raise warning flags, as it may be a sign of actual or attempted compromise in security and data integrity. A benign incident, such as the member inadvertently using a software tool that submits data in an inappropriate format, can be quickly identified, communicated and rectified.
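A version check like that could be very simple. Here's a hypothetical sketch; the registry array and the schemaVersion attribute are illustrative assumptions of mine, not a proposed format:

    <?php
    // Hypothetical sketch of flagging submissions in an older schema version
    // than the one a member registered for.

    $registeredVersions = array(
        'member-1234' => '1.2',   // declared at registration
    );

    function check_submission($memberId, $submittedXml, array $registeredVersions) {
        if (!isset($registeredVersions[$memberId])) {
            return 'Rejected: unknown member.';
        }
        $doc = simplexml_load_string($submittedXml);
        if ($doc === false || !isset($doc['schemaVersion'])) {
            return 'Rejected: submission is not well-formed or lacks a schema version.';
        }
        $declared  = $registeredVersions[$memberId];
        $submitted = (string) $doc['schemaVersion'];

        if (version_compare($submitted, $declared, '<')) {
            return "Warning: $memberId registered schema $declared but submitted $submitted; "
                 . 'possible compromise or misconfigured tool.';
        }
        return 'OK';
    }

    echo check_submission('member-1234', '<needs schemaVersion="1.0"/>', $registeredVersions), "\n";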

This will be vital if we are to create a data system that publicly distributes data pertinent to specific members outside those members' direct control. Such data could include information that must not be altered by outside influences (such as, say, budget information), or information that, for general Internet-security reasons, should not be directly visible to all and sundry (for instance, contact information might be accessible to members but not to the casual browsing public).

Wednesday 14 April 2010

Process: Still 'garbage in, garbage out', but...

...you can protect yourself and your team. Even if we're talking about topics that everybody's rehashed since the Pleistocene (or at least since the UNIVAC I).

Traditional command-and-control, bureaucratic/structured/waterfall development processes managed to get (quite?) a few things right (especially given the circumstances). One of these was code review.

Done right, a formal code review process can help the team improve a software project more quickly and effectively than ad-hoc "exploration and discovery" by individual team members. Many projects, including essentially all continuing open-source projects that I've seen, use review as a tool to (among other things) help new teammates get up to speed with the project. While it can certainly be argued that pair programming provides a more effective means to that particular end, it (and, honestly, most agile practices) tends to focus on the immediate, detail-level view of a project. Good reviews (including but not limited to group code reviews) can identify and evaluate issues that are not as visibly obvious "down on the ground." (Cédric Beust, of TestNG and Android fame, has a nice discussion on his blog about why code reviews are good for you.)

Done wrong, and 'wrong' here often means "as a means of control by non-technical managers, either so that they can honour arbitrary standards in the breach or so that they can call out and publicly humiliate selected victims," code reviews are nearly Pure Evil™, good mostly for causing incalculable harm and driving sentient developers in search of more humane tools – which tend (nowadays) to be identified with agile development. Many individuals prominent in developer punditry regularly badmouth reviews altogether, declaring that if you adopt the currently-trendy process, you won't ever have to do those eeeeeeeeevil code reviews ever again. Honest. Well, unless.... (check the fine print carefully, friends!)

Which brings us to the point of why I'm bloviating today:

  1. Code reviews, done right, are quite useful;

  2. Traditional, "camp-out-in-the-conference-room" code reviews are impractical in today's distributed, virtual-team environment (as well as being spectacularly inefficient), and

  3. That latter problem has been sorted, in several different ways.

This topic came up after some tortuous spelunking following an essentially unrelated tweet, eventually leading me to Marc Hedlund's Code Review Redux... post on O'Reilly Radar (and then to his earlier review of Review Board, and to numerous other similar projects).

The thinking goes something like: Hey, we've got all these "dashboards" for CRM, ERP, LSMFT and the like; why not build a workflow around one that's actually useful to project teams? And these tools fit the bill, helping teams integrate a managed approach to (any of several different flavours of) code review into their development workflow. The review step generally gets placed either immediately before or immediately after a new, or newly-modified, project artifact is checked into the project's SCM. Many people, including Beust in the link above, prefer to review code after it's been checked in; others, including me, prefer reviews to take place before checkin, so as not to risk breaking any builds that pull directly from the SCM.
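For those of us in the before-checkin camp, one hypothetical way to enforce the policy with Subversion is a pre-commit hook that rejects commits whose log message doesn't reference a completed review. The "Review:" convention below is my own invention, not a feature of Subversion or of any of the review tools mentioned:

    #!/usr/bin/env php
    <?php
    // Hypothetical Subversion pre-commit hook (sketch only): reject commits
    // whose log message doesn't reference a completed review.

    list(, $repos, $txn) = $argv;   // Subversion passes REPOS-PATH and TXN-NAME

    $log = shell_exec(sprintf(
        'svnlook log -t %s %s',
        escapeshellarg($txn),
        escapeshellarg($repos)
    ));

    if (!preg_match('/^Review:\s*\S+/m', (string) $log)) {
        fwrite(STDERR, "Commit rejected: log message must reference a completed review, e.g. \"Review: 42\".\n");
        exit(1);   // non-zero exit aborts the commit
    }

    exit(0);

A hook like this is deliberately dumb; the review tool itself remains the place where the actual discussion and sign-off happen.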

We've been using collaborative tools like Wikis for enough years now that any self-respecting project has one. They've proven very useful for capturing and organising collective knowledge, but they are not at their best for tracking changes to external resources, like files in an SCM. (Trac mostly finesses this, by blurring the lines between a wiki, an SCM and an issue tracker.) So, a consensus seems to be forming, across several different projects, that argues for

  • a "review dashboard," showing a drillable snapshot of the project's code, including completed, in-process and pending reviews;

  • a discussion system, supporting topics related to individual reviews, groups of reviews based on topics such as features, or the project as a whole; these discussions can be searched and referenced/linked to; and

  • integration support for widely-used SCM and issue-tracking systems like Subversion and Mantis.

Effective use of such a tool, whatever your process, will help you create better software by tying reviews into the collaborative process. The Web-based versions in particular remove physical location as a condition for review. Having such a tool that works together with your existing (you do have these, yes?) source-code management and issue-tracking systems makes it much harder to have code in an unknown, unknowable state in your project. In an agile development group, this will be one of the first places you look for insight into the cause of problems discovered during automated build or testing, along with your SCM history.

And if you're in a shop that doesn't use these processes, why not?


On a personal note, this represents my return to blogging after far, far too long buried under Other Stuff™. The spectacularly imminent crises have now (mostly, hopefully) been put out of our misery; you should see me posting more regularly here for a while. As always, your comments are most welcome; this should be a discussion, not a broadcast!