Wednesday, 30 June 2010

The Decline of the Internet, Part MMMDIC

Rant mode: on. Some effort will be made to keep ear-steam to a minimum.

Internet Relay Chat (better known as IRC), specifically Undernet, was one of the very first things I discovered on the (then-)text-mode-only Internet. Over the next 20 years or so, I've gone through varying levels of activity, including writing an IRC client for OS/2, the operating system that brought technology to a marketing fight (as in, "bringing a pocket knife to a firefight").

One of the changes that happened on Usenet (and other networks) when the Net started getting used by larger hordes of people, with varying intentions, was to institute some form of registration. You'd visit a bare-bones Web site (such as the one for Undernet), fill in a few items on a form, and wait for an email to pop into your inbox stating that your ID was active (known to the bot managing authentication on the servers). I'd used the same "nick" (nickname), JeffD, for at least as long as I or my logs can remember (certainly before 1998), and formalized this when the channels I frequented (chiefly #usa) started requiring it.

Now, I (like to think I) do have a life outside IRC and the Net in general; I've been known to go as long as a year without signing on in Undernet. Not a problem; things Just Worked™, in almost Mac-like fashion.

Until today. I fired up my current-favourite client program (Colloquy, highly recommended); it connected to a random Undernet server (in Budapest in this case), and sent the customary PRIVMSG (private message) to the bot that handles authentication, with my username and password. Instead of the usual "authentication accepted" message coming back, I see "X NOTICE I don't know who JeffD is."

Well, blowups happen. I go to the Undernet CService page I mentioned earlier; it's a whiz-bang series of 7 pages now. I go through it, enter my desired username, password, and an email address that I can be reached at (the same one I've had since text-mode days), and am finally presented with this marvel of clarity:

Congratulations! Something went wrong. Sorry.

I'd bet heavily that the reason for the breakage would, on investigation, turn out to be a case of unchecked assumptions no longer being valid, combined with a lack of human management. Nobody has the time for community-without-a-pricetag anymore, and that, in a nutshell, is what IRC is.

And, two hours later, I still haven't received the expected email. Feh.

Rant mode: standby.

Friday, 18 June 2010

I Thought Standard Libraries Were Supposed to be Better...

...than hand coding. Either the PHP folks never got that memo, or I'm seriously misconceptualising here.

Case in point: I was reading through Somebody Else's Code™, and I saw a sequence of "hand-coded" assignments of an empty string to several array entries, similar to:

    $items[ 'key2' ] = '';
    $items[ 'key1' ] = '';
    $items[ 'key6' ] = '';
    $items[ 'key3' ] = '';
    $items[ 'key8' ] = '';
    $items[ 'key5' ] = '';
    $items[ 'key4' ] = '';
    $items[ 'key7' ] = '';

I thought, "hey, hang on; there's a function to do easy array merges in the standard library (array_merge); surely it'd be faster/easier/more reliable to just define a (quasi-)constant array and merge that in every time through the loop?"

Fortunately, I didn't take my assumption on blind faith; I wrote a quick little bit to test the hypothesis:


$count = 1e5;
$data = array(
        'key2' => '',
        'key1' => '',
        'key6' => '',
        'key3' => '',
        'key8' => '',
        'key5' => '',
        'key4' => '',
        'key7' => '',
        );
$realdata = array();

// Approach 1: merge a pre-built array of empty values each time through the loop.
$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $realdata = array_merge( $realdata, $data );
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with array_merge took %7.5f seconds.\n", $count, $elapsed );

// Approach 2: the original "hand-coded" assignments.
$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $data[ 'key2' ] = '';
    $data[ 'key1' ] = '';
    $data[ 'key6' ] = '';
    $data[ 'key3' ] = '';
    $data[ 'key8' ] = '';
    $data[ 'key5' ] = '';
    $data[ 'key4' ] = '';
    $data[ 'key7' ] = '';
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with direct assignment took %7.5f seconds.\n", $count, $elapsed );

I ran the tests on a nearly two-year-old iMac with a 3.06 GHz Intel Core 2 Duo processor, 4 GB of RAM, OS X 10.6.4 and PHP 5.3.1 (with Zend Engine 2.3.0). Your results may vary on different kit, but I would be surprised if the basic results were significantly re-proportioned. The median times from running this test program 20 times came out as:

Assignment process    Time (seconds) for 100,000 iterations
array_merge           0.41995
Hand assignment       0.15569

So, the "obvious," "more readable" code runs nearly three times slower than the existing, potentially error-prone during maintenance, "hand assignment." Hang on, if we used numeric indexes on our array, we could use the array_fill function instead; how would that do?

Adding the code:

// Numeric-index variant: set up $data2 by hand once, then time array_fill().
$data2 = array();
$data2[ 0 ] = '';
$data2[ 1 ] = '';
$data2[ 2 ] = '';
$data2[ 3 ] = '';
$data2[ 4 ] = '';
$data2[ 5 ] = '';
$data2[ 6 ] = '';
$data2[ 7 ] = '';

$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $data2 = array_fill( 0, 8, '' );
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with array_fill took %7.5f seconds.\n", $count, $elapsed );

produced a median time of 0.21475 seconds, or some 37.9% slower than the original hand-coding.
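
There's one more variant that might be worth timing: since PHP arrays are assigned by value (copy-on-write under the hood), simply copying a pre-built "template" array could sidestep both the array_merge() call and the eight separate assignments. A minimal, untimed sketch:

    // Untimed sketch: copy a pre-built template array each time through the loop.
    $template = array(
            'key1' => '', 'key2' => '', 'key3' => '', 'key4' => '',
            'key5' => '', 'key6' => '', 'key7' => '', 'key8' => '',
            );

    $start = microtime( true );
    for ( $loop = 0; $loop < $count; $loop++ )
    {
        // Plain assignment copies the array (lazily, via copy-on-write).
        $realdata = $template;
    }
    $elapsed = microtime( true ) - $start;
    printf( "%d iterations with template-array copy took %7.5f seconds.\n", $count, $elapsed );

If copy-on-write behaves as advertised, this ought to land closer to the hand-assignment figure, but that's a guess until somebody measures it.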

For folks coming from other, compiled languages, such as C, C++, Ada or what-have-you, this makes no sense whatsoever; those languages have standard libraries that are not only intended to produce efficiently-maintainable code, but (given reasonably mature libraries) efficiently-executing code as well. PHP, at least in this instance, is completely counterintuitive (read: counterproductive): if you're in a loop that will be executed an arbitrary (and arbitrarily large) number of times, as the original code was intended to be, you're encouraged to write code that invites typos, omissions and other errors creeping in during maintenance. That's a pretty damning indictment for a language that's supposedly at its fifth major revision.

If anybody knows a better way of attacking this, I'd love to read about it in the comments, by email or IM.

Tuesday, 1 June 2010

The Sun Has Set on a Great Brand

Just a quick note pointing out, once again, that when many of us foresaw gloom and doom from the Oracle acquisition of Sun Microsystems... we were being hippie-pie-in-the-sky optimists.

Item: A couple of days ago, I tried to log into my Sun Developer Network account, so I could grab the latest "official" Java toolset for a Linux VM I was building. I couldn't log in; neither the password that was saved in my password manager nor any other password I could think of would work. I then went through the recover-lost-password form, expecting a reset email to pop into my inbox within a minute or two. An hour later, I had to go do other things. Yesterday, no email.

Finally, today (Tuesday morning), this gem appears in my inbox:

Date: Sun, 30 May 2010 10:07:20 -0700 (PDT)
From: login-autoreply@sun.com
To: jdickey@seven-sigma.com
Message-ID: <12602411.48691.1275239240752.JavaMail.daemon@mailhost>
Subject: Your Sun Online Account Information
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Source-IP: rcsinet15.oracle.com [148.87.113.117]
X-CT-RefId: str=0001.0A090205.4C029B48.0062,ss=1,fgs=0
X-Auth-Type: Internal IP

Thank you for contacting Sun Online Account Support.

We're sorry but a user name for your account could not be determined.

We would be happy to look into this further for you.  Please contact us at @EMAIL_FEEDBACK_CONTACT@.  We apologize for any inconvenience.

Thank you,
Sun Online Account Support

Oracle isn't even bothering to fill in (no doubt database-fed) form-mail fields to keep in touch with current and possible future Sun-originated customers, and the login-autoreply address is no doubt a black hole slaved to /dev/null. If they had sent an HTML email with 48-point red F*** YOU!!! as the entire content, the message received could not have been more clear: you don't matter, go away, throw your money, time and attention at somebody else.

I plan to do precisely that, and make clear to all and sundry precisely why.

Wednesday, 26 May 2010

Beating Your Head Against the Wall, Redux

...or, the "Monty Python and the Holy Grail" monks' guide to making your Mac desktop work like a server, instead of going and getting a copy of OS X Server like you should...

Mac OS X Server brings OS X simplicity and Unix power to a range of hardware systems. Most of the things that Server makes trivially simple can be done in OS X Desktop. Some of them, however, require the patience of Job and the ingenuity and tenaciousness of MacGyver...or so they at first appear.

One such task is installing Joomla!, a nice little Web CMS with handy features for users, developers and administrators. On most Unix-like systems, or even recent versions of Microsoft Windows, installation is a very straightforward process for any system which meets the basic requirements (Web server, PHP, MySQL, etc.) as documented in the (PDF) Installation Manual or the PDF Quick Start guide. On most systems, it takes only a few minutes to breeze through from completed download to logging into the newly installed CMS.

The OS X desktop, as I said, is a bit different. This isn't a case of Apple's famous "Think Different" campaign so much as it appears to be a philosophical conflict between Apple's famous ease-of-use as applied to user and rights management, coming up against traditional Unix user rights management. Rather than the former merely providing a polished "front end" interface for the latter, some serious mind-games and subversion are involved. And I'm not talking about the well-known version control software.

When things go wrong with regard to a simple installation process of a well-known piece of software, usually Google is Your Best Friend. If you search for joomla mac install, however, you quickly notice that most of the hits talking about OS X desktop recommend that you install a second Apache, MySQL and PHP stack in addition to the one that's already installed in the system — packages such as XAMPP, MAMP and Bitnami. While these packages each appear to do just what it says on their respective tins, I don't like having duplicate distributions of the same software (e.g., MySQL) on the same system.

Experience and observation have shown that that's a train wreck just begging to happen. Why? Let's say I've got the "Joe-Bob Briggs Hyper-Extended FooBar Server" installed on my system. (System? Mac, PC, Linux; doesn't matter for this discussion.) When FooBar bring out a new release of their server (called the 'upstream' package), Joe-Bob has to get a copy of that, figure out what's changed from the old one, and (try to) adapt his Hyper-Extended Server to the new version. He then releases his Hyper-Extended version, quite likely some time behind the "official" release. "What about betas," you ask? Sure, Joe-Bob should have been staying on top of the pre-release release cycle for his upstream product, and may well have had quite a lot of input into it. But he can't really release his production Hyper-Extended Server until the "official" release of the upstream server. Any software project is subject to last-minute changes and newly-discovered "show-stopper" issues; MySQL 5.0 underwent 90 different releases. That's a lot for anybody to keep up with, and the farther away you get from that source of change (via repackaging, for example), the harder it is to manage your use of the software, taking into account features, security and the like.

So I long ago established that I don't want that sort of duplicative effort for mainstream software on modern operating systems. (Microsoft Windows, which doesn't have most of this software on its default install, is obviously a different problem, but we're not going there tonight.) It's too easy to have problems with conflicts, like failing to completely disable the "default" version of a duplicated system, to inject that kind of complexity into a system needlessly.

That isn't to say that it doesn't become very attractive sometimes. Even on a Mac OS X desktop — where "everything" famously "just works" — doing things differently than the "default" way can lead to great initial complexity in the name of avoiding complexity down the line. (Shades of "having to destroy the village in order to save it.")

The installation of Joomla! went very much like the (PDF) Installation Manual said it should... until you get to the screen that asks for local FTP credentials that give access to the Joomla! installation directory. It would appear that setting up a sharing-only user account on the system should suffice, and in fact several procedures documenting this for earlier versions of Mac OS X describe doing just that. One necessary detail appears different under 10.6.2, however: the "Accounts" item in System Preferences no longer allows the specification of user-specific command shells...or, if it does, it's very well hidden.

Instead, I created a new regular, non-administrative user for Joomla! I then removed the new user's home directory (under /Users) and created a symlink to the Joomla! installation directory.

Also, one difference between several of the "duplicating" xAMP systems I mentioned above and the standard Apache Web server as on OS X (and Linux) is that in the default system, access to served directories is disabled by default; the idea is that you define a new Apache <Directory> directive for each directory/application you install. Failing to do this properly and completely will result in Apache 403 ("Forbidden") errors. Depending on your attitude to security, you may either continue to do this, or change the default httpd.conf setting to Allow from All and use .htaccess files to lock down specific directories.

Once you have the underlying requirements set up (and FTP access is the only real think-outside-the-box issue), Joomla! should install easily. But if you're in a hurry and just trying to go through the documented procedures, you're quite likely to spend considerable time wondering why things don't Just Work.

And no, neither Fedora nor Ubuntu Linux "does the right thing" out-of-the-box either. At least, not in my tests.

Monday, 17 May 2010

Those were the days, my friend...but they're OVER.

No, I'm not talking about Facebook and the idea that people should have some control over how their own information makes money for people they never imagined. That's another post. This is a shout out to the software folk out there, and the wannabe software folk, who wonder why nobody outside their own self-selecting circles seems to get excited about software anymore.

Back in the early days of personal computers, from 1977 to roughly 2000, hardware capabilities imposed a real upper limit on software performance and capabilities. More and more people bought more and more new computers so they could do more and more stuff that simply wasn't practical on the earlier models. Likewise, whenever any software package managed something revolutionary, or did something in a new way that goosed performance beyond what any competitor could deliver, that got attention...until the next generation of hardware meant that even middle-of-the-road software was "better" (by whatever measure) than the best of what could happen before.

After about 2000, the speed curve that the CPUs had been climbing flattened out quite a lot. CPU vendors like Intel and AMD changed the competition to more efficient designs and multi-core processors. Operating systems – Linux, Mac OS X and even Microsoft Windows – found ways to at least begin to take advantage of the new architectures by farming tasks out to different cores. Development tools went through changes as well; new or "rediscovered" languages like Python and Forth that could handle multiprocessing (usually by multithreading) or parallel processing became popular, and "legacy" languages like C++ and even COBOL now have multithreading libraries and/or extensions.

With a new (or at least revamped) set of tools in the toolbox, we software developers were ready to go forth and deliver great new multi-everything applications. Except, as we found out, there was one stubborn problem.

Most people (and firms) involved in software application development don't have the foggiest idea how to do multithreading efficiently, and even fewer can really wrap their heads around parallel processing. So, to draw an analogy, we have these nice, shiny new Porsche 997 GT3s being driven all over, but very few people know how to get the vehicle out of first gear.

I started thinking about this earlier this evening, as I read an article on Lloyd Chambers' Mac Performance Guide for Digital Photographers & Performance Addicts (pointed to in turn by Jason O'Grady and David Morgenstern's Speedtest article on ZDNet's Apple blog). One bit in Chambers' article particularly caught my eye:

With Photoshop CS4/CS5, there is also increased overhead with 8 cores due to CS5 implementation weakness; it’s just not very smart about knowing how many threads are useful, so it wastes time and memory allocating too many threads for tasks that won’t even benefit from them.

In case you've been stuck in Microsoft Office for Windows for the last 15 years, I'll translate for you: The crown-jewels application for Adobe, one of the oldest and largest non-mainframe software companies in history, has an architecture on its largest-selling target hardware platform that is either too primitive or defective to make efficient use of that platform. This is on a newly released version of a very mature product, that probably sells a similar proportion of Macs to the proportion of Windows PCs justified by Microsoft Office (which is even better on the Mac, by the way). Adobe are also infamous among Mac users and developers for dragging their feet horribly in supporting current technologies; CS5 is the first version of Adobe Creative Suite that even begins to support OS X natively – an OS which has been around since 2001.

I've known some of the guys developing for Adobe, and they're not stupid...even if they believe that their management could give codinghorror.com enough material to take it "all the way to CS50." But I don't recall even Dilbert's Pointy-Haired Boss ever actually telling his team to code stupid, even if he did resort to bribery (PHB announces he'll pay $10 for every bug fix. Wally says, "Hooray! I'm gonna code me a minivan!"...i.e., several thousand [fixed] bugs).

No, Adobe folk aren't stupid, per se; they're just held back by the same forces that hold back too many in our craft...chiefly inertia and consistently being pushed to cut corners. Even (especially?) if you fire the existing team and ship the work off halfway around the world to a bunch of cheaper guys, that's a non-solution to only part of the problem. Consider, as a counterexample, medicine; in particular, open-heart surgery. Let's say that Dr. X develops a refinement to standard technique that dramatically improves success rates and survival lifetimes for heart patients he operates on. He, or a colleague under his direct instruction, will write up a paper or six that get published in professional journals – that nearly everybody connected with his profession reads. More research gets done that confirms the efficacy of his technique, and within a fairly short period of time, that knowledge becomes widespread among heart surgeons. Not (only) because the new kids coming up learned it in school, but because the experienced professionals got trained up as part of their continuing education, required to keep their certification and so on.

What does software development have in common with that – or, in fact, with any profession? To qualify as a profession on the level of the law, accounting, medicine, the military, and so on, a field is generally seen as comprising

  • Individuals who maintain high standards of ethics and practice;

  • who collectively act as gatekeepers by disciplining or disqualifying individuals who fail to maintain those standards;

  • who have been found through impartial examination to hold sufficient mastery of a defined body of knowledge common to all qualified practitioners, including recent advances;

  • that mastery being retained and renewed by regular continuing education of at least a minimum duration and scope throughout each segment of their careers;

  • the cost of that continuing education being knowingly directly or indirectly subsidised by the professional's clients and/or employer;

  • such that these professionals are recognised by the public at large as being exclusively capable of performing their duties with an assuredly high level of competence, diligence and care;

  • and, in exchange for that exclusivity, give solemn, binding assurances that their professional activities will not in any way contravene the public good.

Whether every doctor (or accountant, or military officer, or lawyer) actually functions at that level at all times is less of a point than that that is what is expected and required; what defines them as being "a breed apart" from the casual hobbyist. For instance, in the US and most First World countries, an electrician or gas-fitter is required to be properly educated and licensed. This didn't happen just because some "activists" decided it was a good idea; it was a direct response to events like the New London School explosion. By contrast, in many other countries (such as Singapore, where I live now), there is no such requirement; it seems that anybody can go buy himself the equipment, buy a business license (so that the government is reassured they'll get their taxes), and he can start plastering stickers all over town with his name, cell phone number and "Electrical Repair" or whatever, and wait for customers to call.

Bringing this back to software, my point is this: there is currently no method to ensure that new knowledge is reliably disseminated among practitioners of what is now the craft of software development. And while true professionals have the legal right to refuse to do something in an unsafe or unethical manner (a civil engineer, for example, cannot sign off on a design that he has reason to believe would endanger public users/bystanders, and without properly-executed declarations by such an engineer, the project may not legally be built), software developers have no such right. Even as software has a greater role in people's lives from one day to the next, the process by which that software is designed, developed and maintained is just as ad hoc as the guy walking around Tampines stuffing advertisements for his brother's aircon-repair service under every door. (We get an amazing amount of such litter here.)

Adobe may eventually get Creative Suite fixed. Perhaps CS10 or even CS7 will deliver twice the performance on an 8-core processor than on a 4-core one. But they won't do it unless they can cost-justify the improvement, and the costs are going to be much higher than they might be in a world where efficient, modern software-development techniques were a professionally-enforced norm.

That world is not the one we live in, at least not today. My recurring fear is that it will take at least one New London-scale disaster with loss of life, or a software-induced BP Deepwater Horizon-class environmental catastrophe, for that to start to happen. And, as always, the first (several) iterations won't get it right – they'll be driven by politics and public opinion, two sources of unshakable opinion which have been proven time and again to be hostile to professional realities.

We need a better way. Now, if not sooner.

Saturday, 8 May 2010

Making URL Shorteners Less "Evil"

The following is the text of a comment I attempted to post to an excellent post on visitmix.com discussing The Evils of URL Shorteners. I think Hans had some great points, and the comments afterward seem generally thoughtful.

This is a topic which I happen to think is extremely important, for both historical and Internet-governance reasons, and hope to see a real discussion and resolution committed to by the community. Thanks for reading.


I agree with the problem completely, if not with the solution. I was a long-time user and enthusiastic supporter of tr.im back in the day (up to, what, a couple of months ago?), partly because it was obvious they were doing it more or less as a public service, not as a revenue-generating ad platform, partly because they were apparently independent of Twitter, Facebook and the other "social media" services (which is important; see below), and for several other reasons. Unfortunately, since the First Law of the InterWebs seems to be that "no good deed goes unpunished," they got completely hammered beyond any previously credible expectation, and, after trying unsuccessfully to sell the service off, are in the process of pulling the plug.

I think it's absolutely essential that any link-shortening service be completely independent of the large social-media sites like Facebook and Twitter, specifically because of the kind of trust/benevolence issues raised in the earlier comments. We as users on both ends of the link-shortening equation might trust, say, Facebook because their policies at the time led us to believe that nothing dodgy would be done in the process. I think the events of the past few weeks, however, have conclusively proven how illusory and ill-advised that belief can be. Certainly, such a service would give its owner a wealth of valuable marketing data (starting with "here's how many unique visitors clicked through links to this URL, posted by this user"). They could even rather easily implement an obfuscation system, whereby clicking through, say, a face.bk URL would never show the unaltered page, but dynamically rewrite URLs from the target site so that the shortener-operator could have even MORE data to market ("x% of the users who clicked through the shortened URL to get to this site then clicked on this other link," for example). For a simple, benign demonstration of this, view any foreign-language page using Google Translate. (I'm not accusing Google of doing anything underhanded here; they're just the most common example in my usage of dynamic URL rewriting.)

Another security catastrophe that URL shorteners make trivially easy is the man-in-the-middle exploit, either directly or by malware injected into the user's browser by the URL-shortener service. The source of such an attack can be camouflaged rather effectively by a number of means. (To those who would say "no company would knowingly distribute malware", I would remind you of the Sony rootkit debacle.)

So yeah, I resent the fact that I essentially must use a URL-shortener (now j.mp/bit.ly) whenever I send a URL via Twitter. I also really hate the way too many tweets now use Facebook as an intermediary; whenever I see a news item from a known news site or service that includes a Facebook link, I manually open the target site and search for the story there. That is an impediment to the normal usage flow, reducing the value of the original link.

Any URL-shortening service should be transparent and consistent with respect to its policies. I wouldn't even mind seeing some non-Flash ads on an intermediate page. ("In 3 seconds, you will be redirected to www.example.com/somepage, which you requested by clicking on w.eb/2f7gx; click this button or press the Escape key on your keyboard to stop the timer. If you click on the ad on this page, it will open in a new window or tab in your browser.")
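
A rough sketch of such an interstitial, with the short code, the $targets lookup table and the destination URL all invented for the example (a real service would pull them from its own datastore):

    <?php
    // Sketch of a transparent interstitial page for a URL shortener.
    // $targets stands in for whatever datastore a real service would use.
    $targets = array( '2f7gx' => 'http://www.example.com/somepage' );

    $code = isset( $_GET['c'] ) ? $_GET['c'] : '';
    if ( ! isset( $targets[ $code ] ) )
    {
        header( 'HTTP/1.1 404 Not Found' );
        exit( 'Unknown short link.' );
    }

    $url     = $targets[ $code ];
    $escaped = htmlspecialchars( $url, ENT_QUOTES );

    // Redirect after a visible 3-second delay; no cloaking, no rewriting.
    header( 'Refresh: 3; url=' . $url );
    echo "<p>In 3 seconds, you will be redirected to <a href=\"$escaped\">$escaped</a>, ",
         "which you requested by clicking on a shortened link.</p>";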

Such a service would have to be independent of the Big Names to be trustworthy. It's not for nothing that "that zucks" is becoming a well-known phrase; the service must not offer even the potential for induced shadiness of behaviour.

I'd like to see some sort of non-profit federation or trade association built around the service; the idea being that 1) some minimal standards of behaviour and function could be self-enforced, and especially 2) that member services that fold would have some ability/obligation to have their shortened link targets preserved. This way, there would still be some way of continuing to use links generated from the now-defunct service.

Since the announcement that the Library of Congress will be archiving ALL tweets as an historical- and cultural-research resource, and contemplating a future in which it is expected that URL-shortening services will continue to fold or consolidate, the necessity and urgency of this discussion as an Internet-governance issue should have become clear to everyone. I hope that we can agree on and implement effective solutions before the situation degrades any further.

She's Putting Me Through Changes...

...they're even likely to turn out to be good ones.

As you may recall, I've been using and recommending the Kohana PHP application framework for some time. Kohana now offer two versions of their framework:

  • the 2.x series is an MVC framework, with the upcoming 2.4 release to be the last in that series; and

  • the 3.0 series, which is an HMVC framework.

Until quite recently, the difference between the two has been positioned as largely structural/philosophical: if you wish to develop with the 'traditional' model-view-controller architecture, then 2.x (currently 2.3.4) is what you're after; with great documentation and tutorials, any reasonably decent PHP developer should be able to get Real Work™ done quickly and efficiently. On the other hand, the 3.0 offering (now 3.0.4.2) is built around hierarchical MVC. While HMVC via 3.0 offers some tantalising capabilities, especially in large-scale or extended sequential development, there remains an enthusiastic, solid community built around the 2.3 releases.

One of the long-time problems with 2.3 has been how to do unit testing. Although vestigial support for both a home-grown testing system and the standard PHPUnit framework exists in the 2.3 code, neither is officially documented or supported. What this leads to is a separation between non-UI classes, which are mocked appropriately and tested from the 'traditional' PHPUnit command line, and UI testing using tools like FitNesse. This encourages the developer to create as thin a UI layer as practical over the standalone (and more readily testable) PHP classes which that UI layer makes use of. While this is (generally) a desirable development pattern, encouraging and enabling wider reuse of the underlying components, it's quite a chore to get an automated testing/CI rig built around this.
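
For what it's worth, here's roughly what such a standalone, framework-agnostic class and its PHPUnit test look like from the command-line side. The class, method and expected values are invented for the example, and the require_once line can be dropped when running under the phpunit CLI tool (PHPUnit 3.x-era style):

    <?php
    // Invented example: a helper class with no Kohana dependencies,
    // plus a PHPUnit test case that exercises it from the command line.
    require_once 'PHPUnit/Framework.php';   // not needed with the phpunit runner

    class text_utils
    {
        public static function slug( $title )
        {
            // Lower-case the title and collapse runs of non-alphanumerics into hyphens.
            return trim( preg_replace( '/[^a-z0-9]+/', '-', strtolower( $title ) ), '-' );
        }
    }

    class Text_Utils_Test extends PHPUnit_Framework_TestCase
    {
        public function testCollapsesPunctuationAndCase()
        {
            $this->assertEquals( 'hello-world', text_utils::slug( 'Hello, World!' ) );
        }

        public function testStripsLeadingAndTrailingSeparators()
        {
            $this->assertEquals( 'kohana-2-3-4', text_utils::slug( '  Kohana 2.3.4! ' ) );
        }
    }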

But then I came across a couple of pages like this one on LinkedIn (free membership required). The thread started out asking how to integrate PHPUnit with Kohana 2.3.4, and then described moving to 3.0 as follows:

I grabbed Kohana 3, plugged in PHPUnit, tested it, works a treat! So we're biting the bullet and moving to K3! :)

I've done a half-dozen sites in Kohana 2.3, as I'd alluded to earlier. I've just downloaded KO3 and started poking at it, with the expectation to move my own site over shortly and, in all probability, moving 3.0 to the top of my "recommended tools" list for PHP.

Like the original poster, Mark Rowntree, I would be interested to know if and how anybody got PHPUnit working properly in 2.3.4.

Thanks for reading.

Tuesday, 27 April 2010

Let's Do The Time Warp Agai-i-i-i-n!! (Please, $DEITY, no...)

For those who may somehow not be aware of it, LinkedIn is a (generally quite good) professionally-oriented social-networking site. This is not Facebook, fortunately. It's not geared towards teenagers raving about the latest corporate boy band du jour. It often can be, however, a great place to network with people from a variety of vocational, industry and/or functional backgrounds to get in contact with people, share information, and so on.

One of the essential features of LinkedIn is its groups, which are primarily used for discussions and job postings. In the venerable Usenet tradition, these discussions can have varying levels of insightful back-and-forth, or they can degenerate into a high-fidelity emulation of the "Animal House" food fight. As with Usenet, they can often give the appearance of doing both at the same time. Unlike Usenet, one has to be a member of LinkedIn to participate.

One of the (several) groups I follow is LinkedPHPers, which bills itself as "The Largest PHP Group" on LinkedIn. Discussions generally fall into at least one of a very few categories:

  • How do I write code to solve "this" problem? (the 'professional' version of "Help me do my homework");

  • What do people know/think about "this" practice or concept?

  • I'm looking for work, or people to do work; does anybody have any leads?

As veterans of this sort of discussion would expect, the second type of discussion can lead to long and passionate exchanges with varying levels of useful content (what became known on Usenet as a "flame war"). The likelihood of such devolution seems to be inversely proportional to the specificity of the question, and directly proportional to the degree to which the concept in question is disregarded by, or simply unknown to, those with an arguable grasp of their Craft.

It should thus be no surprise that a discussion on the LinkedPHPers group of "Procedural vs Object Oriented PHP Programming" would start a flame war for both of the above reasons. With 58 responses over the past month as I write this, there are informational gems of crystal clarity buried in the thick, gruesome muck of proud ignorance. As Abraham Lincoln is reported to have said, "Better to remain silent and be thought a fool than to speak out and remove all doubt."

What's my beef here? Simply that this discussion thread is re-fighting a war that was fought and settled over a quarter-century ago by programming in general. The reality is that any language that has a reasonable implementation of OOP (with encapsulation/access control, polymorphism and inheritance, in that order by my reckoning) should be used in that way.

Several of the posts trot out the old canard about a performance 'penalty' when using OOP. In practice, that's true of only the sharpest edge cases – simple, tiny, standalone classes that should never have been developed that way because they don't provide a useful abstraction of a concept within the solution space, generally by developers who are not professionally knowledgeable of the concepts involved and quite often by those copying and pasting code they don't understand into their own projects (which they also don't understand). That bunch sharply limited the potential evolution and adoption of C++ in the '80s and '90s, and many of their ideological brethren have made their home in Web development using PHP.

Yes, I know that "real" OOP in PHP is a set of tacked-on features, late to the party; first seriously attempted in PHP 4, with successively evolving implementations in 5.0, 5.2 and 5.3, with the semi-mythological future PHP 6 adding many new features. I know that some language features are horribly unwieldy (which is why I won't use PHP namespaces in my own code; proven idea, poor implementation). But taken as a whole, it's increasingly hard to take the Other Side ("we don' need no steeeenkin' objects") at all seriously.

The main argument for ignoring the "ignore OOP" crowd is simply this: competent, thoughtful design using OOP gives you the ability to know and prove that your code works as expected, and data is accessed or modified only in the places and ways that are intended. OOP makes "software-as-building-blocks" practical, a term that first gained currency with the Simula language in the mid-1960s. OOP enables modern software proto-engineering practices such as iterative development, continuous integration and other "best practices" that have been proven in the field to increase quality and decrease risk, cost and complexity.
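
To put a (deliberately trivial, made-up) face on that encapsulation argument:

    <?php
    // Toy example: $balance is invisible outside the class, and the single
    // mutator validates every change, so there is exactly one place to audit.
    class Account
    {
        private $balance = 0;

        public function deposit( $amount )
        {
            if ( ! is_numeric( $amount ) || $amount <= 0 )
            {
                throw new InvalidArgumentException( 'Deposit must be a positive number' );
            }
            $this->balance += $amount;
            return $this->balance;
        }

        public function balance()
        {
            return $this->balance;
        }
    }

    $acct = new Account();
    $acct->deposit( 100 );
    // $acct->balance = -1;   // fatal error: cannot access private property

The point isn't the banking toy; it's that the runtime enforces the boundary for you, instead of relying on every caller's good manners.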

The 'ignore OOP in PHP' crowd like to point to popular software that was done in a non-OOP style, such as Drupal, a popular open-source Web CMS. But Drupal is a very mature project, by PHP standards; the open-source project seems to have originated in mid-2000, and it was apparently derived from code written for a project earlier still. So the Drupal code significantly predates PHP 5, if not PHP 4 (remember, the first real whack at OOP in PHP). Perusing the Drupal sources reveals an architecture initially developed by some highly experienced structured-programming developers (a precursor discipline to OOP); their code essentially builds a series of objects by convention, not depending on support in the underlying language. It is a wonder as it stands – but I would bet heavily that the original development team, if tasked with re-implementing a Web CMS in PHP from a blank screen, would use modern OO principles and the underlying language features which support them.

And why would such "underlying language features" exist and evolve, especially in an open-source project like PHP, if there was not a real, demonstrable need for them? Saying you're not going to do OOP when using PHP is metaphorically akin to saying you intend to win a Formula One race without using any gear higher than second in the race.

Good luck with that. You might want to take a good, hard look at what your (more successful) colleagues are doing, adopt what works, and help innovate your Craft further. If you don't, you'll continue to be a drag on progress, a dilettante intent upon somehow using a buggy whip to accelerate your car.

It doesn't work that way anymore.

Friday, 16 April 2010

A Slight Detour: Musing on Open Data Standards as applied to Social Entrepreneurship and Philanthropy

This started out as a conversation on Twitter with @cdegger, @ehrenfoss, @p2173 and other folks following the #opendata, #socent or #10swf hash tags. Twitter is (in)famous for being limited to 140 characters per “tweet”; with the extra hash tags and all, that's reduced to 96. I wrote a reply and used a text editor to break it into "tweets"; by the time I got to “(part 8/nn),” I knew it was crazy to try and tweet an intelligible response.

So, folks, here's what I think; I hope it's more intelligible this way. Comments appreciated, here or on Twitter.


What I see #opendata doing for #socent is to allow individuals or groups to build on and share information on opportunities, needs, donors, etc. This collaboration would use open data formats and tools that iteratively improve philanthropy effectiveness.

Think of how a wiki enables text collaboration, shared learning and discovery, or how instant messaging allows both realtime and time-shifted conversation. Now expand that idea to a sort of "social database" that can be run like a less elitist Wikipedia mated with an RSS feed. Anybody can browse or search the information in the database (needs and offers). They can also get their own local copies of some/all data and have it be updated from the "upstream" source automatically. A smaller group of vetted people can update the "master" data which then gets pushed out to all viewers or subscribers.

This curation is needed to maintain data integrity and to (hopefully) eliminate attacks on or disruptions to the "social database" itself. The sad reality is that any public information on the Internet must have some sort of protection, or it will be damaged or destroyed. I see this as being especially true of a pioneering social-entrepreneurial system like this; there are enough people out there who have a vested interest in this sort of thing not working that security (authentication and validation) must be built in from the start. Otherwise, we will wind up with a situation akin to "spam" and "phishing" with email. Email standards were set in the early days, when the Internet was a primarily academic/scientific resource where all users could legitimately trust each other by default; the current state of the Net is far different. Any open data standards and protocols developed for the "social database" must take this into account.

These open data and protocol standards should be designed with the understanding that they are likely to change over time as the needs of users become better defined and as new opportunities to support those needs present themselves. The first version of a new system (like this) is almost never the simplest, nor will it be the most effective for its purpose. Lessons will be learned that should be folded back into revisions of the standards, in much the same way that later versions of standards like HTML built upon experience gained with earlier versions.

When evolving these data formats and protocols, it is vital that the process be fully transparent, with a balance between building progress and listening to the needs and concerns of the widest possible audience. It is entirely possible that no one standard in a given area will suit all stakeholders. In those instances, some sort of federation built on interchange of some common subset or intermediate format may be helpful. This should be seen as undesirable, however, as it limits the ability of casual or new users to make effective use of the entire distributed system.

The development, maintenance and ownership of standards developed for this project (including necessary legal protection such as copyrights) must be under the auspices of an organization with the visibility and stature to maintain control of the standards, lest they devolve into a balkanized mess that would be as unhelpful to the intended mission as not having any such standards at all. I would expect this organization to be a non-profit organization. Not only will this remove the fiduciary responsibility for monetizing investments made in the technology from the officers of the organization, but other non-profits/NGOs can be expected to cooperate more fully with the parent organization in developing, deploying and maintaining the standards – allowing them to remain open and unencumbered.

Finally, I think it's important to state that I don't see any one type of format as necessarily superior for developing this. I'm aware that there has been a lot of work done with various XML-based systems as part of the #socent build-out to date. After working with XML for approximately ten years now, I have seen some magnificent things done with it, and some absolutely misbegotten things done with it. Particularly with regards to the authentication and validation issues I mentioned earlier, and also with the sheer bulk and relative inefficiency of a large-scale XML data store, there are several issues I can think of. They're solvable, and they're by no means unique to XML, but they are issues that need to be thought about.

EDIT Sunday 18 April: I feel really strongly that one of the things our (distributed) information architecture is going to have to nail from the very beginning is authentication/verification: does a particular bit of information (what I'd been calling "opportunities, needs, [and] donors" earlier) actually come from whoever it claims to come from? Otherwise we're just laying ourselves open to various black-hat cracking attacks as well as scams, for instance of the "Nigerian 419" variety. I think it's pretty obvious we're going to need some sort of vetting for posters and participants; this in turn implies some (loose or otherwise) organization with a necessary minimum amount of administrative/curative overhead to maintain public credibility and apparent control over our own resources.

Anybody would be allowed to read basic information, but I think we can all agree on the need for some sort of access control and/or obfuscation of data like individual contact information or some types of legal/financial information that gets tacked into the "social database." This could be pretty straightforward. One hypothetical approach might be as basic as having those who wish to publish information go through a simple registration process that issues them some piece of private data (possibly even the open-standard OAuth authentication that Twitter and others use for applications hooking into their infrastructure). Either alternatively or in conjunction, a public-key cryptography system such as GNU Privacy Guard could be used to prove data came from who it claimed to. For instance, the data to be published could be enclosed in a ZIP file or other archive, along with a "signature" and the identification of a registered publisher. (There's no way that I'm aware of to actually embed the 'signature' itself into the data file: the signature depends on the exact content of the data, and by adding the signature, the content is changed.)
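
To make the signing idea concrete, here's a rough sketch using PHP's OpenSSL extension as a stand-in for GnuPG; the data file name and the .pem key files are hypothetical, and a real deployment would more likely use GPG's own detached-signature tooling:

    <?php
    // Publisher side: sign the exact bytes being published.
    $payload = file_get_contents( 'needs-and-offers.xml' );
    $private = openssl_pkey_get_private( file_get_contents( 'publisher_private.pem' ) );
    openssl_sign( $payload, $signature, $private, OPENSSL_ALGO_SHA1 );
    file_put_contents( 'needs-and-offers.xml.sig', $signature );

    // Subscriber side: verify against the publisher's registered public key.
    $public = openssl_pkey_get_public( file_get_contents( 'publisher_public.pem' ) );
    $result = openssl_verify( $payload,
                              file_get_contents( 'needs-and-offers.xml.sig' ),
                              $public, OPENSSL_ALGO_SHA1 );
    echo ( $result === 1 ) ? "Signature verified\n" : "Signature INVALID or error\n";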

To the non-technical user, the effects of such a system should be:

  • The 'Foundation' (for want of a better term) can use already-proven, open standards to enhance member and public confidence in the accuracy and transparency of the content of the "social database". Attempting to "reinvent the wheel" in this area is an oft-proven Bad Idea™;

  • The Foundation will be able to develop, deploy, manage and maintain a (potentially) widely-distributed philanthropic/social database architecture that can support a wide variety of organizational/use models;

  • Having this sort of authentication and validation will aid in the evolution of the technical and "business" architectures of the system; new services can be layered on top of existing ones by different users as needed.

For instance, if a particular member has announced that they will publish information in a specific version of the schema for the "social database" (say, during registration), any later information purportedly from that member published in an older format should raise warning flags, as it may be a sign of actual or attempted compromise in security and data integrity. A benign incident, such as the member inadvertently using a software tool that submits data in an inappropriate format, can be quickly identified, communicated and rectified.

This will be vital if we are to create a data system which publicly distributes data pertinent to specific members outside those members' control that could include information that must not be altered by outside influences (such as, say, budget information) or information that, for general Internet-security reasons, should not be directly visible to all and sundry (for instance, contact information might be accessible to members but not the casual browsing public).

Wednesday, 14 April 2010

Process: Still 'garbage in, garbage out', but...

...you can protect yourself and your team. Even if we're talking about topics that everybody's rehashed since the Pleistocene (or at least since the UNIVAC I).

Traditional, command-and-control, bureaucratic/structured/waterfall development process managed to get (quite?) a few things right (especially given the circumstances). One of these was code review.

Done right, a formal code review process can help the team improve a software project more quickly and effectively than ad-hoc "exploration and discovery" by individual team members. Many projects, including essentially all continuing open-source projects that I've seen, use review as a tool to (among other things) help new teammates get up to speed with the project. While it can certainly be argued that pair programming provides a more effective means to that particular end, they (and honestly, most agile processes) tend to focus on the immediate, detail-level view of a project. Good reviews (including but not limited to group code reviews) can identify and evaluate issues that are not as visibly obvious "down on the ground." (Cédric Beust, of TestNG and Android fame, has a nice discussion on his blog about why code reviews are good for you.)

Done wrong, and 'wrong' here often means "as a means of control by non-technical managers, either so that they can honour arbitrary standards in the breach or so that they can call out and publicly humiliate selected victims," code reviews are nearly Pure Evil™, good mostly for causing incalculable harm and driving sentient developers in search of more humane tools – which tend (nowadays) to be identified with agile development. Many individuals prominent in developer punditry regularly badmouth reviews altogether, declaring that if you adopt the currently-trendy process, you won't ever have to do those eeeeeeeeevil code reviews ever again. Honest. Well, unless.... (check the fine print carefully, friends!)

Which brings us to the point of why I'm bloviating today:

  1. Code reviews, done right, are quite useful;

  2. Traditional, "camp-out-in-the-conference-room" code reviews are impractical in today's distributed, virtual-team environment (as well as being spectacularly inefficient), and

  3. That latter problem has been sorted, in several different ways.

This topic came up after some tortuous spelunking following an essentially unrelated tweet, eventually leading me to Marc Hedlund's Code Review Redux... post on O'Reilly Radar (and then to his earlier review of Review Board and to numerous other similar projects).

The thinking goes something like, "Hey, we've got all these 'dashboards' for CRM, ERP, LSMFT and the like; why not build a workflow around one that's actually useful to project teams?" And these tools fit the bill – helping teams integrate a managed approach to (any of several different flavours of) code review into their development workflow. This generally gets placed either immediately before or immediately after a new, or newly-modified, project artifact is checked into the project's SCM. Many people, including Beust in the link above, prefer to review code after it's been checked in; others, including me, prefer reviews to take place before checkin, so as to not risk breaking any builds that pull directly from the SCM.

We've been using collaborative tools like Wikis for enough years now that any self-respecting project has one. They've proven very useful for capturing and organising collective knowledge, but they are not at their best for tracking changes to external resources, like files in an SCM. (Trac mostly finesses this, by blurring the lines between a wiki, an SCM and an issue tracker.) So, a consensus seems to be forming, across several different projects, that argues for

  • a "review dashboard," showing a drillable snapshot of the project's code, including completed, in-process and pending reviews;

  • a discussion system, supporting topics related to individual reviews, groups of reviews based on topics such as features, or the project as a whole; these discussions can be searched and referenced/linked to; and

  • integration support for widely-used SCM and issue-tracking systems like Subversion and Mantis.

Effective use of such a tool, whatever your process, will help you create better software by tying reviews into the collaborative process. The Web-based versions in particular remove physical location as a condition for review. Having such a tool that works together with your existing (you do have these, yes?) source-code management and issue-tracking systems makes it much harder to have code in an unknown, unknowable state in your project. In an agile development group, this will be one of the first places you look for insight into the cause of problems discovered during automated build or testing, along with your SCM history.

And if you're in a shop that doesn't use these processes, why not?


On a personal note, this represents my return to blogging after far, far too long buried under Other Stuff™. The spectacularly imminent crises have now (mostly, hopefully) been put out of our misery; you should see me posting more regularly here for a while. As always, your comments are most welcome; this should be a discussion, not a broadcast!

Sunday, 7 March 2010

We've Got Issues...

Just because something's the best you've ever seen at doing what it does, does not mean that it's perfect – or even necessarily consistently good. Sometimes the issue you run into is so bad, you wonder how the offender managed to live as long and be as widely respected as it has.

As you probably know, every serious software developer, working on a project that has an intended lifetime measured in hours or larger units, uses some form of version control software. This has been true for over 20 years for PC/Mac/etc. developers; folks working on larger systems have had such tools for much longer. (The first such software was probably SCCS in 1972.) Numerous commercial systems (Perforce, IBM/Rational (formerly Atria) ClearCase, etc.) are on offer, as well as a cornucopia of free/open-source systems – Subversion, Mercurial and git among them.

For several years now, I have been a generally enthusiastic Subversion user. It is cross-platform, stable, reliable; well-supported by third-party tools, books and so on; and being an open-source project doesn't hurt a bit. Reading through the Subversion book, one gets the impression that this was created by people experienced not just in software development, but in version control software. (Karl Fogel, one of the principal initial developers of Subversion, was well known in the community built around the earlier cvs software; he'd literally written the book on it.)

But, as I alluded to in the first paragraph, Subversion isn't perfect. It's better for most projects than a lot of the stuff out there, but it's got weak points. The Book is refreshingly honest about some of them, but unless you've read that specific part recently, it might not stick in your memory. Incidents such as what happened with me this afternoon are likely to ensure that I remember for a while.

I've recently been using the PHP framework called Kohana to develop Web sites. It's generally well-designed, mature, very efficient, and encourages many "best practices", such as the model-view-controller (MVC) architecture. It's more functional than the truly lightweight packages like Ulysses, while being more learnable and rapidly-productive than the Zend Framework, the 900-pound gorilla in the room. And I strongly suspect that the problem which led to the problem that inspired this post might well not have been Kohana's fault at all.

Kohana, as I said, makes heavy use of the MVC pattern. It also uses PHP 5's class autoloading feature, as documented on the Learning Kohana site. (One of the other Good Things about Kohana is the amount, quality and organization of the online documentation; an apparently large and highly talented team has put in yeoman effort.) As long as you put your classes in any of the "right" places, with the right filenames, everything will Just Work™; no need to mess around with 'require', 'include' or the like (which, in the PHP pre-5 days, was a major cause of breakage).

So I had this Web site I was working on,

  • on my Mac;
  • using PHP and Kohana; and
  • using Subversion for version control.

I got everything done, ready to deploy to a server running the same versions of the Apache Web server and of PHP that I have on my development system. I copied the site over to the proper directories on the server, and started testing.

(insert spectacular breaking/crashing noise here.)

I'd forgotten two important details. The default file system for OS X, HFS+, supports either case-sensitive or -insensitive filename matching, based on a flag set at format time that defaults to case-insensitive. In other words, if your program searches for files named foo.c, Foo.C and fOo.C, all three would match a file with the name FOO.c, and only one such name could be used in a given directory. The default file systems in Linux, ext3 and ext4 (which are each evolutions of the older ext2), are (at least by default) fully case-sensitive.
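
The difference takes about four lines of PHP to demonstrate (run from any directory you can write to; the file name is arbitrary):

    <?php
    // Create a file with one case, then probe for it with another.
    file_put_contents( 'CaseTest.php', "<?php // empty\n" );

    var_dump( file_exists( 'CaseTest.php' ) );  // bool(true) everywhere
    var_dump( file_exists( 'casetest.php' ) );  // bool(true) on default HFS+,
                                                // bool(false) on ext3/ext4
    unlink( 'CaseTest.php' );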

Further, Kohana's autoloading mechanism depends on case sensitivity to figure out exactly what and where it's looking for code to include. Consider the following snippet from the core Kohana source file's auto_load() method:

    if (($suffix = strrpos($class, '_')) > 0)
    {
        // Find the class suffix
        $suffix = substr($class, $suffix + 1);
    }
    else
    {
        // No suffix
        $suffix = FALSE;
    }

    if ($suffix === 'Core')
    {
        $type = 'libraries';
        $file = substr($class, 0, -5);
    }
    elseif ($suffix === 'Controller')
    {
        $type = 'controllers';
        // Lowercase filename
        $file = strtolower(substr($class, 0, -11));
    }
    elseif ($suffix === 'Model')
    {
        $type = 'models';
        // Lowercase filename
        $file = strtolower(substr($class, 0, -6));
    }
    elseif ($suffix === 'Driver')
    {
        $type = 'libraries/drivers';
        $file = str_replace('_', '/', substr($class, 0, -7));
    }
    else
    {
        // This could be either a library or a helper, but libraries must
        // always be capitalized, so we check if the first character is
        // uppercase. If it is, we are loading a library, not a helper.
        $type = ($class[0] < 'a') ? 'libraries' : 'helpers';
        $file = $class;
    }

Now, ordinarily, when I create these sites, I use naming conventions consistent with what the relevant documentation calls for. In this site's case, however, I was creating some new helper classes, and I'd been in a rush and flubbed the names; the filenames that went into Subversion were all lower case. Oops.
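A cheap way to catch this sort of thing on a case-insensitive development box, before Subversion and a case-sensitive server make it painful, is to compare the name you expect against what's actually stored on disk. A sketch of the idea (the function and the paths are mine, not anything from Kohana):

    <?php
    // file_exists() happily says "yes" to the wrong case on HFS+;
    // scandir() returns names exactly as stored on disk, so it won't.
    function exists_with_exact_case($path)
    {
        if ( ! file_exists($path))
        {
            return FALSE;   // missing under any spelling
        }
        return in_array(basename($path), scandir(dirname($path)), TRUE);
    }

    // Warn during development if the file was committed as "mylib.php"
    // when the autoloader will be asking for "MyLib.php".
    if ( ! exists_with_exact_case('application/libraries/MyLib.php'))
    {
        trigger_error('Filename case mismatch; this will break on ext3/ext4', E_USER_WARNING);
    }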

Now, here's the real fail: you can't easily rename a file in Subversion when the only change is its case, at least not on a case-insensitive file system. If I type the command "svn rename foo.php Foo.php", I just get back an error message saying svn: Path 'Foo.php' is not a directory. What I need to do instead is something like:

svn mv foo.php ../Foo.php && svn mv ../Foo.php .

The good news is that this preserves the modification history and isn't all that difficult to figure out. The bad news is that it happens at all, and takes a nonzero amount of time to figure out.

Does this mean that I'm going to be making more use of other version-control systems instead of Subversion? Not at all; if you think about it, the problem is at least as likely to be the way OS X handles file names as it is anything in Subversion. Well, am I going to stop using my Mac for development?

You can have my Mac keyboard when you pry it from my cold, dead fingers. Even if I hit a problem like this once a day, and each one took me an hour to figure out (this one didn't take nearly that long), I'd still be far more productive than on Linux or (no surprise here) Windows. And I suppose it would be too much to ask to imagine Subversion looking at that rename-by-case command, realizing it's on a less-than-fully-case-sensitive file system, and saying "I'm sorry, Dave; I can't do that. I can't rename a file to its original name."

Saturday, 27 February 2010

Protecting Yourself and Others from Yourself and Others

Nowadays, there's simply no excuse for any computer connected to the Internet, regardless of operating system, not to have both a hardware firewall (usually implemented in your router/broadband "modem") and a software firewall, monitoring both incoming and outgoing traffic.

The software firewall I've been recommending to those "unable" or unwilling to leave the hypersonic train wreck that is Windows has been ZoneAlarm's free firewall. Users of modern operating systems should start off by enabling and configuring the firewall built into their OS (ipfw on Mac OS X and FreeBSD, or netfilter/iptables on Linux). That can be tricky to manage; fortunately there are several good "front end" helper packages out there, such as WaterRoof. Another excellent, popular Mac tool is Little Snitch; the two work quite well together.

However, no matter which tools you use to secure your system's Net connection, one of the main threats to continued secure, reliable operation remains you, the user. This has a habit of popping up in unexpected but obvious-in-hindsight ways.

For instance, I recently changed my Web/email hosting service. Long prior to doing so, I had defined a pair of ipfw rules that basically said "Allow outgoing SMTP (mail-sending) connections to my hosting provider; prevent outgoing mail-sending connections to any other address." Thus, were any of the Windows apps I ran inside a VMware Fusion VM to become compromised (as essentially all Windows PCs in Singapore are), they couldn't flood spam out onto the Net – at least not using normal protocols. This didn't do anything to protect the Net from other people's Windows PCs that might sometimes legitimately connect to my network, but it did ensure that the Windows VM (or anything else) running on the Mac was far less likely to contribute to the problem.
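For the curious, that pair of rules amounted to something like the following. This is from memory and heavily simplified; the rule numbers and the provider's address are placeholders, not my actual configuration:

    # Allow outgoing SMTP only to the hosting provider's mail server...
    ipfw add 02000 allow tcp from me to 203.0.113.10 25 out
    # ...and drop SMTP to anywhere else.
    ipfw add 02010 deny tcp from me to any 25 out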

A few days after I made the change, I noticed that the outgoing mail server configured in my email client hadn't been changed over to the new provider, so I fixed that. And then I couldn't send mail anymore. It took an embarrassingly long time (and a search of my personal Wiki) to remember, "hey, I blocked all outgoing mail except to (the old provider) in my software firewall." Two minutes later, WaterRoof had told ipfw about the new "allowed" SMTP destination, and I soon heard the "mail sent" tone from my client.

Mea culpa, indeed. But why bother blogging about this? To reinforce these ideas to my reader(s):

  1. If you aren't already using a software firewall in addition to the hardware one you probably have (especially if you're not aware of it), you should be. It will make you a better Net citizen.

  2. Use a Wiki, either a hosted one like PBworks or one running on your own system. (I use DokuWiki; this page on Wikipedia has a list of other packages for the host-it-yourselfer.)

  3. Use your Wiki to record things that you'd previously have written in a dead-tree notebook or on a bajillion Post-it® notes stuck to all available surfaces. This specifically and emphatically includes details of your system configuration(s).

Of course, if using a public, hosted wiki, you'll want to make sure that you can secure pages with sensitive data, like system configurations; why make Andrei Cracker's job easier than it already is?

This whole rant is basically a single case of the age-old warning, "if you don't write it down (in a way that it can be found again at need), it never happened." As the gargantuan information fire-hose that is the Internet keeps increasing both the flow of information and the rate at which that flow grows, this becomes all the more critical for all of us.

Tuesday, 23 February 2010

Again, Standards Matter

People who I've worked with, or worked for, or read my writing here and elsewhere, have probably figured out that I'm a huge fan of standards just about everywhere they make sense: data formats, user interfaces, and so on. After all, why should we have to relearn how to drive a car simply because we buy a new Ford in place of a Toyota that the Government doesn't want us driving anymore? (You see very few old — or even fully-paid-for — cars in Singapore.) The steering wheel, pedals, and other controls are in a familiar layout; any slight differences are quickly adapted to.

Not so with the Western world's most widely sold word-processing software (for instance): Microsoft Word 2007 for Windows shipped with a different, unique interface ('innovative' or not is beside the current point). Bloggers bloviated, many users were deeply confused, and corporate help-desk calls (and support/training costs) spiked. People running Windows PCs were very rarely neutral about the change.

A year later, Microsoft shipped Word 2008 for the Mac. Although there were some interface changes, the points of loudest discussion in the Word:Mac user community seemed to be

  • the omission of Visual Basic for Applications as an attempted cross-platform macro language; and
  • the new semi-proprietary document format, which allowed flawless interchange with Windows users (VBA notwithstanding).

Interface changes, per se, didn't spark nearly as much angst as had the Windows version of the year before. While a certain amount of this should no doubt be attributed to Microsoft's experience with the earlier release, the main reason was both different and obvious.

When developing Mac applications, it's much easier and better to follow the Apple Human Interface Guidelines than to "roll your own" interface. Developers, including Microsoft, are well aware of how much the Mac development tools ease your life if you structure and style your app to meet the Guidelines (and user expectations), as opposed to how much scutwork must be reinvented from scratch to do things differently. Users benefit even more: the amount of learning needed to use a new app, or a new version of an existing app, is much less than the average under Windows or Linux. And, unlike far too many Windows programs, Mac programs are usually highly discoverable; the user may not know how to accomplish a desired action, but there is one (and preferably only one) obvious path to follow, and mis-steps are generally not heavily penalised.

Right, "everybody" knows this, so why did I spend five paragraphs restating the reasonably obvious? Because the real intent of this post is to draw your attention to a phenomenon which is a necessary outcome of that standardisation and discovery: it is much easier to switch from one Mac app that performs a given task to another than it is on Windows. Most Mac users periodically switch between different applications for a given purpose, even keeping two or three installed on their systems. When you ask them why, they don't (generally) point to bugs or deficiencies in one product over another; they merely switch between them as their use cases change. For example, though I have both Microsoft Office and Apple iWork on this Mac, I will often create or open smaller Word documents in a simpler application such as AbiWord instead. It doesn't have all the features of Word or Pages, but it has the basics, and it loads more quickly and uses fewer resources than its two "big brothers."

The average Mac user is also generally more willing to try new applications, and generally owns more of them, than the average Windows user. Because she is confident in her ability to pick up and use a new program, generally without resorting to a manual or even online help, there is a much more open discussion between users and developers; both have seen a good bit of "the competition" and know what they like and don't like.

More rarely than is the case elsewhere, but still not rarely enough, this easy migration from one app to another is due to real or perceived defects in a previously-used program. This happened to me recently: the program I had been using for a few months as my main Twitter client was not showing me, in the "mainline" stream, all the tweets from the people I was following that I would see when I looked at each person's stream individually. Once you start following more than about two or three people, the mainline becomes absolutely indispensable; you simply don't want to have to take the time to look at each stream in isolation. So, I moved to another client, Nambu (now in "private beta" for a new release; version 1.2.1 can be found via web search).

Two immediate observations: I already know how to use this, even though Nambu has a far more dense presentation than my earlier client. And, because of that "dense presentation", it now takes me about a fifth as much time to get through my morning and afternoon Twitter catchups as it did previously. (Multi-column view is the killer feature, guys; there's only one little thing I'd like to see different...)

Again, why make noise about this? Simple: I've been a Windows user (usee?) and developer quite literally as long as there's been a "Windows"; I ran across my 1.0-beta developer kit floppies (5-1/4", of course) a couple of weeks ago (thinking about having them bronzed...or mounted on a dartboard. Maybe both.) But the nasty truth is, I very rarely change applications that perform a given task in Windows. The pain level and the (hopefully temporary) hit on my productivity aren't worth it until the pain becomes absolutely excruciating. I don't have that problem with the Mac, at all. I can try out new applications at will, even daily-workflow applications, secure in the knowledge that

  • I already know how to use this, at least well enough to get started, and
  • I can go back — or on to another candidate — any time I want to.

There's a word for the feeling that having that kind of freedom, that kind of control over your computing experience, gives you:

Empowerment.

Saturday, 20 February 2010

Companies Lose their Minds, Then their Partners, Then their Customers, Then...

Following is a comment which I posted to Jason Kincaid's article on TechCrunch, "Why Apple's New Ban Against Sexy Apps is Scary". I don't know why Apple seem to be deliberately shooting themselves in the foot in so many ways on the iPhone recently; I am sure that they are leaving golden opportunities for Palm, Android and anybody else who isn't Apple or Microsoft.

Even if you're not developing for the iPhone or even for the Mac, this whole drift should concern you — because its most likely effect is going to be that you have fewer choices in what should be a rapidly-expanding marketplace.

Thanks for reading.


Exactly; they're pulling a Singapore here. They're saying "we're better than anybody else" because they've got this App Store with hundreds of thousands of titles, and hundreds of useful apps. Then they turn around and say "we're the only game in town; we're not going to let you sell your apps to customers any way except through us — and oh, yeah, we can be completely arbitrary and capricious before, during and after the fact."

Let me be clear: up until very recently, I've been an unalloyed Apple fan; the only sensible response to 25+ years of Windows development and user support and 10 years hitting similar but different walls in Linux. I'm typing this on one of the two Macs sitting on my desk. I've got logs and statistics that prove I'm far more productive on my worst Mac days than I ever was on my best Windows days. And I've had several Switcher clients over the past few years who say the same thing.

I can write and sell any app I want on the Mac; Apple even give me all the (quite good) tools I need right in the box. I can use any app I want to on my Mac; the average quality level is so far above Windows and Linux apps it's not even funny. In neither of those do I need the permission of Apple or anyone else outside the parties to the transaction involved. Apple do have good support for publicising Mac apps; browse http://www.apple.com/downloads/ to see a (far earlier) way they've done it right. But developers don't have to use their advertising platform.

With the iPhone, and soon the iPad, they're doing things in a very untraditionally-Apple way: they're going far out of their way to alienate and harm developers. You know, those people who create the things that make people want to use the iPhone in the first place. And a lot of us are either leaving the platform or never getting into iPhone development in the first place.

And that can't be healthy for the long-term success of the Apple mobile platform (iPhone/iPad/iWhatComesNext). As a user, as a developer, as a shareholder, that disturbs me, as I believe it should disturb anyone who cares about this industry.

Wednesday, 10 February 2010

Simple Changes Rarely Are

I want to say "thank you" to the anonymous commenter to my post, XHTML is (Nearly) Useless, who said "Paragraphs would be useful."

Google's Blogger software, by default, separates paragraphs with an "HTML break tag" (br); this preserves the visible separation as intended, but destroys the structural, semantic flow. I recently discovered this, found the setting to turn it off ('don't put "break" tags in unless I bloody tell you to!'), and set it. I naively thought that this would apply only to new posts, and that posts previously saved in the old, "broken" format would stay as they were.
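For anyone wondering what the fuss is about, the difference in the generated markup looks roughly like this (illustrative only, not Blogger's exact output):

    <!-- What the "convert line breaks" behaviour produces: -->
    First paragraph of the post.<br /><br />
    Second paragraph of the post.

    <!-- What a structurally meaningful post should contain: -->
    <p>First paragraph of the post.</p>
    <p>Second paragraph of the post.</p>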

Oops.

I must now go back and hand-edit every post from the last nearly five years so that the paragraph flow is properly marked up. Since I do in fact have other demands on my time, this is not likely to be completed immediately; it is, however, a very high priority.

It's enough to make you seriously consider migrating to WordPress — which is apparently trivially easy.

NIH v. An Embarrassment of Riches

One thing most good developers learn early on is not to "reinvent" basic technology for each new project they work on. The common, often corporate, antithesis to this is NIH, or "Not Invented Here." But sometimes, it's hard to decide which "giants" one wants to "stand on the shoulders of."

I've recently done a couple of mid-sized Web projects using PHP and the Kohana framework. A framework, as most readers know, is useful (a) by helping you work faster and (b) by including a lot of usually-good code you don't have to write and maintain (but should understand!). Good frameworks encourage you to write your own code in a style that encourages reuse by other projects built on the same framework.

One task supported by many frameworks is logging. There have also been many "standalone" (i.e., not integrated into larger systems) logging packages. The best known of these, and the source of many derivatives, is the Apache log4j package for Java. It has been ported to PHP, also as an Apache project, as log4php.

Log4php has saved me countless hours of exploratory debugging. I stand firmly with the growing group of serious developers who assert that if you use a reasonably agile process (with iterative red/green/refactor unit testing) and make good use of logging, you'll very rarely, if ever, need a traditional debugger.

What does this have to do with Kohana? Well, Kohana includes its own relatively minimalist, straightforward logging facility (implemented as static methods in the core class, grumble, grumble). There's a standard place for such logs to be written to disk, and a nice little 'debug toolbar' add-on module that lets you see logging output while you're viewing the page that generated it.
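In day-to-day use, that facility boils down to calls like these (a from-memory sketch of the Kohana 2.x API; the messages and the variable are made up):

    // One global stream; what actually gets written out is governed by a
    // threshold in the application configuration, as I recall.
    Kohana::log('debug', 'Loading invoice '.$invoice_id);
    Kohana::log('error', 'Invoice total does not match its line items');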

So I ought to just ditch log4php in favor of the inbuilt logging system when I'm developing Kohana apps, right? Not so fast...

Log4php, like log4j, has far more flexibility. I can log output from different sources to different places (file, system log, console, database, etc.), have messages written to more than one place (e.g., console and file), and so on. Kohana's logging API is too simple for that.

With log4php, I have total control over the logging output based on configuration information stored in an external file, not in the code itself. That means I can fiddle with the configuration during development, or even after deploying the application, without having to make any code changes to control logging output. The fewer times I have to touch my code, the less likely I am to inadvertently break something. With Kohana, I have a single logging stream that has to be controlled from within my code, by making Kohana method calls.
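By contrast, a typical log4php setup in one of my projects looks something like this. It's a sketch: the config path and the logger name are mine, not anything log4php or Kohana dictates:

    <?php
    require_once 'log4php/Logger.php';

    // All routing decisions (appenders, levels, layouts) live in an external
    // file, so changing them never touches application code.
    Logger::configure('/path/to/log4php-config.xml');

    $log = Logger::getLogger('myapp.models.invoice');
    $log->debug('Loading invoice');
    $log->warn('Invoice total does not match its line items');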

Many experienced developers of object-oriented software are uncomfortable with putting more than one logical feature into a class (or closely-related set of classes). Why carry around overhead you don't use, especially when your framework offers a nice extension capability via "modules" and "helpers"? While there may sometimes be arguments for doing so (the PHP interpreter is notoriously slow, especially when using dynamic features like reflection), I have always failed to understand how aggregating large chunks of your omniverse into a Grand Unified God Object™ pays dividends over the life of the project.

So, for now, I'll continue using log4php as a standalone tool in my various PHP development projects (including those based on Kohana). One thing that just went onto my "nice to do when I get around to it" list is to implement a module or similar add-on that would more cleanly integrate log4php into the surrounding Kohana framework.

This whole episode has raised my metaphorical eyebrow a bit. There are "best practices" for developing in OO (object-oriented) languages; PHP borrows many of these from Java (along with tools like log4php and PHPUnit, the de facto standard unit-test framework). I did a fairly exhaustive survey of the available PHP frameworks before starting to use Kohana. I chose it because it wasn't an "everything including several kitchen sinks" tool like Zend, it wasn't bending over backwards to support obsolete language misfeatures left over from PHP 4, and it has what looks to be a pretty healthy "community" ecosystem (unlike some once-heavily-flogged "small" frameworks like Ulysses). I'm not likely to stop using Kohana any time soon. I may well have to make time to participate in that community I mentioned earlier, if for no other reason than to better understand why things are the way they are.

But that's the beauty of open source, community-driven development, surely?

Tuesday, 2 February 2010

I thought OSS was supposed to break down walls, not build them.

Silly me. I've only been using, evangelizing and otherwise involved in open source software for 15 or 20 years, so what do I know?

In reaction to the latest feature article in DistroWatch Weekly, I'm angry. I'm sad. Most of all, I feel sorry for those losers who would rather keep their pristine little world free of outside involvement, especially when that involvement comes with the express intent of making it easier for non-übergeeks to use open source software — in this case, OpenBSD. OpenBSD, like its cousins FreeBSD and NetBSD, is one of the (current) "major" base versions of BSD Unix. While the latter two have spawned numerous other projects that take their software as a base, fewer projects have branched from OpenBSD, as can be seen in this comparison on Wikipedia. Recently, two OpenBSD enthusiasts attempted to address this, only to be flamed mercilessly for their trouble.

The DistroWatch feature previously mentioned concerns GNOBSD, a project created by Stefan Rinkes, whose goal plainly was to make this highly stable and secure operating system, with a lineage that long predates Linux, accessible to a wider audience (one that doesn't use Mac OS X, which is itself based, indirectly, on both FreeBSD and NetBSD).

For his troubles, Mr. Rinkes was the subject of repeated, extreme, egregiously elitist flaming, this thread being but one example. He was eventually forced to withdraw public access to the GNOBSD disc image, and to add a post stating that he did not "want to be an enemy of the OpenBSD Project."

Reading along the thread on the openbsd-misc mailing list brings one to a post by one "FRLinux", linking to another screaming flamewar summarized here, with the most directly relevant thread starting with a message titled "ComixWall terminated". Another hard worker, with the intent of creating an OpenBSD-based operating system that was easy for "ordinary people" to use, had promptly incurred the wrath of the leader of the OpenBSD effort, Theo de Raadt, with numerous other "worthies" piling on.

WTF?!?

OK, I get it. The OpenBSD folks don't want anyone playing in their sandbox and providing access to the OpenBSD software (itself supposedly completely open source, free software) in any way that might compete with "official" OpenBSD downloads or DVD/CD sales. They especially don't want any possibility of the Great Unwashed Masses™ — i.e., anyone other than self-professed, officially-blessed übergeeks — playing with "their" toys.

OK, fine. It has been a large part of my business and passion to evangelize open source and otherwise free software as an enabler for the use of open data standards by individuals and businesses. I have supported and/or led the migration, in whole or in part, of several dozen small to midsize businesses from a completely closed, proprietary software stack (usually Microsoft Windows and Microsoft Office for Windows) to a mix of other (open and proprietary) operating systems and applications. OpenBSD has historically been high on my list of candidates for clients' server OS needs; if you can manage a Windows Server cluster with acceptable performance and stability, your life will be easier managing almost anything else — provided that you're not locked in to proprietary Windows-only applications and services. In these days of "cloud" computing and "software as a service", the arguments against Ye Olde Standarde are becoming much more compelling.

I just won't be making those arguments using OpenBSD anymore. And to my two clients who've expressed interest in BSD on the desktop for their "power" developers over the next couple of years, I'll be pitching other solutions...and explaining precisely why.

Because, out here in the real world, people don't like dealing with self-indulgent assholes. That's a lesson that took this recovering geek (in the vein of "recovering alcoholic") far too many years to learn. And people especially don't like trusting their business or their personal data to such people if they have any choice at all.

And, last I checked, that was the whole point of open standards, open source, and free software: giving people choices that allow them to exercise whatever degree of control they wish over their computing activities. I'd really hate to think that I've spent roughly half my adult life chasing a myth. Too many people have put too much hard work into far too many projects to let things like this get in our way.

Changing Infrastructure - and "Mea maxima culpa *thwack*"

(which I remember from a Monty Python sketch but can't find attribution. Oh well; please feel free to comment and enlighten me.)

About a week ago, I switched my domain hosting for my Web/mail domain, seven-sigma.com, from Simplehost Limited of New Zealand to WebFaction, based in the UK. Now, Simplehost are a good bunch of guys; their customer service is very responsive and you get knowledgeable, thoughtful answers to questions. On most things, they're very willing to work with people to help them resolve issues. If you're looking for a host in this general slice of the planet, put them on your list of providers to have a good look at.

The deal-breaker for me, though, after two years or so with them, was that I need to be able to develop and demonstrate Web sites using the latest and greatest variety of tools, in particular PHP 5.3.x, which introduces important new features (in addition to fixing numerous bugs).
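Two of the 5.3 additions I care about most, in one tiny illustrative snippet (the namespace name is made up):

    <?php
    namespace SevenSigma\Demo;   // real namespaces, at last

    // Anonymous functions (closures), handy for callbacks of all kinds.
    $greet = function ($name) {
        return 'Hello, '.$name;
    };

    echo $greet('world'), "\n";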

WebFaction, on the other hand, offer PHP 5.3 support (as well as numerous other tools, such as Ruby on Rails), have good pricing on their shared-hosting packages (as does Simplehost, honestly), and provide fantastic self-help and support options - video tutorials, Twitter feeds, and so on. Also, the obligatory Googling for dissatisfied-customer reactions mostly brings up hits like this one, noting that, except for a couple of brief periods, the only complaints to be found were reviews pointing out how few complaints there were. Chatting on IM and IRC with a few customers also helped.

So, a process that I'd been poking at for a couple of months, with three weeks of serious effort, is done with, hopefully for several years to come.

However, there was a silly-me postscript to all of this: for several days after I made the switch, I just couldn't get email working. I went onto the WebFaction email-support forums, flipped through a few messages that described solutions to problems other users had had, tried them, and had no luck. Then, tonight, I came across the answer that had been staring me in the face since the very first message telling me that my account had been set up.

You folks who have your own domains can probably imagine what went wrong; suffice it to say that the difference between what I was reading and what I was thinking was sufficient that over 1,300 email messages have downloaded in less time than it took me to write this post. So if you've been emailing me and wondering why I haven't answered, I apologize. It's unlikely to happen again — hopefully for several years.

And to the Simplehost support guys, thanks very much for your help. This was not in any way at all Simplehost's fault.

So why would I write something like this, showing a fairly major screwup? Because... I'd rather deal with people who admit their faults and publicly commit to do better, than with people who are infallible legends in their own mind. This industry has far too many of the latter sort. I'm hoping that enough other people feel the way I do that I can continue to do great work for my clients.

Thanks for reading.

EDIT: Years on, never mind how many, I discovered that I'd mistyped "WebFaction" when I meant "Simplehost" three paragraphs earlier. Heartfelt apologies to both. Mea maxima culpa *thwack!*