Friday 18 June 2010

I Thought Standard Libraries Were Supposed to be Better...

...than hand coding. Either the PHP folks never got that memo, or I'm seriously misconceptualising here.

Case in point: I was reading through Somebody Else's Code™, and I saw a sequence of "hand-coded" assignments of an empty string to several array entries, similar to:

    $items[ 'key2' ] = '';
    $items[ 'key1' ] = '';
    $items[ 'key6' ] = '';
    $items[ 'key3' ] = '';
    $items[ 'key8' ] = '';
    $items[ 'key5' ] = '';
    $items[ 'key4' ] = '';
    $items[ 'key7' ] = '';

I thought, "hey, hang on; there's a function to do easy array merges in the standard library (array_merge); surely it'd be faster/easier/more reliable to just define a (quasi-)constant array and merge that in every time through the loop?"
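Concretely, the rewrite I had in mind would look something like this. (The `$defaults` name and the sample contents of `$items` are my own illustration, not the original code.)

```php
// Build the defaults once, outside any loop.
$defaults = array(
    'key1' => '', 'key2' => '', 'key3' => '', 'key4' => '',
    'key5' => '', 'key6' => '', 'key7' => '', 'key8' => '',
);

// Pretend this is the state left over from a previous pass.
$items = array( 'key1' => 'stale', 'other' => 'kept' );

// Later arguments to array_merge win on duplicate string keys,
// so one call resets every keyN entry to '' while leaving
// unrelated entries alone.
$items = array_merge( $items, $defaults );
```

One call replaces eight assignments, and adding a ninth key later means touching only the `$defaults` definition.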

Fortunately, I didn't take my assumption on blind faith; I wrote a quick little bit to test the hypothesis:


$count = 1e5;
$data = array(
        'key2' => '',
        'key1' => '',
        'key6' => '',
        'key3' => '',
        'key8' => '',
        'key5' => '',
        'key4' => '',
        'key7' => '',
        );
$realdata = array();

$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $realdata = array_merge( $realdata, $data );
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with array_merge took %7.5f seconds.\n", $count, $elapsed );

$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $data[ 'key2' ] = '';
    $data[ 'key1' ] = '';
    $data[ 'key6' ] = '';
    $data[ 'key3' ] = '';
    $data[ 'key8' ] = '';
    $data[ 'key5' ] = '';
    $data[ 'key4' ] = '';
    $data[ 'key7' ] = '';
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with direct assignment took %7.5f seconds.\n", $count, $elapsed );

I ran the tests on a nearly two-year-old iMac with a 3.06 GHz Intel Core 2 Duo processor, 4 GB of RAM, OS X 10.6.4 and PHP 5.3.1 (with Zend Engine 2.3.0). Your results may vary on different kit, but I would be surprised if the relative timings changed much. The median times from 20 runs of this test program came out as:

    Assignment process    Time (seconds) for 100,000 iterations
    array_merge           0.41995
    Hand assignment       0.15569
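The post doesn't show how the 20-run medians were gathered; a harness along these lines would automate it. (The `median()` helper and the harness structure are my own sketch, not the original test script.)

```php
// Median of an array of timings: sort, then take the middle value,
// averaging the two middle values when the count is even.
function median( array $times )
{
    sort( $times );
    $n   = count( $times );
    $mid = (int) floor( $n / 2 );
    return ( $n % 2 === 0 )
        ? ( $times[ $mid - 1 ] + $times[ $mid ] ) / 2
        : $times[ $mid ];
}

$times = array();
for ( $run = 0; $run < 20; $run++ )
{
    $start = microtime( true );
    // ... body under test goes here ...
    $times[] = microtime( true ) - $start;
}
printf( "median: %7.5f seconds\n", median( $times ) );
```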

So, the "obvious," "more readable" code takes roughly 2.7 times as long as the existing "hand assignment," the version that is potentially error-prone during maintenance. Hang on, though: if we used numeric indexes on our array, we could use the array_fill function instead; how would that do?

Adding the code:

$data2 = array();
$data2[ 0 ] = '';
$data2[ 1 ] = '';
$data2[ 2 ] = '';
$data2[ 3 ] = '';
$data2[ 4 ] = '';
$data2[ 5 ] = '';
$data2[ 6 ] = '';
$data2[ 7 ] = '';

$start = microtime( true );
for ( $loop = 0; $loop < $count; $loop++ )
{
    $data2 = array_fill( 0, 8, '' );
}
$elapsed = microtime( true ) - $start;
printf( "%d iterations with array_fill took %7.5f seconds.\n", $count, $elapsed );

produced a median time of 0.21475 seconds, or some 37.9% slower than the original hand-coding.
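Two more standard-library routes might be worth timing, though neither appears in the tests above: array_fill_keys (available since PHP 5.2), which builds a string-keyed array in one call, and the array union operator +, which keeps the left operand's value on duplicate keys and so can push defaults in a single expression. A sketch, with my own variable names:

```php
$keys = array( 'key1', 'key2', 'key3', 'key4',
               'key5', 'key6', 'key7', 'key8' );

// array_fill_keys (PHP >= 5.2): string keys, one call, no merge.
$defaults = array_fill_keys( $keys, '' );

// The union operator keeps the LEFT operand's value on key
// collisions, so putting $defaults first forces every keyN
// entry back to '' while preserving unrelated entries.
$items = array( 'key3' => 'stale', 'extra' => 'kept' );
$items = $defaults + $items;
```

Whether either beats hand assignment is exactly the sort of thing this benchmark should be extended to measure rather than assumed.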

For folks coming from other, compiled languages, such as C, C++, Ada or what-have-you, this makes no sense whatsoever; those languages have standard libraries that are not only intended to produce efficiently-maintainable code, but (given reasonably mature libraries) efficiently-executing code as well. PHP, at least in this instance, is completely counterintuitive (read: counterproductive): if you're in a loop that will be executed an arbitrary (and arbitrarily large) number of times, as the original code was intended to be, you're encouraged to write code that invites typos, omissions and other errors creeping in during maintenance. That's a pretty damning indictment for a language that's supposedly at its fifth major revision.

If anybody knows a better way of attacking this, I'd love to read about it in the comments, by email or IM.
