Perl FAQ 5.10: How come my converted awk/sed/sh script runs more slowly in Perl?

The natural way to program in those languages may not make for the fastest Perl code. Notably, the awk-to-perl translator produces sub-optimal code; see the a2p man page for tweaks you can make.

Two of Perl's strongest points are its associative arrays and its regular expressions. They can dramatically speed up your code when applied properly. Recasting your code to use them can help a lot.

How complex are your regexps? Deeply nested sub-expressions with {n,m} or * operators can take a very long time to compute. Don't use ()'s unless you really need them. Anchor your pattern to the front of the string if you can.

Something like this:

    next unless /^.*%.*$/;

runs more slowly than the equivalent:

    next unless /%/;

Note that this:

    next if /Mon/;
    next if /Tue/;
    next if /Wed/;
    next if /Thu/;
    next if /Fri/;

runs faster than this:

    next if /Mon/ || /Tue/ || /Wed/ || /Thu/ || /Fri/;

which in turn runs faster than this:

    next if /Mon|Tue|Wed|Thu|Fri/;

which runs much faster than:

    next if /(Mon|Tue|Wed|Thu|Fri)/;

There's no need to use /^.*foo.*$/ when /foo/ will do.
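
If you want to check claims like these on your own Perl, the standard Benchmark module can time the variants side by side. The following is only a sketch: the sample lines and the variant names are made up for illustration, and the relative timings can differ from one Perl version to the next, so measure rather than assume.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical sample input; substitute lines from your own data.
my @lines = ('Mon 10:00 backup ok', 'Sat 11:30 idle', 'Fri 09:15 deploy') x 1000;

# Each variant returns how many lines survive the "next if" tests.
my %variant = (
    separate => sub {
        my $n = 0;
        for (@lines) {
            next if /Mon/;
            next if /Tue/;
            next if /Wed/;
            next if /Thu/;
            next if /Fri/;
            $n++;
        }
        return $n;
    },
    alternation => sub {
        my $n = 0;
        for (@lines) { next if /Mon|Tue|Wed|Thu|Fri/; $n++ }
        return $n;
    },
    captured => sub {
        my $n = 0;
        for (@lines) { next if /(Mon|Tue|Wed|Thu|Fri)/; $n++ }
        return $n;
    },
);

# Run each variant 200 times and print the CPU time used by each.
timethese(200, \%variant);
```

All three variants skip the same lines, so any difference in the report is purely a matter of speed.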

Remember that a printf costs more than a simple print.

Don't split() every line if you don't have to.
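
One way to avoid needless splitting, sketched below, is to reject uninteresting lines with a cheap test first and to give split() a limit so it stops once you have the field you want. The record format, the ':error:' marker, and the variable names are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical colon-separated records; only the "error" lines matter here.
my @lines = (
    'alice:ok:nothing to report',
    'bob:error:disk full',
    'carol:ok:all fine',
    'dave:error:quota exceeded',
);

my @users;
for my $line (@lines) {
    # Cheap substring check first, so uninteresting lines are never split.
    next if index($line, ':error:') < 0;

    # A limit of 2 stops splitting as soon as the first field is isolated.
    my ($user) = split /:/, $line, 2;
    push @users, $user;
}

print "@users\n";    # prints "bob dave"
```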

Another thing to look at is your loops. Are you iterating through indexed arrays rather than just putting everything into an associative array? For example,

    @list = ('abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stv');

    for $i (0 .. $#list) {
        if ($pattern eq $list[$i]) { $found++; } 
    }

First of all, it would be faster to use Perl's foreach mechanism instead of using subscripts:

    foreach $elt (@list) {
        if ($pattern eq $elt) { $found++; } 
    }

Better yet, this could be sped up dramatically by placing the whole thing in an associative array like this:

    %list = ('abc', 1, 'def', 1, 'ghi', 1, 'jkl', 1, 
             'mno', 1, 'pqr', 1, 'stv', 1 );
    $found += $list{$pattern};
    
(But put the %list assignment outside of your input loop.)
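
In more current Perl style, the same idea might look like the sketch below. The name %in_list and the three test values are assumptions standing in for whatever your input loop actually reads.

```perl
use strict;
use warnings;

my @list = ('abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stv');

# Build the lookup table once, before any input is read.
my %in_list = map { $_ => 1 } @list;

my $found = 0;
for my $pattern ('def', 'xyz', 'stv') {    # stand-in for values read from input
    # exists() avoids an "uninitialized value" warning for missing keys.
    $found++ if exists $in_list{$pattern};
}

print "$found\n";    # prints 2
```

Each lookup is a single hash probe, no matter how long the list grows.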

You should also look at variables in regular expressions, because interpolating them is expensive. If the variable to be interpolated doesn't change over the life of the process, use the /o modifier to tell Perl to compile the regexp only once, like this:

    for $i (1..100) {
        if (/$foo/o) {
            &some_func($i);
        } 
    }
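
On Perls that have the qr// operator, precompiling the pattern into a regex object is another way to get the same compile-once effect. This is a different technique from /o, shown here only as a sketch; the pattern and the word list are made up for the example.

```perl
use strict;
use warnings;

my $foo = 'b[aeiou]t';    # hypothetical pattern text, fixed for the whole run

# qr// compiles the pattern once and yields a reusable regex object.
my $re = qr/$foo/;

my @hits;
for my $word ('bat', 'cat', 'bit', 'dog', 'but') {
    push @hits, $word if $word =~ $re;
}

print "@hits\n";    # prints "bat bit but"
```

Unlike /o, a qr// object can be rebuilt explicitly if the variable ever does need to change.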

Finally, suppose you have a bunch of patterns in a list that you'd like to compare against. The straightforward approach is a loop like this:

    @pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
    foreach $pat (@pats) {
        if ( $name =~ /^$pat$/ ) {
            &some_func();
            last;
        }
    }

If you build your code and then eval it, it will be much faster. For example:

    @pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
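    $code = <<EOS;          # the here-document statement needs its semicolon
            while (<>) {
                study;
EOS
    foreach $pat (@pats) {
        # \$ puts a literal $ anchor into the generated code
        $code .= <<EOS;
            if ( /^$pat\$/ ) {
                &some_func();
                next;
            }
EOS
    }
    $code .= "}\n";
    print $code if $debugging;
    eval $code;
    die $@ if $@;           # report any error in the generated code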
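
If all you need to know is whether any pattern matched, a related option is to join the patterns into one anchored alternation and compile it once. This is a different technique from the eval approach above, not a restatement of it, and the names $any_pat and @matched are made up for this sketch.

```perl
use strict;
use warnings;

my @pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');

# Join the patterns into one anchored alternation and compile it once.
my $alternation = join '|', @pats;
my $any_pat     = qr/^(?:$alternation)$/;

my @matched;
for my $name ('_getline', 'bogus', 'do_exit', '_reader', 'main') {
    push @matched, $name if $name =~ $any_pat;
}

print "@matched\n";    # prints "_getline bogus do_exit"
```

Whether this beats the generated loop depends on your patterns and your Perl, so time both before committing to one.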
