Lies, Damn Lies & Benchmarks
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
“Perl is too slow”
Heard that before? Yeah...
Mostly wrong – can't refute it without data.
Need to benchmark the times.
Damn lies...
Good benchmarks find realistic times.
Most benchmarks prove a point.
They get ignored.
Ignored results are not lazy.
Benchmarking perl
The *NIX “time” command.
Good enough to answer most questions.
Avoids much Benchmarking Stuff (“BS”).
Simplest tool: “time”
real, system, and user times.
real time heavily affected by system load.
system + user better indication of “work”.
real – work = blocked.
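The "real – work = blocked" arithmetic can be sketched as a small helper. This is an illustration rather than part of the original slides: the function name blocked_ms is hypothetical, and it assumes the three times have already been converted to plain seconds (for example 0.019 rather than 0m0.019s).

```shell
# blocked_ms REAL USER SYS
# Prints real - (user + sys) in whole milliseconds.
# All three inputs are in seconds, e.g. 0.019 (hypothetical helper).
blocked_ms() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", (r - (u + s)) * 1000 }'
}

# Usage: blocked_ms 0.019 0.010 0.000
```

A result close to the real time means the process spent most of its life waiting rather than working.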
“bash takes less time to start up”
perl isn't any slower:
Zero work for both.
Real is all blocked.
$ time perl -e 0
real 0m0.005s
user 0m0.000s
sys 0m0.000s
$ time bash /dev/null
real 0m0.005s
user 0m0.000s
sys 0m0.000s
BS: Startup Times
If something just ran it is probably in core.
Saves overhead running it the second time.
Run everything twice to benchmark startups.
Multiple runs or single-user mode help manage background noise.
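One way to apply "run everything twice" is a wrapper that discards a warm-up run and then times several more. This is a sketch under assumptions: it needs bash (for the time keyword and TIMEFORMAT), and the function name time_warm is hypothetical.

```shell
# time_warm RUNS COMMAND [ARGS...]
# Runs COMMAND once untimed so it is likely to be in core, then prints
# the real time in seconds for each of RUNS further runs, one per line.
# Requires bash; time_warm is a hypothetical name.
time_warm() {
    local runs=$1 i
    shift
    "$@" > /dev/null 2>&1            # warm-up run, result discarded
    local TIMEFORMAT='%R'            # bash: report real time only
    for (( i = 0; i < runs; i++ )); do
        { time "$@" > /dev/null 2>&1 ; } 2>&1
    done
}

# Usage: time_warm 5 perl -e 0
```

Comparing the spread across runs gives a rough feel for how much background noise is in the numbers.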
Minimizing startup issues
Save kernel calls, context switches, interrupts, latency, transfer I/O...
tmpfs on linux minimizes overhead.
Test with un-loaded system.
Avoid “virtual” systems (CPU, EMC) unless that is what you are testing.
What does startup time tell us?
Opterons are fast?
Useless by itself.
Necessary baseline.
Differences are a warning.
Analyzing startup times.
Big differences usually indicate a problem:
Mis-compiled: “-O0” “-g” on production code.
Mixing 32- and 64-bit code and O/S.
Background noise from other running jobs.
Botched startups leave everything else suspect.
Do something!
OK, let's time an operation.
Listing a directory is common enough.
“ls” lists the contents, sorts lexically.
Perl's “glob” is similar.
Trivial pursuit: ls vs glob.
lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'
real 0m0.007s
user 0m0.000s
sys 0m0.000s
lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'
real 0m0.019s
user 0m0.010s
sys 0m0.000s
Mostly blocked (real minus user and sys): 7ms bash vs. 9ms perl.
Failing to clear the screen can skew results!
Remote display, virtual machines.
BS: Milliseconds matter
Really care about 12ms? OK, perl is slower.
Most of the difference is in blocked time.
Hint: perl and shell block at the same rate.
perl compiles a statement, which adds overhead.
Use “ls” for what it is.
Doing more
Search files using their basenames:
Find all of the basenames from “2012.05.05” through “2012.05.16”.
First step: How many files are there?
Times
Compare File::Find with /bin/find.
Roughly same system time, added user for compile.
Shell is faster because it is single-purpose.
$ time find . -type f | wc -l;
18583
real 0m0.080s
user 0m0.020s
sys 0m0.050s
$ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i },"."; print $i, "\n"'
18583
real 0m0.274s
user 0m0.220s
sys 0m0.050s
Multi-layer pipes
Compare the basename to a regex.
Shell:
find . -type f | xargs -l1 basename |
egrep -E '2012.05.(?:0[5-9]|1[0-6])'
Find files, extract basenames, and search with extended syntax (largely borrowed from Perl).
One-liner with perl, File::Find & File::Basename.
BS: Forks & pipes are “free”.
Real, user, and system time are higher for bash.
xargs has to fork/exec many copies of basename.
system overhead from buffering pipes is also higher.
Plumbing is expensive!
$ time find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])' | wc -l
1604
real 0m29.823s
user 0m0.710s
sys 0m4.220s
$ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return;/2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'
1604
real 0m0.301s
user 0m0.170s
sys 0m0.130s
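The fork-per-file cost can be isolated by swapping the per-file basename for a single sed process that strips the directory part of every path. This is a sketch, not something from the original slides: the function name count_matching_basenames is hypothetical, the regex is rewritten in plain extended syntax with no "(?:" group, and file names with embedded newlines are assumed away.

```shell
# count_matching_basenames DIR PATTERN
# Counts regular files under DIR whose basename matches the extended
# regex PATTERN. One find, one sed, one grep, however many files there
# are (hypothetical helper; assumes no newlines in file names).
count_matching_basenames() {
    find "$1" -type f |
        sed 's|.*/||' |            # drop everything up to the last "/"
        grep -cE -- "$2"           # count matching lines
}

# Usage: count_matching_basenames . '2012\.05\.(0[5-9]|1[0-6])'
```

The pipe still costs something, but the process count no longer grows with the number of files.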
Replacing content “in place”
perl's “-i” replaces files in place.
Shell opens (and truncates) the output file before the command runs, so “sort -d < a > a” can't work.
Shell requires “sort -d < a > b && mv b a”.
Now imagine filtering a few thousand files...
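The “sort -d < a > b && mv b a” dance can be wrapped up as a function. This is a minimal sketch: the name sort_in_place is hypothetical, it assumes a mktemp that accepts a template argument, and the replaced file ends up with the temporary file's permissions.

```shell
# sort_in_place FILE
# Sorts FILE in dictionary order via a temporary file in the same
# directory, replacing the original only if sort succeeds
# (hypothetical helper).
sort_in_place() {
    tmp=$(mktemp "$1.XXXXXX") || return 1
    if sort -d < "$1" > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"               # leave the original untouched
        return 1
    fi
}

# Usage: sort_in_place a
```

Every file filtered this way costs a temporary file and a rename, which is the bookkeeping that perl's “-i” takes off your hands.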
perl -n & -p with -i
Say you have to update the package names for a few hundred modules from “::Source” to “::RDS”.
Mixing shell with perl:
find . -type f | xargs perl -i -p -e's/::Source\b/::RDS/g';
Exercise: Try writing this in pure shell.
Running it doesn't take long either
Nice division of labor:
find & xargs deal with the names.
perl deals with the regex.
not much typing either way.
not much time either.
$ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'
real 0m0.112s
user 0m0.044s
sys 0m0.016s
What this means to you.
Plumbing and forks are not free.
Single-purpose programs faster for one thing.
Chaining the simpler tools adds overhead.
Languages faster for multi-stage tasks.