Yet Another Splice Function
We’ve all used splice. It’s a great function that can stand in for push, shift, unshift, and pop, to name a few. As useful as it is, it has its limitations. One that has always bugged me is that it requires indices. Sometimes you don’t know those indices ahead of time, or don’t want to. That’s why I wrote another splice function.
This version, which I will officially call “yasplice”, takes references to two arrays and returns a list. It returns the elements of the first array with everything found in the second array removed, matching by value rather than by index. Note that this requires Perl 5.10 because of the extremely useful smartmatch operator (~~). The snippet also overrides the built-in splice function, so the code should be kept localized.
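use feature ':5.10';
use subs 'splice';    # let our sub override the built-in splice in this package

# Return the elements of @$aa that do not match any element of @$bb.
sub splice {
    my ($aa, $bb) = @_;
    grep { my $one = $_; !grep { $_ ~~ $one } @$bb } @$aa;
}

my @a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
my @b = (4,5,6);
my @c = splice(\@a, \@b);
say join ", ", @c;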
Prints:
1, 2, 7, 2, 3
Simple and useful. It might not be a good idea to use this on two arrays of N elements each, though, as the runtime would then be O(N^2).
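If the quadratic runtime becomes a problem, one way around it is to swap the nested grep for a hash lookup. The sketch below is not the smartmatch approach from above but a different technique, and the name yasplice_hash is made up for illustration. It assumes the elements can be compared as strings (hash keys are strings, so 1 and "1.0" would count as different values).

```perl
use strict;
use warnings;

# Hypothetical helper: return the elements of @$aa that do not appear in @$bb.
# Comparison is by string equality, because hash keys are strings.
sub yasplice_hash {
    my ($aa, $bb) = @_;
    my %remove = map { $_ => 1 } @$bb;            # build the lookup set once
    return grep { !exists $remove{$_} } @$aa;     # single pass over the first array
}

my @a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
my @b = (4,5,6);
print join(", ", yasplice_hash(\@a, \@b)), "\n";  # prints: 1, 2, 7, 2, 3
```

Building the hash costs one pass over the second array and the filter costs one pass over the first, so the work grows roughly with N + M instead of N * M.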
While looking around for other interesting Perl things, I found this use.perl.org blog post about smartmatch’s performance. The post has a problem in its testing methodology. If I published a paper with a gaping hole like that, it would never be taken seriously. In each test, he generates a random number and a random character and stores the result! That’s not part of what is being tested. Each $needle should be generated outside the timing loop. As written, the tests include lots of extra instructions that have nothing to do with the search being measured. The results are basically useless and the benchmark should be redone.
So I did:
use Benchmark;
use 5.010;
use List::Util qw(first);
my $needle1 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle2 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle3 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle4 = chr(64+int(rand(26))).int(rand(1000)+1);
my @array1 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array2 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array3 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array4 = map { chr(64+int(rand(26)))."$_" } 1..1000;
timethese(100_000, {
    'first' => sub {
        first { $_ eq $needle1 } @array1;
    },
    'grep BLOCK' => sub {
        grep { $_ eq $needle2 } @array2;
    },
    'grep EXPR' => sub {
        grep $_ eq $needle3, @array3;
    },
    '~~' => sub {
        $needle4 ~~ @array4;
    }
});
With the following results:
Benchmark: timing 100000 iterations of first, grep BLOCK, grep EXPR, ~~…
first: 20 wallclock secs (19.22 usr + 0.08 sys = 19.30 CPU) @ 5181.35/s (n=100000)
grep BLOCK: 15 wallclock secs (14.12 usr + 0.06 sys = 14.18 CPU) @ 7052.19/s (n=100000)
grep EXPR: 13 wallclock secs (12.96 usr + 0.06 sys = 13.02 CPU) @ 7680.49/s (n=100000)
~~: 5 wallclock secs ( 4.61 usr + 0.02 sys = 4.63 CPU) @ 21598.27/s (n=100000)
Those results are astonishing! Smartmatch is about 4 times faster than first, not merely twice as fast as the original blog poster found. Notice that I changed each test to look for a different $needle in a different @array. All of the random data is created outside the timing loop, so the only operations being timed are the call to the anonymous sub, the timing overhead itself, and the search function.
So I’m sure you could rewrite my yasplice function to search much more quickly using smartmatch. I’ll leave that as an exercise for the reader, as this is about all I want to write for today.
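For anyone who wants a starting point, here is one possible sketch of that exercise, not a definitive answer. It replaces the inner grep with a single smartmatch of each element against the whole second array. It assumes a perl on which ~~ is still available; from Perl 5.18 onward smartmatch is flagged as experimental and warns, so the sketch silences warnings inside the sub. Whether it actually beats the nested grep has not been measured here.

```perl
use 5.010;

# Sketch: keep the elements of @$aa that do not smartmatch anything in @$bb.
sub yasplice {
    my ($aa, $bb) = @_;
    no warnings;   # assumption: newer perls warn about smartmatch; silence that here
    # The parentheses matter: ! binds more tightly than ~~.
    return grep { !($_ ~~ @$bb) } @$aa;
}

my @a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
my @b = (4,5,6);
say join ", ", yasplice(\@a, \@b);   # prints: 1, 2, 7, 2, 3
```

This is still N * M comparisons in the worst case, but the inner loop stops at the first match, whereas a nested grep always scans the whole second array.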