Yet Another Splice Function

May 25th, 2009 | Category: 01100011

We’ve all used splice. It’s a great function that basically encapsulates push, shift, unshift, and pop to name a few. While it’s overly useful, it can have its limitations. One limitation that has always bugged me was how it required indices. Sometimes, you may not know these or want to know these indices ahead of time. That’s why I created another splice function.

In this version, which I will officially call “yasplice”, it only takes references to 2 arrays and returns an array. It removes all of the elements in the second array from the first array by value rather than index. Note this requires perl5.10 because of the extremely useful smartmatch operator (~~). This snippet also overrides the splice function so this code should be localized.


use feature ':5.10';
use subs 'splice';

sub splice {
my ($aa, $bb) = @_;
grep { $one = $_; !grep { $_ ~~ $one } @$bb } @$aa;
}

@a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
@b = (4,5,6);
@c = splice(\@a, \@b);
say join ", ", @c

Prints:

1, 2, 7, 2, 3

Simple and useful. It might not be a good idea to use this on N sized arrays as the runtime would then be N2.

While looking around for other interesting perl things, I found this use.perl.org blog post about smartmatch’s performance. That blog post illustrates a problem with his testing methodology. If I were to publish a paper with such a gaping hole like that, it would never be taken serious. In each test, he’s generating a random number and a random character and storing the result! That’s not part of the test. The $needles should each be generated outside the timing loop. This is adding tons of additional instructions that are not considered part of the test. These results are basically useless and should be redone.

So I did:


use Benchmark;
use 5.010;
use List::Util qw(first);

my $needle1 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle2 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle3 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle4 = chr(64+int(rand(26))).int(rand(1000)+1);
my @array1 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array2 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array3 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array4 = map { chr(64+int(rand(26)))."$_" } 1..1000;

timethese(100_000, {
'first' => sub {
first { $_ eq $needle1 } @array1;
},
'grep BLOCK' => sub {
grep { $_ eq $needle2 } @array2;
},
'grep EXPR' => sub {
grep $_ eq $needle3, @array3;
},
'~~' => sub {
$needle4 ~~ @array4;
}
});

With the following results:

Benchmark: timing 100000 iterations of first, grep BLOCK, grep EXPR, ~~…
first: 20 wallclock secs (19.22 usr + 0.08 sys = 19.30 CPU) @ 5181.35/s (n=100000)
grep BLOCK: 15 wallclock secs (14.12 usr + 0.06 sys = 14.18 CPU) @ 7052.19/s (n=100000)
grep EXPR: 13 wallclock secs (12.96 usr + 0.06 sys = 13.02 CPU) @ 7680.49/s (n=100000)
~~: 5 wallclock secs ( 4.61 usr + 0.02 sys = 4.63 CPU) @ 21598.27/s (n=100000)

Those results are astonishing! Smart match is 4 times faster than first, not twice as fast as the original blog poster found. Notice that I changed each test to look for a different $needle in each @array. All of the randomly created data was done outside the timing loop so the only operations being tested is the creation of the annonymous sub, the execution of the timing, and the search function.

So I’m sure you could clearly write my yasplice function to search in a much quicker manner using smart match. I’ll leave that as an exercise to the reader as this is about all I want to write for today.

3 comments

Dispatching Through Traits

May 18th, 2009 | Category: 01100011

In my exploration of traits, I stumbled across something that Tene created while working on the Web.pm grant. It basically uses trait_auxiliary to handle unknown sub/method traits. Here’s the actual code itself: LolDispatch.

Here’s the run-down of how this works. Be prepared, I may butcher the proper terminology or explanation since this is something that is new to me and have only started understanding it.

Line 7 defines a method called trait_auxiliary:<is>, which is what perl6 will call when an unknown sub/method trait is encountered. In this case, the trait “is” has the “http-handler” trait added to known traits. Line 8 just stashes some method information to use later.

Line 11 is the real meat of this module. It defines an exported subroutine “dispatch” that loops through the known methods and calls their postfix:() (aka operator()) method. $item<block> stashes the Closure that caused the unknown trait to be processed by trait_auxiliary:<is>.

The sample code starting in line 22 really has to go in a different file to work with Rakudo. While technically the ‘is http-handler’ trait works with methods as well, I encountered problems getting it to work.

I have an interesting use as I’m sure most people do. This basically helps remove unnecessary switch statements and dispatch tables. I simply just define my method and what the “is handler” looks like. This is really impressive, so everyone_involved++.

I wonder if there is a way (useful anyways) to combine this with a grammar.

1 comment

More Sub Confusion

May 14th, 2009 | Category: 01100011

You know something is afoot when I start blogging more regularly.

This is starting to feel like a poetry slam from the 90’s. I wanted to take a minute a respond to moritz’s response to my article lambasting the confusing nature of subroutines and methods.

So pmichaud showed up late to our Perl6 Mongers meeting and resolved the whole mess. He explained that subs are the typical subs from perl5. They have no invocant associated to them and they are scoped to the class or package they are defined in. Oh yeah, in classes and modules are just sugarized packages.

Methods on the other hand require an invocant, such as an object or self. This as supposed to eliminate the confusion about how subs worked in perl5.

The source of confusion seemed to stem from my original thought that a sub in a class was something akin to Java’s Class-level function or c++’s public static function. pmichaud explained that there really is no such thing and the only real difference with subs from perl5 is the addition of method to indicate an invocant is required. Also, “$.method” and “$_ = self” hacks don’t really work and the latter leads to seriously dangerous code.

However, pmichaud said that self will generally be required when calling a method from a method (within the same class/package/module). The second example I gave is a legitimate gripe. In fact, pmichaud said that I am definitely not the first person to voice a complaint about it.

In case you can’t recall my kvetch about this issue, the following requires the self invocants:


class A {
method foo { ... };
method bar { self.foo; };
}

I can’t just call foo without the self invocant. The problem I have is that it’s not implicitly passed to the foo function for me. Everyone’s argument is that this smells too much like the problem with subs from perl5. So let me explain myself a bit more.

To be properly Huffmanized, not specifying an invocant from a method should pass self to the method within the scoped class. From a sub, it should call the sub by the name without passing self. That is, this should work:


class A {
method foo { ... };
method bar { foo };
sub baz { ... };
sub zab { baz };
}

So then I was asked, “well how do you call something like IO::open if my class has an open?” Just like that, specify the whole package namespace.

The reason why I expect implicit invocant passing to work is because, from within the scope of a method, I should not have to continue specifing my invocant. This leeds to overly terse code that is not properly Huffmanized like this:


class A {
method s {...};
method t {...};
method u {...};
method v {...};
method w {...};
method x {...};
method y {...};
method z {
self.s;
self.t;
self.u;
self.v;
self.w;
self.x;
self.y;
};

where method z should look more like


method z { s; t; u; v; w; x; y; }

I can work around this by doing clever given (read: Basic/VB ‘With’ blocks) statements:


method z { given self { .?s; .?t; .?u; .?v; .?w; .?x; .?y; }; }

It’s close, but no cigar. Of coarse, I could always be wrong and I’m just looking at the problem from the wrong angle. I’d love to hear any feedback.

3 comments