Jun 28
My First YAPC
So last Friday I flew out to Pittsburgh to attend my first YAPC::NA. In fact, it was my first technical conference ever really. I met many people, drank lots of coffee, explored Carnegie Mellon, and enjoyed tons of perl presentations.
A few of the presentations made an impression on me. Rather than try to discuss each of them in a single post, I’ll post my reactions over the course of the summer. The best thing about YAPC was that it sparked my imagination. I had several good ideas, one of which I may pursue further in an academic setting.
I also took lots of pictures so I tried getting my photo gallery functioning again. Unfortunately, the permalink function in NextGen seems to have changed so my pace charts probably are non-functional. Things are looking good and I hope to post some interesting pictures.
The dinner Tuesday night was the best part. The Perl Foundation bought us dinner at Heinz Field. Dinner was excellent and the desserts were even better (some sort of cookie-like pie). There was a tour of the field; I stood in the end zone and sat in Hines Ward’s and Brett Keisel’s lockers. Interestingly enough, I also found out that the FBI has an office in the scoreboard and uses facial recognition software to find wanted criminals in the crowd (they even send them free tickets).
I also attended the Parrot Workshop and met Jerry Gay (particle) and Jeff Horwitz and Andrew Whitworth (Whiteknight). I tried to convince Michael Schwern (schwern) to finally abandon ExtUtils::MakeMaker. I finally met chromatic and discussed how I feel infrequent releases of perl are what’s making it look stale. I had a long enjoyable conversation with Patrick Michaud (pm) at the auction dinner about school and life. I met Larry Wall (TimToady) which was much less dramatic than I imagined it would be. I had dinner with Matt Trout (mst) and it was surprisingly depressing. Lastly, I hung out with my good buddy fRew.
All in all, it was a good experience to socialize and network as well as craft a few really good ideas. YAPC::NA 11 will be in Columbus, Ohio. Considering what I spent, I don’t think I want to go next year, strictly because of the location, unless my employer foots the bill. I really enjoyed attending and hope to attend a future event. Thanks to the event coordinators for all of their hard work and to the presenters for sharing their knowledge.
tags: perl, starstruck, yapc
2 comments
Jun 22
Awaiting YAPC Keynote
I am writing this in an auditorium at Carnegie Mellon University. This is the official start of YAPC and I am waiting for the keynote speeches.
I ate dinner with Larry Wall last night (well, sat near him). I spent the last 2 days in a Parrot workshop with a handful of parrot implementers.
I’m excited to see what the next few days will bring. Being social is much easier with other people who share interests. Ironically, perl mongers are very social people.
tags: parrot, perl, yapc
No comments
Jun 17
The Lost Boys
It’s no secret that many technology circles are boys’ clubs. There’s something about the industry that seems undesirable to women. For the longest time, it eluded me as to why anyone would not love technology.
Then Matt Aimonetti and Hoss Gifford proved why it’s a culture that is generally unwelcoming to women. I won’t bother linking to them or any of the stories; you’ll have to Google it yourself to find out.
These two cartoons presented at a developers’ convention and utterly degraded the entire profession in the span of an hour or less. First, Matt made unclever and borderline offensive slides that urged developers to “perform like porn stars.” Not to be outdone, Hoss decided to put a most definitely not safe for work image directly in his slides at a recent conference, and even acted like a prepubescent boy during the presentation. Again, you’ll have to Google it for the video.
If software engineers ever hope to be taken seriously by the traditional professions, such as doctors and lawyers, this kind of crap has to stop.
Much to my dismay, Kirrily Robert decided to include an image link directly in her feed that is aggregated on Planet Perl. The result is I am reading my daily dose of software related blogs and suddenly I see a resized image of something not even remotely professional in my feed reader. Google Reader dutifully downloads a dozen feed items in advance so I am not waiting to download them. No matter what I do at that point, it won’t look good.
Now I am in the uncomfortable position of explaining to my boss tomorrow why that was in my feed reader and what I intend to do about it. I am basically going to have to cut myself off from Planet Perl and try in the most professional manner to explain that I was not looking for that sort of content. Planet Perl will now join the likes of Gizmodo and XKCD as being inappropriate to read at work.
tags: inappropriate, perl
5 comments
Jun 8
Epoxy – The Glue That Holds Systems Together
I started a Perl 6 project some time ago from a few discussions during a Dallas.p6m (Perl 6 Mongers) meeting. The idea was to create a complete packaging system in Perl 6 for Perl 6 with Perl 6. That sounds confusing, so let me explain the project some more. In retrospect, this is a long post that may bore all but the most ardent reader.
First, you can find the epoxy project on github in my account and the epoxy-resin database in the perl6 account (since pmichaud gives out commit access liberally). Epoxy is the main system with the epoxy-resin project as the package database.
The idea is similar to Gentoo’s ebuild system. Instead of writing clever shell scripts and executing them in crafty ways, I decided to take another route. Perl developers clearly like doing 1 thing: writing perl code. What better way to build and release your perl code than by writing some more perl code? The key is that the most difficult and most common tasks need to be automated while still allowing for a huge range of flexibility.
I chose to implement the project as a tool that uses ebuild-like files, called resin files, that provide all of the functionality necessary for a user or developer to find, build, install, and repackage a project. The goal was to fully replace Makefiles, Configure.pl scripts, ExtUtils::MakeMaker, Module::Build, Module::Install, and most importantly, CPAN and PAUSE. That sounds like a lofty goal, but it’s working out quite nicely so far.
As a sidenote, I have seen Mark Overmeer’s CPAN6 and PAUSE6. I have seen masak’s proto and dcarrera’s ppm. I have seen the PAR project and Software::Packager. I have written ebuilds and RPM spec files. I have worked with JAR packages and WAR packages. I have created JNLP files. I have worked with (but not necessarily created) packages for basically every mainstream Linux distribution. I decided that none of these solutions are interesting except for a few. I also concluded that CPAN and PAUSE as we know them need to be recycled. They’re old enough and Perl 6 is new enough that it’s time to reinvent the wheel. Rather than creating a crufty 1990s wheel all over, let’s create a modern wheel capable of handling modern technology (for example, CPAN authors have migrated to github in droves).
A resin file is similar to the ebuild file. Rather than giving it a unique extension or artsy syntax, I decided to make the resin files classes. To provide a resin build and packager for your project, you inherit from the base resin file and override the functionality you need. The idea is you will have to implement the smallest subset of functionality required for epoxy to fetch, build, and package your project. That is, for a “hello world” project, you only need to subclass Epoxy::Resin and provide a BUILD submethod that sets your metadata.
The base class, Epoxy::Resin, provides base functions, known as targets, to do things such as fetch, build, dist (repack), install, test, clean, upgrade, and remove. Each module may either use the basic functionality or provide some custom actions for each target. A resin file declares the module’s metadata by setting public accessor values, such as author, website, and license.
There are several resin files already written, though they have to be constantly maintained as I improve the functionality of the shell and the base class. Those resin files are for masak’s projects right now since his are the most visible. I intend on writing more resin files once the current ones are smaller and some of their functionality resides in the base class.
There’s also a shell that does this work. I used Tene’s sweet dispatcher to handle the commands in the most sensible and shortest manner I could come up with. Currently, it only dispatches the shell commands out directly as targets, but I intend on grouping the targets into meta-commands so that common tasks are easier. Also, only a few targets are currently supported since I have to also write the targets for each of the resin files I currently maintain. This is a time management juggling act, so I decided what I have will suffice for now.
So once you look at it (or try it out for that matter), you’ll realize there is much to do. I won’t lie, it’s very early work. There is no test suite (I haven’t figured out the best way to integrate with rakudo’s Test.pm). There is duplicated code to save time solving a harder problem (such as the shell dispatch handlers and the build targets). Most of the targets are non-functional, such as upgrade, dist, and remove. It doesn’t currently attempt to follow the standard or even remotely care what the standard says on the matter. There is a lack of error handling, such as use-ing modules that don’t exist.
What I’m trying to say is that I understand how it is incomplete. Everything that it lacks is because I have not determined the best way to solve the problem yet. In some cases, I had to even work around known problems or a complete lack of an eco-system. The Epoxy::Fringe and Git modules are proof of that. I have the tuits and development will inch forward.
I haven’t decided on an issue tracker for the project yet. I tried to use Lighthouse but it seems unintuitive. In order to complete a milestone, you have to create tickets and mark them as complete. This is really just good for cyclic or maintenance projects but not for off-the-cuff new development. I’m currently thinking about github’s newly added Issue page, which looks like it will suffice for now.
I asked masak, a prominent member of the Perl 6 community, to review my code. He had mostly positive things to say. He pointed out that he didn’t like the name of the module that builds Epoxy itself: Epoxy::Build. He told me about IO.prompt and how BUILD submethods are now supported. He also pointed out my lack of attention to the metadata attributes. These are all things on my TODO list which will be addressed eventually. He was surprised to find that I was replacing almost all of the Makefile, Makefile.in and Configure scripts in his projects.
This all started out as an exercise to learn Perl 6 better. I wanted to know the syntax and to start thinking of how things work. The best way to learn is to act. I am having fun working on this every other day or so. It’s a nice mental exercise that keeps me busy over the summer.
I have grand plans for this. I will take my time and focus on the code. I will not be distracted by a charged discussion without any working code. Concepts and ideals are great, but sometimes, code speaks for itself. I won’t argue about the future of CPAN or which way is the best to package a product. I won’t partake in the arguments because everyone seems to have an opinion but no solution. Those are arguments no one can win. Until you know that, you are useless.
I will, however, take feedback and constructive criticism. I don’t want to fan any flame wars or discuss what anyone thinks is the best idea for the future of Perl or Perl 6. This is my free time and the last thing I’m interested in is pointless heated discussions. I am always open to feedback, reviews, and assistance.
tags: code, cpan, epoxy, packaging, perl, perl6, portage, rakudo
No comments
Jun 3
Gentoo Herd Abandons Perl
Wow, that title sounds pretty gloomy and not entirely meant as a news header. It’s true but it was never declared.
Let’s start out by looking at the b.g.o. 206455. Check out the header information on that bug. Let me reproduce the interesting bits:
Perl 5.10.0 was released about a month ago. I attach modified ebuilds and patches that I used to install it successfully (?) on my system
Reproducible: Always
Opened: 2008-01-17 19:39 0000
Current Status: New
In case you didn’t catch that, check out the bolded text one more time. That’s right, it’s been about 18 months since perl-5.10 was released and Gentoo still does not support it. How could a source based distribution that used to pride itself on bleeding edge support possibly fall so far behind?
Simple: the herd maintainers, both of them, have basically abandoned Gentoo.
This is interesting and saddening for many reasons. I’m a long time user and supporter of Gentoo and it pains me to see it fall. In my opinion, Gentoo was the only distribution to get package management correct. I loved being able to test bleeding edge software before everyone else, including Debian and RedHat.
This situation also shows 2 problems with open source projects that you would not typically expect to exist. First, maintaining distribution supplied versions of Perl and CPAN modules is a loser’s game. It’s nearly impossible to update all of those ebuilds as fast as the developers of the modules themselves. g-cpan was a terrible project that never really worked well and no one wants to take over (as you can see from some of the recent comments). What I’m taking from this is that CPAN authors themselves are the most likely candidates to keep their modules building and installing since they’re doing something very similar already.
It also points out that with any project, there needs to be some level of satisfaction. It’s apparent from the herd’s (not so) recent commits that they lacked the desire to continue. This could stem from the historically poisonous Gentoo developer community, the difficulty in maintaining the ebuilds, or real life interference.
What can we learn from this? How can we possibly improve the situation? These are difficult questions and it’s heartening to see that concerned parties are finally starting to ask them publicly. I think the first thing that needs to happen is to stop asking “where is perl-5.10”. It won’t be released on Gentoo by the herd. I’ve had to live with that for 18 months and now everyone else needs to as well. What we can do is try to improve the tools we have available, fix their problems, and write new tools to fill in the gaps. We need tools to help distributors with the ever-growing nature of CPAN. No one knows what needs to happen for that yet. The time for asking “where is it?” is over; now is the time to roll up our sleeves and get to work.
perl-5.10 on Gentoo should become a case study on how open source projects can succeed and fail. Gentoo itself is a case study with developer relations, but that’s a talk for another day.
tags: code, gentoo, perl, sweat
2 comments
May 26
Benchmarking Is Hard
Peter Markholm is right, benchmarking is hard. In typical fashion, I’m going to respond to a blog post response with another blog post.
Peter Markholm took issue with the improvements I made to the benchmark of grep, first, and smart match. So I felt compelled to explain why those are improvements and should actually improve the accuracy of the results. While my choice of wording may have been strong, and I apologize to the original author for that (I can’t believe I criticized Michael Schwern’s code!), I stand by the improvements. Keep in mind, my opinions are formed from my time benchmarking things for my graduate work and for my employer on a few occasions.
The most important thing when writing a benchmark is that the only thing within the timing loop is the operation you wish to time. In this case, we didn’t want to time the operation of creating the random data (rand). We didn’t want to time the data conversion and construction (chr and .) either. Those operations are things that the system may or may not optimize; since we have little control over how the system will optimize the code, we have to assume it adds unwanted ops to the timing loop. This is why I moved the creation of the @array outside of the anonymous sub. I left the anonymous sub because I didn’t want to write my own timing loop (Benchmark.pm can’t be used otherwise).
Also, we want to avoid any memory caching optimization that the system may provide for us. While this is typically useful, we absolutely do not want the system to load the entire @array of one test into memory and reuse the preloaded memory in another test. That could give an unreliable advantage to one of the tests depending on the system’s memory optimizer and the order of the tests. The reason why it’s a good idea to give each test its own set of data is that it then avoids data prefetching.
Lastly, the argument that we might find the $needle in the beginning of the haystack is off track. With a large enough haystack, the probability that we will find the $needle within an eyeshot is negligible. We could increase the size of the haystack to be something like 100 billion random elements or even 1 trillion elements. Sure, the sample haystack was not large enough, but the testing method was much more sound (but not perfect).
Having said that, I will agree with Peter. My results absolutely should not be taken seriously. There are many more things I would do differently. I would probably write my own timing loop so I could make sure the only thing within it would be the operation I want to test. I would also repeat the test a significant number of times with a variety of randomly generated large arrays (statistically large enough). I would test with different data types, different data structures, and different versions of all software involved. I’d also make sure not to test the code in multi-user mode, and to have as few applications vying for the CPU as possible (contention can produce misleading results).
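As a rough illustration of the hand-written timing loop mentioned above, here is a minimal sketch using the core Time::HiRes module. The helper name time_op, the iteration count, and the haystack size are all made up for this example; none of them come from the original benchmark. The code ref call is still inside the timed region, so this measures call overhead plus the operation, not the operation alone.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run $code $iterations times and return the
# elapsed wall-clock time in seconds.
sub time_op {
    my ($iterations, $code) = @_;
    my $start = [gettimeofday];
    $code->() for 1 .. $iterations;
    return tv_interval($start, [gettimeofday]);
}

# Build the test data outside the timed region so that rand, chr,
# and string concatenation are never part of the measurement.
my @haystack = map { chr(65 + int(rand(26))) . $_ } 1 .. 1000;
my $needle   = chr(65 + int(rand(26))) . (int(rand(1000)) + 1);

my $elapsed = time_op(10_000, sub {
    my $count = grep { $_ eq $needle } @haystack;
});

printf "grep BLOCK: %.3f s for 10000 iterations\n", $elapsed;
```

Repeating the call to time_op with freshly generated haystacks, and averaging the results, would be the next step toward the repeated trials described above.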
At the end of the day, those results were just examples. The changes were improvements but did not make for a complete and irrefutable test.
tags: code, perl, response
No comments
May 25
Yet Another Splice Function
We’ve all used splice. It’s a great function that basically encapsulates push, shift, unshift, and pop to name a few. While it’s overly useful, it can have its limitations. One limitation that has always bugged me was how it required indices. Sometimes, you may not know these or want to know these indices ahead of time. That’s why I created another splice function.
In this version, which I will officially call “yasplice”, it only takes references to 2 arrays and returns an array. It removes all of the elements in the second array from the first array by value rather than index. Note this requires perl5.10 because of the extremely useful smartmatch operator (~~). This snippet also overrides the splice function so this code should be localized.
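use feature ':5.10';   # enables say
use subs 'splice';     # predeclare so this sub overrides the built-in splice

# Return the elements of @$aa whose values do not appear in @$bb.
sub splice {
    my ($aa, $bb) = @_;
    return grep { my $one = $_; !grep { $_ ~~ $one } @$bb } @$aa;
}

my @a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
my @b = (4,5,6);
my @c = splice(\@a, \@b);
say join ", ", @c;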
Prints:
1, 2, 7, 2, 3
Simple and useful. It might not be a good idea to use this on large arrays: with two arrays of N elements each, the runtime is on the order of N^2.
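One way around the quadratic cost, sketched here as an assumption rather than as part of the original yasplice, is to load the values to remove into a hash first. The name yasplice_hash is invented for this example. It compares values as hash keys (that is, as strings) instead of using smartmatch, so it should only be expected to behave the same for plain strings and integers like the ones above.

```perl
use strict;
use warnings;

# Hypothetical linear-time variant: return the elements of @$aa
# whose values do not appear in @$bb.
sub yasplice_hash {
    my ($aa, $bb) = @_;
    my %remove = map { $_ => 1 } @$bb;          # one pass to build the lookup table
    return grep { !exists $remove{$_} } @$aa;   # one pass to filter
}

my @a = (1,2,4,5,7,5,4,2,6,6,6,5,4,4,5,6,6,4,3,4,5);
my @b = (4,5,6);
my @c = yasplice_hash(\@a, \@b);
print join(", ", @c), "\n";   # prints: 1, 2, 7, 2, 3
```

Each array is walked once, so the work grows with the combined length of the two arrays rather than with their product.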
While looking around for other interesting perl things, I found this use.perl.org blog post about smartmatch’s performance. That blog post illustrates a problem with his testing methodology. If I were to publish a paper with a gaping hole like that, it would never be taken seriously. In each test, he’s generating a random number and a random character and storing the result! That’s not part of the test. The $needles should each be generated outside the timing loop. This is adding tons of additional instructions that are not considered part of the test. These results are basically useless and should be redone.
So I did:
use Benchmark;
use 5.010;
use List::Util qw(first);

# Generate all of the random data up front so none of it is timed.
my $needle1 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle2 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle3 = chr(64+int(rand(26))).int(rand(1000)+1);
my $needle4 = chr(64+int(rand(26))).int(rand(1000)+1);
my @array1 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array2 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array3 = map { chr(64+int(rand(26)))."$_" } 1..1000;
my @array4 = map { chr(64+int(rand(26)))."$_" } 1..1000;

# Each test searches its own array for its own needle.
timethese(100_000, {
    'first' => sub {
        first { $_ eq $needle1 } @array1;
    },
    'grep BLOCK' => sub {
        grep { $_ eq $needle2 } @array2;
    },
    'grep EXPR' => sub {
        grep $_ eq $needle3, @array3;
    },
    '~~' => sub {
        $needle4 ~~ @array4;
    }
});
With the following results:
Benchmark: timing 100000 iterations of first, grep BLOCK, grep EXPR, ~~…
first: 20 wallclock secs (19.22 usr + 0.08 sys = 19.30 CPU) @ 5181.35/s (n=100000)
grep BLOCK: 15 wallclock secs (14.12 usr + 0.06 sys = 14.18 CPU) @ 7052.19/s (n=100000)
grep EXPR: 13 wallclock secs (12.96 usr + 0.06 sys = 13.02 CPU) @ 7680.49/s (n=100000)
~~: 5 wallclock secs ( 4.61 usr + 0.02 sys = 4.63 CPU) @ 21598.27/s (n=100000)
Those results are astonishing! Smart match is 4 times faster than first, not twice as fast as the original blog poster found. Notice that I changed each test to look for a different $needle in each @array. All of the randomly created data was generated outside the timing loop, so the only operations being tested are the creation of the anonymous sub, the execution of the timing, and the search function.
So I’m sure you could rewrite my yasplice function to search in a much quicker manner using smart match. I’ll leave that as an exercise to the reader as this is about all I want to write for today.
tags: code, perl, splice, sub
3 comments