Pump King

October 31st, 2009 | Category: KISS Saves Santa, Permission For Flyby

I am now a PumpKing. No, not that kind of pumpking, this kind:


[Images: pumpking, pumpking2]

I stole the Perl Foundation logo and created a stencil from it using GIMP. It took me approximately 3 hours to create that pumpkin.

Happy Halloween!

3 comments

Thesis In Frustration

October 25th, 2009 | Category: 01100011, Grinds My Gears, Project Bootstrap

So this semester I have been investigating and working on my thesis. Right now, my focus is on Statistical Natural Language Processing. I don’t want to discuss the specifics of the research just yet, but it has the potential to completely upend the entire search industry.

I have been investigating how to build a large corpus from the web. My advisor favors using Google directly since its index already exists and its search is free.

The first thing I did was investigate the Google SOAP API, only to find out that they deprecated it when they introduced the AJAX API. The new API only allows for about 60 results with no paging. Then I looked into the REST::Google module, but that only returns 10 results. Neither of those options seems feasible. I checked Yahoo’s Yahoo::Search interface and it only seemed to return 10 results (paging, if possible, was not obvious). I could write a direct scraper but that would take a good deal of effort and I am not sure it would be worth it.

Then I even started looking at writing my own spider using WWW::Robot. This is a fairly complex module that does a ton of grunt work for you. The downside is that it behaves and follows the robots.txt protocol; that’s a problem for someone who wants to scrape everything with no regard for such a protocol.

After spending maybe 20-30 hours flipping over this in the last 6 weeks, I finally made the effort to meet with my advisor. Since he is no longer answering email or his phone, I met him after his late class and talked it over with him while he ate dinner in the campus restaurant. We talked and waffled back and forth about our approach. In the end, we decided to investigate Lucene’s capabilities.

Frustrated and lost, I went about my week until I talked with a PhD student who is also being advised by my advisor. Her patience with our advisor has been continually declining. She missed a publication deadline because he failed to review a paper of hers. She also divulged that she intends to switch advisors because she is not making progress. I have been contemplating this myself, so it was good to hear that I am not the only one at my wits’ end.

I am not making progress and I am not willing to sacrifice my graduation. If I change advisors, hopefully I will find one who provides much more support and direction yet gives me the option to continue developing in Perl. One of the professors I want to speak with runs a programming language lab.

Maybe I can merge my interest in Perl 6 with my thesis!

2 comments

Falling Behind

October 04th, 2009 | Category: 01100011, Project Bootstrap

I dropped off the Perl Ironman blogging challenge again. This time it wasn’t due to a date miscalculation; I took a midterm exam on Thursday. My current class has taken up most of my time in the past 6 weeks. There haven’t been any chances for coding just yet. I did want to talk about something I have been looking into lately for my thesis.

First, I want to post a lazyweb question: has anyone worked with REST::Google? My first question is if anyone knows how to advance the page cursor with this module. I tried reading the code itself, but it uses Class::Accessor and Class::Data. I’m a bit unfamiliar with those modules (are they popular anymore?), and it looks like the cursor is read-only. I don’t really see the use for the module if it returns 10 results and cannot paginate.

So from that question, I took the sample and tried playing with it. This is just experimental to exercise the module’s capabilities. This is fairly boring, so I want to see if I can get some code working that can paginate through Google results using their REST API (if that’s even possible).

The author of this module probably feels very clever; he hid subpackages from CPAN by putting the package name on a new line from the package keyword. He also used __PACKAGE__->mk_ro_accessors, which looks like it will generate attribute accessors at runtime. I’m guessing CPAN cannot index those either. What’s the point of uploading your code to a public repository if you take measures to hide it from the repository?
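To make those two tricks concrete, here is a minimal sketch in core Perl only. The package name My::Hidden::Result and its fields are made up for illustration and are not taken from REST::Google, and the accessor loop is a hand-rolled stand-in for the kind of thing mk_ro_accessors appears to do, not Class::Accessor's actual implementation.

```perl
use strict;
use warnings;

# The line break after 'package' is legal Perl, but a line-oriented scan
# for "package Some::Name;" will not see a name on this line.
package
    My::Hidden::Result;    # hypothetical package name

# Hand-rolled stand-in for runtime accessor generation: these subs do not
# exist anywhere in the source text, only after this loop has run.
for my $field (qw(title url)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        die "'$field' is read-only\n" if @_;
        return $self->{$field};
    };
}

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

my $result = My::Hidden::Result->new(
    title => 'Perl',
    url   => 'http://www.perl.org/',
);

print $result->title, "\n";
print My::Hidden::Result->can('url')
    ? "url accessor exists at runtime\n"
    : "no url accessor\n";
```

If the cursor in the real module is built the same way, that would explain why it looks read-only: a generated getter like this simply refuses any argument.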

Anyways, I’m soliciting ideas for paginating Google’s REST API results. Note: the SOAP API has been terminated, so that avenue is closed.

2 comments

Ampersand

September 09th, 2009 | Category: 01100011, Meat Space

Today, only fRew and I ended up meeting for the September Dallas.p6m meeting. Turns out several people forgot that today was in fact the Tuesday after Labor Day and not Monday. We mostly just sat and talked about miscellaneous software things. We discussed GPS, my Algorithms course, Parrot, PGE, and styling. Styling was the last thing that we discussed and the one that seemed semi-heated.

I discussed my reasoning for my styling quirks, which fRew insisted he would replace in a heartbeat. I’ve mostly been honing my preferences by finding bugs caused by styles that seem to lead to common mistakes. For example, I always place the if statement’s parentheses immediately after the ‘f’ because you cannot have one without the other (well, you can have the boolean expression alone but it rarely makes sense unless it’s a return value). I apply the same reasoning to why I put my braces at the end of the if line with a space between the ‘)’ and the ‘{’. I do this because the block started by ‘{’ can exist without the if clause. I always keep the ‘{’ on the same line so that attempts to comment out the if clause will fail to compile; I’ve found and fixed too many bugs that were the result of true laziness.

Now that may seem kinda weird, but it’s 1) a mental separation technique and 2) an attempt to reduce the number of standalone nested blocks (that can do odd things like cause variable scope issues).
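The brace placement argument can be checked with a small experiment. This is only a sketch: the two snippets are invented for illustration and are compiled with string eval so both outcomes show up in a single run.

```perl
use strict;
use warnings;

# Brace on the same line as the if: commenting out the if line also
# removes the '{', so the leftover '}' is a syntax error.
my $same_line = <<'CODE';
my $n = 0;
# if($n > 5) {
    $n = 99;
}
$n;
CODE

# Brace on its own line: commenting out the if line leaves a bare block
# that compiles and now runs unconditionally.
my $next_line = <<'CODE';
my $n = 0;
# if($n > 5)
{
    $n = 99;
}
$n;
CODE

my $r1 = eval $same_line;
print defined $r1
    ? "same-line style: compiled, n = $r1\n"
    : "same-line style: failed to compile\n";

my $r2 = eval $next_line;
print defined $r2
    ? "next-line style: compiled, n = $r2\n"
    : "next-line style: failed to compile\n";
```

The first snippet refuses to compile, which is the loud failure I want; the second quietly sets $n to 99 even though the condition was never true.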

Then we started discussing using ampersand, ‘&’, to begin function calls. My reasoning is that I more often than not prefer to be as explicit as possible, and the ‘&’ lets me do that. I failed to recall an example that led me to prefer the ampersand, but I eventually found it. Basically, functions should look like function calls and not keywords, macros, or other environment lexicals. &foo($arg1, $arg2); looks a bit hairy and dated (generally a Perl 4 way of doing things), but it’s clear from the first character what is about to happen. My brain needs only to parse the first character to read the code with the right mindset. I am calling a user defined function (not a built-in), named ‘foo’, and passing 2 arguments. That is clear, readable, and will likely work for the foreseeable future; if not, it’s still easy to find and correct.

On the other hand, foo; or foo(); (under strict) is not necessarily clear. The first example is basically a bare-word and could be any number of things. It could be a symbol, a package, a string, or a function call. If it is a call, it receives an empty argument list (it is the &foo; form, without parentheses, that passes along the caller’s @_), and confirming that requires more investigation. The second one looks like a subroutine call but I have to parse 5 characters and then grep around for a sub named foo within the current namespace (was it loaded elsewhere and exported to my namespace?). While both of these are more compact and concise, they also both require more work to figure out exactly what is happening.

Also, foo($arg1, $arg2); is clearer, but not until I’ve read a minimum of 4 characters do I start to think it might be a function call. This does not parse and skim nearly as quickly, at least to me.
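For anyone who wants to see the call forms side by side, here is a minimal sketch. The sub names show_args and caller_with_args are made up for illustration. The one behavior worth remembering is that an ampersand call without parentheses shares the caller's current @_ instead of passing an empty list.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whatever arguments it received.
sub show_args { return 'got: [' . join(', ', @_) . ']' }

sub caller_with_args {
    my @seen;
    push @seen, &show_args('x', 'y');    # ampersand with an explicit list
    push @seen, show_args('x', 'y');     # same call without the sigil
    push @seen, show_args();             # parentheses: empty argument list
    push @seen, &show_args;              # no parentheses: caller's @_ is shared
    return @seen;
}

print "$_\n" for caller_with_args(1, 2, 3);
# got: [x, y]
# got: [x, y]
# got: []
# got: [1, 2, 3]
```

So the sigil alone does not change the arguments; the sigil combined with missing parentheses does.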

All of this skimmable code talk (note: I don’t agree with Schwern, end-of-scope comments usually clutter code more than they help) may sound frivolous to those readers who deal with thousands of lines of code. It’s not something you can truly appreciate until you maintain code that weighs in with at least 9 digits (executable code only). I personally manage 250,000 lines and I am responsible for a product that is about 2 million lines (all 30 of our branches are about 2 million lines each).

In the end, I stick by my preference for ampersand function calls unless someone else can point out a better reason to ditch them.

6 comments

Semester Renewal

August 20th, 2009 | Category: 01100011, Project Bootstrap

Just after I had been through a bit of a posting slump due to some fading tuits, it would seem as if they have magically returned. This week brings the start of the Fall 2009 semester and I have seemingly sprung back to life. With the new semester comes new opportunities to use perl!

First, I wanted to mention the one bit of news that made my summer, maybe even the last 2 years, seem as if I haven’t been running in circles. I noticed that my new department head at the university posted the final new rules regarding the qualification exams. The change is that there is a list of approved courses that will be offering the QE, passing any 3 is sufficient, and there is still no marginal grade. The last exam I took, Data / Text Mining in Bioinformatics, was not listed in the blessed course list. Turns out, the department would still accept the score, so I now only lack 1 remaining pass to be an official “phd candidate“.

This semester I am taking the Design & Analysis of Computer Algorithms course. The downside is that this course tends to be theory-centric, so I won’t have many chances to flex my Perl muscles. There are tons of modules on CPAN, though, that might help with understanding the basics. There are plenty of graph, tree, and dynamic programming solutions available.

Interestingly enough, while sifting through those modules, I discovered a module of personal interest. I stumbled across the Algorithm::Viterbi module. I have studied Markov models, Markov chains, and Hidden Markov models a bunch in the last 2 years. One algorithm that keeps showing up is the Viterbi algorithm. I’ll leave it as an exercise to the reader as to how this algorithm is used, but I will point out that the Wikipedia page has Python code. Ironically, “Python’s answer to CPAN” isn’t quite all it’s cracked up to be; it lacks any packages pertaining to “viterbi” and has no generic Markov package.
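For readers who would rather see it in Perl than Python, here is a minimal hand-written sketch of the algorithm. It does not use Algorithm::Viterbi or its API; the sub name viterbi, its argument order, and the toy healthy/fever model are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Returns the most likely sequence of hidden states for the observations.
sub viterbi {
    my ($obs, $states, $start, $trans, $emit) = @_;

    # prob[t]{s}: probability of the best path ending in state s at time t
    # back[t]{s}: the previous state on that best path
    my @prob = ({ map { $_ => $start->{$_} * $emit->{$_}{ $obs->[0] } } @$states });
    my @back = ({});

    for my $t (1 .. $#$obs) {
        for my $s (@$states) {
            my ($best_prev, $best_p) = (undef, -1);
            for my $p (@$states) {
                my $cand = $prob[ $t - 1 ]{$p} * $trans->{$p}{$s};
                ($best_prev, $best_p) = ($p, $cand) if $cand > $best_p;
            }
            $prob[$t]{$s} = $best_p * $emit->{$s}{ $obs->[$t] };
            $back[$t]{$s} = $best_prev;
        }
    }

    # Pick the best final state, then walk the back-pointers to the start.
    my ($last) = sort { $prob[-1]{$b} <=> $prob[-1]{$a} } @$states;
    my @path = ($last);
    for (my $t = $#$obs; $t > 0; $t--) {
        unshift @path, $back[$t]{ $path[0] };
    }
    return @path;
}

# Toy model (illustrative numbers only).
my @states = qw(Healthy Fever);
my %start  = (Healthy => 0.6, Fever => 0.4);
my %trans  = (
    Healthy => { Healthy => 0.7, Fever => 0.3 },
    Fever   => { Healthy => 0.4, Fever => 0.6 },
);
my %emit = (
    Healthy => { normal => 0.5, cold => 0.4, dizzy => 0.1 },
    Fever   => { normal => 0.1, cold => 0.3, dizzy => 0.6 },
);

my @path = viterbi([qw(normal cold dizzy)], \@states, \%start, \%trans, \%emit);
print join(' -> ', @path), "\n";    # Healthy -> Healthy -> Fever
```

Multiplying raw probabilities is fine for three observations; for long sequences a real implementation would normally sum log probabilities to avoid underflow.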

Perl: Automatically tested, student approved.

No comments

Perl 6 Mini-Hackathon

August 13th, 2009 | Category: 01100011, Meat Space

I have been postponing this week’s Perl post mostly because our monthly Perl 6 Mongers meeting was yesterday. I brought up and did some initial planning for a monthly Rakudo / Perl 6 mini hackathon. I have an outline of events but I want to discuss the Dallas.p6m meeting briefly first.

fRew was going to give a lightning talk about Perl 6’s object model but had to postpone due to a lack of preparation. We are all getting to know each other well and the conversations covered the spectrum: from Guido’s Simplification Of Choice to Perl 6’s attributes and then finally to processing.org’s Javascript framework.

Lastly, I brought up the idea of having monthly mini-hackathons. I thought it would be a great way to put Patrick’s talk into practice. So the initial idea is to have a 2-hour hackathon two weeks after the Dallas.p6m meeting (second Tuesday every month), which should be the last Saturday in the month. The first get-together will be focused on Patrick’s talk, with others expanding as people find new areas of interest. That means the first mini-hackathon is slated for August 29th and here’s what I have tentatively planned:

  1. Introductions
  2. Getting and building Rakudo
  3. Running Rakudo
  4. Mailing Lists / Ecosystem
  5. Understanding RT
  6. Test Suite access and explanation
  7. Setting

We may or may not be able to cover all of that. Patrick seems confident that we will be able to cover these topics so I am fairly happy with the layout. There will be more details forthcoming about the venue and the time/date.

I’ve never been to or held an event like a mini-hackathon, so I’d love to hear suggestions.

1 comment

Corehackers: What I Want

August 02nd, 2009 | Category: 01100011, Grinds My Gears

Earlier this week, I was hanging out in the #corehackers channel and I decided to voice my opinion on the flood of discussions concerning the problems plaguing Perl 5. David Golden challenged me to respond to his post concerning what I want from Perl 5 moving forward.

First, my suggestion to chromatic was that new features should receive first class citizenship over deprecations. Since I subscribe to the technology-over-politics philosophy, I think deprecated features should be removed after 2 releases. Once removed, they should be provided within a library. I don’t feel those aging applications depending on these features need to be intentionally broken, but at some point, features that have been deprecated for 10+ years need to be removed. Keeping them only makes the deprecation problem harder every release they remain.

So what would I like to see out of Perl 5 in the future? Perl 5. It’s simple, I just want to see Perl 5 continue to thrive until there is a complete Perl 6 interpreter (it’s been 9 years, may very well be another 9). Frequent releases have the effect of convincing people that there is an active developer community. When application developers feel there is an active community, they are more likely to consider the platform viable. Application developers do not seek dead platforms for The Next Big Thing.

There are many things Perl 5 pumpkings and developers can do to achieve this. Frequent releases are a must; it doesn’t matter if 1 or 1000 tickets were closed since the last release, getting it out the door is the most important thing. Worrying about 100% backwards compatibility in every release means releases don’t happen. Just look at 5.10.1. 18 months for a minor version upgrade? How many pumpkings has 5.10 had anyways? It’s time to move on and to stop worrying about breaking all of DarkPAN.

Features that extend the expressiveness and flexibility of Perl 5 need the highest priority. The grammars in Perl 6 mean that Perl 6 can eventually host Perl 7. That feature alone will keep Perl 6 alive years beyond any other language.

In the end, bickering about release schedules, deprecation policies, dependency problems, or any other issue is not going to get Perl out the door. At the end of the day, there are technical and social issues to work through. Social issues are hard; people don’t change their minds easily. Let’s just agree to solve the technical issues, which are significantly easier. Scheduling releases, fixing segfaults, and updating modules are all just technical problems that have solutions. They may not be pretty or quick to fix but solutions exist, and in most cases, we already know what’s needed but are too afraid to move forward. Sure, breaking 10 year old Perl 5 scripts will be painful but nowhere near as painful as watching Perl 5 die a slow and miserable death where everyone involved bad-mouths it until it actually is dead.

2 comments
