Unmonger

November 18th, 2009 | Category: 01100011, Meat Space, Sector 7G

I’ve often wondered what places someone in an open source community. Is it advocating the software? Openly contributing source code? Are users members of the community? I’m not sure if the definition of membership is clear to me.

You’re probably reading this entry on the Perl Ironman Blogging challenge. I joined the challenge over the summer when I had much more time to devote to active participation (no classes). I have failed to leave Paperman status since the semester started.

I haven’t contributed much. I haven’t been able to clean up and upload my lingering perl projects to CPAN yet. I haven’t contributed to fixing bugs in CPAN modules or perl5 itself. I have flirted with contributing to Rakudo over the last 18 months but have become consumed with my graduate work as of late. I have been to only one YAPC::NA, which was this summer’s YAPC::NA 2009.

I don’t write gobs of professional code using Perl; 95% of my professional code is C++. The last professional Perl project was to recreate Test::Harness and TAP::Formatter to meet my needs, which turned out to vary widely. I’m not a sysadmin, so I don’t get to use Perl as a glue to hold my universe together. I don’t have 14 million reputation points for Perl on StackOverflow.com.

I don’t wax philosophical about the Great Divide between Perl 5 and Perl 6 developers; I am both, so it would be weird to argue with myself. I love Perl, both 5 and 6. I love all the great things people have contributed over the many years.

To put it bluntly, my graduate degree greatly eclipses anything I would like to contribute to the open source community, Perl included. I take a few hours a week to run and that’s about all I get for free time.

I did restart DFW.pm, now referred to as Dallas.p6m. Dallas is blessed to have a few significant community members, so I at least try to bring them together for coffee once a month. I’ve held a few mini-hackathons, though attendance has dropped, likely due to the time of the year.

So where would I fit in this Community Ball of Mud?

2 comments

Pump King

October 31st, 2009 | Category: KISS Saves Santa, Permission For Flyby

I am now a PumpKing. No, not that kind of pumpking, this kind:


[Images: pumpking, pumpking2]

I stole the Perl Foundation logo and created a stencil from it using GIMP. It took me approximately 3 hours to create that pumpkin.

Happy Halloween!

3 comments

Thesis In Frustration

October 25th, 2009 | Category: 01100011, Grinds My Gears, Project Bootstrap

So this semester I have been investigating and working on my thesis. Right now, my focus is in Statistical Natural Language Processing. I don’t want to discuss the specifics of the research just yet, but it has the potential of completely up-ending the entire search industry.

I have been investigating how to build a large corpus from the web. My advisor favors using Google directly since it already exists and offers its search for free.

The first thing I did was investigate the Google SOAP API, only to find out that they deprecated it when they introduced the AJAX API. The new API only allows for about 60 results with no paging. Then I looked into the REST::Google API, but that only returns 10 results. Neither of those options seems feasible. I checked Yahoo’s Yahoo::Search interface and it only seemed to return 10 results (paging, if possible, was not obvious). I could write a direct scraper but that would take a good deal of effort and I am not sure it would be worth it.

Then I even started looking at writing my own spider using WWW::Robot. This is a fairly complex module that does a ton of grunt work for you. The downside is that it behaves and follows the robots.txt protocol; that’s a problem for someone who wants to scrape everything with no regard for such a protocol.

After spending maybe 20-30 hours flipping over this in the last 6 weeks, I finally made the effort to meet with my advisor. Since he is no longer answering email or his phone, I met him after his late class and talked it over with him while he ate dinner in the campus restaurant. We talked and waffled back and forth about our approach. In the end, we decided to investigate Lucene’s capabilities.

Frustrated and lost, I went about my week until I talked with a PhD student who shares my advisor. Her patience with our advisor has been steadily declining. She missed a publication deadline because he failed to review a paper of hers. She also divulged that she intends to switch advisors because she is not making progress. I have been contemplating this myself, so it was good to hear that I am not the only one at my wits’ end.

I am not making progress and I am not willing to sacrifice my graduation. If I change advisors, hopefully I will find an advisor that provides much more support and direction yet gives me the option to continue developing in perl. One of the professors I want to speak with runs a programming language lab.

Maybe I can merge my interest in Perl 6 with my thesis!

2 comments

Falling Behind

October 04th, 2009 | Category: 01100011, Project Bootstrap

I dropped off the Perl Ironman blogging challenge again. This time it wasn’t due to a date miscalculation; I took a midterm exam on Thursday. My current class has taken up most of my time in the past 6 weeks. There haven’t been any chances for coding just yet. I did want to talk about something I have been looking into lately for my thesis.

First, I want to post a lazyweb question: has anyone worked with REST::Google? Specifically, does anyone know how to advance the page cursor with this module? I tried reading the code itself, but it uses Class::Accessor and Class::Data. I’m a bit unfamiliar with those modules (are they still popular?), and it looks like the cursor is read-only. I don’t really see the use for the module if it returns 10 results and cannot paginate.

So from that question, I took the sample and tried playing with it. This is just experimental to exercise the module’s capabilities. This is fairly boring, so I want to see if I can get some code working that can paginate through Google results using their REST API (if that’s even possible).

The author of this module probably feels very clever; they hid subpackages from CPAN by putting the package name on a separate line from the package keyword. They also used __PACKAGE__->mk_ro_accessors, which looks like it will generate attribute accessors at runtime. I’m guessing CPAN cannot index those either. What’s the point of uploading your code to a public repository if you take measures to hide it from the repository?

Anyways, I’m soliciting ideas for paginating Google’s REST API results. Note: the SOAP API has been terminated, so that avenue is closed.
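For reference, the general shape of offset-based pagination can be sketched as below. This is a minimal illustration in Python rather than Perl, and it does not use REST::Google or any real Google endpoint: `paginate`, `fetch_page`, and `fake_fetch` are hypothetical names, and the sketch assumes the backend accepts a start offset and returns a short or empty page once results run out. Whether Google’s REST API actually allows this is exactly the open question above.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int], List[str]],
             page_size: int,
             max_results: int) -> Iterator[str]:
    """Yield results by repeatedly requesting the next start offset."""
    start = 0
    while start < max_results:
        # Hypothetical callable: returns the results beginning at `start`.
        page = fetch_page(start)
        if not page:
            break  # the backend has nothing more to give
        # Never yield more than the caller asked for in total.
        for item in page[:max_results - start]:
            yield item
        if len(page) < page_size:
            break  # a short page means this was the last one
        start += len(page)


# Stand-in for a real search backend: 25 fake hits served 10 at a time.
FAKE_HITS = [f"result-{i}" for i in range(25)]


def fake_fetch(start: int) -> List[str]:
    return FAKE_HITS[start:start + 10]


results = list(paginate(fake_fetch, page_size=10, max_results=100))
capped = list(paginate(fake_fetch, page_size=10, max_results=15))
```

Taking the fetcher as a callable keeps the loop independent of any particular search service, so the same loop would work with whichever API turns out to expose a writable cursor.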

2 comments

Ampersand

September 09th, 2009 | Category: 01100011, Meat Space

Today, only fRew and I ended up meeting for the September Dallas.p6m meeting. Turns out several people forgot that today was in fact the Tuesday after Labor Day and not Monday. We mostly just sat and talked about miscellaneous software things. We discussed GPS, my Algorithms course, Parrot, PGE, and styling. Styling was the last thing we discussed, and the one topic that got semi-heated.

I discussed my reasoning for my styling quirks, which fRew insisted he would replace in a heartbeat. I’ve mostly been honing my preferences by finding bugs caused by styles that seem to lead towards common mistakes. For example, I always place the if statement’s parentheses immediately after the ‘f’ because you cannot have one without the other (well, you can have the boolean expression alone but it rarely makes sense unless it’s a return value). I apply the same reasoning to why I put my braces at the end of the if line with a space between the ‘)’ and the ‘{’. I do this because the block started by ‘{’ can exist without an if clause. I always keep the ‘{’ on the same line so that attempts to comment out the if clause will fail to compile; I’ve found and fixed too many bugs that were a result of true laziness.

Now that may seem kinda weird, but it’s 1) a mental separation technique and 2) an attempt to reduce the number of standalone nested blocks (that can do odd things like cause variable scope issues).

Then we started discussing using ampersand, ‘&’, to begin function calls. My reasoning is that I more often than not prefer to be as explicit as possible, and the ‘&’ lets me do that. I failed to recall an example that led me to preferring the use of ampersand, but I eventually found it. Basically, function calls should look like function calls and not keywords, macros, or other environment lexicals. &foo($arg1, $arg2); looks a bit hairy and dated (generally a perl4 way of doing things), but it’s clear from the first character what is about to happen. My brain needs only to parse the first character to read the code with the right mindset. I am calling a user-defined function (not a built-in), named ‘foo’, and passing 2 arguments. That is clear, readable, and will likely work for the foreseeable future; if not, it’s still easy to find and correct.

On the other hand, foo; or foo(); (under strict) is not necessarily clear. The first example is basically a bare-word and could be any number of things. It could be a symbol, a package, a string, or a function call. The arguments passed would be @_, which requires more investigation. The second one looks like a subroutine call but I have to parse 5 characters and then grep around for a sub named foo within the current namespace (was it loaded elsewhere and exported to my namespace?). While both of these are more compact and concise, they also both require more work to figure out exactly what is happening.

Also, foo($arg1, $arg2); is clearer, but not until I’ve read a minimum of 4 characters do I start to think it might be a function call. This does not parse and skim nearly as quickly, at least to me.

All of this skimmable code talk (note: I don’t agree with Schwern, end-of-scope comments usually clutter code more than they help) may sound frivolous to those readers who deal with thousands of lines of code. It’s not something you can truly appreciate until you maintain code that weighs in with at least 9 digits (executable code only). I personally manage 250,000 lines and I am responsible for a product that is about 2 million lines (all 30 of our branches are about 2 million lines each).

In the end, I stick by my preference for ampersand function calls unless someone else can point out a better reason to ditch them.

7 comments

Semester Renewal

August 20th, 2009 | Category: 01100011, Project Bootstrap

Just after I had been through a bit of a posting slump due to some fading tuits, it would seem as if they have magically returned. This week brings the start of the Fall 2009 semester and I have seemingly sprung back to life. With the new semester comes new opportunities to use perl!

First, I wanted to mention the one bit of news that made my summer, maybe even the last 2 years, seem as if I haven’t been running in circles. I noticed that my new department head at the university posted the final new rules regarding the qualification exams. The change is that there is a list of approved courses that will be offering the QE, passing any 3 is sufficient, and there is still no marginal grade. The last exam I took, Data / Text Mining in Bioinformatics, was not listed in the blessed course list. Turns out, the department would still accept the score, so I now lack only 1 pass to be an official “PhD candidate”.

This semester I am taking the Design & Analysis of Computer Algorithms course. The downside is that this course tends to be theory-centric, so I won’t have many chances to flex my Perl muscles. There are tons of modules on CPAN, though, that might help with understanding the basics. There are plenty of graph, tree, and dynamic programming solutions available.

Interestingly enough, while sifting through those modules, I discovered a module of personal interest. I stumbled across the Algorithm::Viterbi module. I have studied Markov models, Markov chains, and Hidden Markov models a bunch in the last 2 years. One algorithm that keeps showing up is the Viterbi algorithm. I’ll leave it as an exercise to the reader as to how this algorithm is used, but I will point out that the Wikipedia page has Python code. Ironically, “Python’s answer to CPAN” isn’t quite all it’s cracked up to be; it lacks any packages pertaining to “viterbi” and has no generic Markov package.
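For readers who have not met the algorithm, here is a minimal sketch of Viterbi decoding in Python. It is not the Algorithm::Viterbi interface; the function name, argument layout, and the toy healthy/fever model are illustrative assumptions, and the sketch assumes a non-empty observation sequence. Given a Hidden Markov model, it finds the single most probable sequence of hidden states behind a series of observations.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, probability of that path)."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that path came from at time t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor r that maximises the path probability into s.
            prob, origin = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], origin)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Toy model (illustrative numbers only).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"],
                     states, start_p, trans_p, emit_p)
# path is ["Healthy", "Healthy", "Fever"]; prob is approximately 0.01512
```

For long observation sequences the raw probabilities underflow, so a practical version would sum log-probabilities instead of multiplying.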

Perl: Automatically tested, student approved.

No comments

Perl 6 Mini-Hackathon

August 13th, 2009 | Category: 01100011, Meat Space

I have been postponing this week’s Perl post mostly because our monthly Perl 6 Mongers meeting was yesterday. I brought up and did some initial planning for a monthly Rakudo / Perl 6 mini hackathon. I have an outline of events but I want to discuss the Dallas.p6m meeting briefly first.

fRew was going to give a lightning talk about Perl 6’s object model but had to postpone due to a lack of preparation. We are all getting to know each other well and the conversations covered the spectrum: from Guido’s Simplification Of Choice to Perl 6’s attributes and then finally to processing.org’s Javascript framework.

Lastly, I brought up the idea of having monthly mini-hackathons. I thought it would be a great way to put Patrick’s talk into practice. So the initial idea is to have a 2-hour hackathon two weeks after the Dallas.p6m meeting (second Tuesday every month), which should be the last Saturday in the month. The first get-together will be focused on Patrick’s talk, with later ones expanding as people find new areas of interest. That means the first mini-hackathon is slated for August 29th, and here’s what I have tentatively planned:

  1. Introductions
  2. Getting and building Rakudo
  3. Running Rakudo
  4. Mailing Lists / Ecosystem
  5. Understanding RT
  6. Test Suite access and explanation
  7. Setting

We may or may not be able to cover all of that. Patrick seems confident that we will be able to cover these topics so I am fairly happy with the layout. There will be more details forthcoming about the venue and the time/date.

I’ve never been to or held an event like a mini-hackathon, so I’d love to hear suggestions.

1 comment
