Oct 4

Falling Behind

Author: s1n
Category: 01100011, Project Bootstrap

I dropped off the Perl Ironman blogging challenge again. This time it wasn’t due to a date miscalculation; I took a midterm exam on Thursday. My current class has taken up most of my time over the past 6 weeks, so I haven’t had a chance to do any coding yet. I did want to talk about something I have been looking into lately for my thesis.

First, I want to post a lazyweb question: has anyone worked with REST::Google? Specifically, does anyone know how to advance the page cursor with this module? I tried reading the code itself, but it uses Class::Accessor and Class::Data. I’m a bit unfamiliar with those modules (are they still popular?), and it looks like the cursor is read-only. I don’t really see the use for the module if it returns 10 results and cannot paginate.

Following on from that question, I took the module’s sample code and tried playing with it. This is just an experiment to exercise the module’s capabilities. On its own that is fairly boring, so I want to see if I can get some code working that can paginate through Google results using their REST API (if that’s even possible).

The author of this module probably feels very clever; they hid subpackages from CPAN by putting the package name on a separate line from the package keyword. They also used __PACKAGE__->mk_ro_accessors, which looks like it generates read-only attribute accessors at runtime. I’m guessing CPAN cannot index those either. What’s the point of uploading your code to a public repository if you take measures to hide it from the repository?
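To make the two tricks above concrete, here is a minimal sketch using only core Perl. It is not REST::Google's code and not Class::Accessor's actual implementation: the package name My::Hidden::Cursor and the fields page_index and result_count are hypothetical, and the mk_ro_accessors below is a hand-rolled stand-in that only approximates what the real method appears to do.

```perl
use strict;
use warnings;

# Splitting "package" and the name across two lines is a common trick
# to keep the PAUSE indexer from listing the package.
package    # hide from PAUSE
    My::Hidden::Cursor;    # hypothetical package name

# Hand-rolled stand-in for Class::Accessor's mk_ro_accessors (an
# approximation): install one getter-only method per field at runtime.
sub mk_ro_accessors {
    my ($class, @fields) = @_;
    for my $field (@fields) {
        no strict 'refs';    # needed to write into the symbol table by name
        *{"${class}::${field}"} = sub {
            my $self = shift;
            die "'$field' is read-only\n" if @_;    # refuse any attempt to set
            return $self->{$field};
        };
    }
    return;
}

# Plain hash-based constructor.
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# The methods below do not exist in the source text; they are created
# only when this line runs.
__PACKAGE__->mk_ro_accessors(qw(page_index result_count));

package main;

my $cursor = My::Hidden::Cursor->new(page_index => 0, result_count => 10);
print "page index: ", $cursor->page_index, "\n";

# Trying to advance the cursor through the accessor fails.
eval { $cursor->page_index(1) };
print "error: $@" if $@;
```

Because the accessors are installed when the script runs rather than declared with sub, a tool that only scans the source has nothing to find, and a read-only accessor by itself gives the caller no way to move the cursor forward.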

Anyways, I’m soliciting ideas for paginating Google’s REST API results. Note: the SOAP API has been terminated, so that avenue is closed.



2 Comments so far

  1. Stefan Petrea October 5th, 2009 6:44 am

    Yeah, the REST interface of Google doesn’t offer anything but the first 40 results, so it’s not very useful.
    You can try to scrape them off Scroogle, or maybe use another module out there. But you have to be careful how you scrape, because Google will ban you for 24h if it finds out you’re a robot.
    (Keep the same referer, your Agent string...)

    Best regards,
    Stefan

  2. s1n October 7th, 2009 2:01 am

    That’s what I was trying to find out; are there any other modules out there that still work and let me paginate through the results?

    I don’t intend to scrape in the traditional sense, so I’m not overly worried about getting banned.

    Has anyone tried Google::Search?
