Post by Peter Cooper on February 13th, 2008
Using Amazon’s Web Services for Spidering the Web




Robert Dempsey has written a code-packed article for Amazon Web Services' "Developer Connection" site called "Using Amazon S3, EC2, SQS, Lucene and Ruby for Web Spidering." It's a bit of an epic, covering how to use several Amazon Web Services together (namely the S3 storage system, the EC2 "Elastic Compute Cloud," and the Simple Queue Service), with Ruby acting as the glue that holds them all together. It could be of great interest to anyone who wants to put together large-scale crawlers using on-demand hardware and services.
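To give a rough feel for how a queue-driven spider fits together, here is a minimal sketch in plain Ruby. It is not taken from Robert's article and it does not talk to Amazon at all: a Ruby Queue stands in for SQS, a Hash stands in for an S3 bucket, and the crawl method, the fetcher lambda and the FAKE_WEB data are all hypothetical names made up for illustration. In a real setup, the loop body is roughly what each EC2 worker would run.

```ruby
require "set"

# Hypothetical in-memory stand-ins: a Ruby Queue plays the role of SQS
# and a Hash plays the role of an S3 bucket. Nothing here calls Amazon.
def crawl(seed_urls, fetcher, max_pages: 10)
  queue = Queue.new           # stand-in for an SQS queue of URLs to visit
  store = {}                  # stand-in for S3: url => page body
  seen  = Set.new(seed_urls)  # avoids enqueueing the same URL twice
  seed_urls.each { |url| queue << url }

  until queue.empty? || store.size >= max_pages
    url  = queue.pop
    page = fetcher.call(url)  # a real worker would do an HTTP GET here
    next if page.nil?

    store[url] = page[:body]
    page[:links].each do |link|
      # Set#add? returns nil when the link was already seen
      queue << link if seen.add?(link)
    end
  end

  store
end

# A fake three-page "web" so the sketch runs offline.
FAKE_WEB = {
  "http://a.example/" => { body: "A", links: ["http://b.example/", "http://c.example/"] },
  "http://b.example/" => { body: "B", links: ["http://a.example/"] },
  "http://c.example/" => { body: "C", links: [] }
}

pages = crawl(["http://a.example/"], ->(url) { FAKE_WEB[url] })
puts pages.keys.sort.inspect
```

The point of the queue is that fetching and storing are decoupled: any number of workers could pull URLs from the same queue and write to the same store, which is the property that makes on-demand hardware attractive for crawling.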
As an aside, I'm collecting Ruby-related Amazon / S3 / EC2 articles and links for a future "list post," so if you have any recommendations, leave a comment. Thanks!










February 13th, 2008 at 1:35 pm
There's also the "rufus-sqs" gem, which leverages Amazon's SQS [REST interface]:
gem install -y rufus-sqs
http://rufus.rubyforge.org/rufus-sqs/
February 14th, 2008 at 3:05 am
A new plugin written by my coworker for managing Rails on EC2 / S3. Works pretty well.
http://rubyforge.org/projects/rubber/
February 14th, 2008 at 7:29 am
Some links you might want to include in your list post:
http://rubyworks-ec2.rubyforge.org/
http://ec2onrails.rubyforge.org/
February 14th, 2008 at 7:51 am
Not an article, but we recently released version 1.5 of the RightAws gem which provides Ruby interfaces for EC2, S3, SQS, and now also SimpleDB. Persistent connections, support for >2GB objects, error retries, XML parsing with libxml, and more goodies. See http://rubyforge.org/projects/rightaws/
We also have a good number of EC2 related articles on our blog: http://info.rightscale.com/blog