
Using Amazon’s Web Services for Spidering the Web

By Peter Cooper / February 13, 2008


Robert Dempsey has written a code-packed article for Amazon Web Services' "Developer Connection" site called Using Amazon S3, EC2, SQS, Lucene and Ruby for Web Spidering. It's a bit of an epic and covers using a multitude of Amazon Web Services together (namely the S3 storage system, the EC2 "Elastic Compute Cloud", and the Simple Queue Service), with Ruby acting as the glue that holds them all together. This could be of great interest to anyone who wants to put together large-scale crawlers using on-demand hardware and services.
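To make the general shape of such a crawler concrete, here is a minimal sketch of the queue-and-store loop. It is not taken from Dempsey's article: it assumes an in-memory `Queue` standing in for an SQS queue, a plain `Hash` standing in for an S3 bucket, and a hypothetical `FAKE_WEB` table in place of real HTTP fetches, so it runs without any AWS account.

```ruby
require "thread"

# Hypothetical stand-in for the web: URL => [page body, outgoing links].
FAKE_WEB = {
  "http://example.com/"  => ["home",   ["http://example.com/a", "http://example.com/b"]],
  "http://example.com/a" => ["page a", ["http://example.com/b"]],
  "http://example.com/b" => ["page b", ["http://example.com/"]]
}

# Crawl outward from a seed URL. `fetcher` is any callable returning
# [body, links] for a URL, or [nil, []] when the page is unavailable.
def crawl(seed, fetcher)
  queue = Queue.new          # assumption: stands in for an SQS queue of URLs
  store = {}                 # assumption: stands in for an S3 bucket keyed by URL
  seen  = { seed => true }   # avoids queueing the same URL twice
  queue << seed

  until queue.empty?
    url = queue.pop
    body, links = fetcher.call(url)
    next if body.nil?

    store[url] = body        # "upload" the fetched page
    links.each do |link|
      next if seen[link]
      seen[link] = true
      queue << link          # "send" newly discovered URLs for later workers
    end
  end

  store
end

fetcher = ->(url) { FAKE_WEB.fetch(url, [nil, []]) }
pages = crawl("http://example.com/", fetcher)
```

In a real deployment the queue and store would be remote services, which is what would let several EC2 instances run the same loop in parallel; this sketch runs in a single process only.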

As an aside, I'm collecting Ruby-related Amazon / S3 / EC2 articles and links for a future "list post," so if you have any recommendations, leave a comment. Thanks!

Comments

  1. John says:

    There's also the "rufus-sqs" gem, which uses Amazon's SQS REST interface:

    gem install -y rufus-sqs
    http://rufus.rubyforge.org/rufus-sqs/

  2. Jason says:

    My coworker wrote a new plugin for managing Rails on EC2 / S3. It works pretty well.

    http://rubyforge.org/projects/rubber/

  3. Markus says:

    Some links you might want to include in your list post:
    http://rubyworks-ec2.rubyforge.org/
    http://ec2onrails.rubyforge.org/

  4. Thorsten says:

    Not an article, but we recently released version 1.5 of the RightAws gem which provides Ruby interfaces for EC2, S3, SQS, and now also SimpleDB. Persistent connections, support for >2GB objects, error retries, XML parsing with libxml, and more goodies. See http://rubyforge.org/projects/rightaws/

    We also have a good number of EC2 related articles on our blog: http://info.rightscale.com/blog
