Screen scraping with Ruby

By Peter Cooper / June 15, 2006

Peter Szinek has announced a series of articles on 'screen scraping' with Ruby (more accurately, extracting data from Web pages and other online sources) and has published the first one, titled "Data Extraction for Web 2.0: Screen scraping in Ruby/Rails". It covers four basic scraping techniques: first regular expressions, then HTree and REXML, then RubyfulSoup, and finally WWW::Mechanize. If you need to process shaky HTML sources from Ruby, it's worth a read.

The font used on his blog may be a little hard on the eyes for some readers, but he covers the main libraries and techniques pretty well.
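
To give a rough flavour of the first technique on that list, regular expressions, here is a minimal sketch that pulls link targets and link text out of an HTML fragment using only core Ruby. It is not taken from Peter Szinek's article: the sample markup, the `LINK_PATTERN` constant and the `extract_links` method are all made up for illustration, and a real page would be fetched over HTTP first.

```ruby
# Hypothetical HTML fragment standing in for a downloaded page.
# Note the mixed quoting, mixed case and missing </li>: "shaky" HTML.
HTML = <<~PAGE
  <ul>
    <li><a href="/articles/1">First post</a></li>
    <li><a href='/articles/2'>Second   post</a>
    <li><A HREF="/articles/3">Third post</A></li>
  </ul>
PAGE

# Match an anchor tag, capturing the href value and the link text.
# Case-insensitive and tolerant of single or double quotes, but still
# brittle: unquoted attributes or unusual markup will slip past it.
LINK_PATTERN = /<a\s+[^>]*href\s*=\s*["']([^"']*)["'][^>]*>(.*?)<\/a>/im

# Return an array of { href:, text: } hashes, one per matched link.
def extract_links(html)
  html.scan(LINK_PATTERN).map do |href, text|
    # Trim the text and collapse runs of spaces into one.
    { href: href, text: text.strip.squeeze(" ") }
  end
end

extract_links(HTML).each do |link|
  puts "#{link[:text]} -> #{link[:href]}"
end
```

The pattern works on this fragment, but it also shows why regular expressions are only the starting point: every quirk in the markup needs another tweak to the pattern, which is the gap that parser-based libraries such as those named above aim to fill.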

Comments

Josh Warchol says:
June 15, 2006 at 3:09 pm
Great timing! The New Haven Ruby Brigade just talked about RubyfulSoup and Mechanize at our monthly meeting last night.