Post by Peter Cooper on June 15th, 2006
Screen scraping with Ruby

- Ruby gets a stylish HTML scraper - scrAPI
- How To Scrape Google With Ruby In 0 Seconds
- Feedalizer: Easy Webpage to RSS Conversion from Ruby


Peter Szinek has announced a series of articles on 'screen scraping' with Ruby (more accurately, extracting data from Web pages and other online sources) and has released the first one, "Data Extraction for Web 2.0: Screen scraping in Ruby/Rails". It covers four basic scraping techniques: regular expressions first, then HTree and REXML, then RubyfulSoup, and finally WWW::Mechanize. If you need to process shaky HTML sources from Ruby, it is worth a read.
The font used on his blog may seem a little painful to some readers, but he covers these main libraries and techniques pretty well.
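
To give a flavour of the simplest of those four techniques, here is a minimal sketch of regular-expression scraping using only core Ruby. It is not taken from Peter's article: the sample HTML, the SAMPLE_HTML and LINK_PATTERN constants, and the extract_links helper are all made up for illustration, and a real page would normally be fetched over HTTP first.

```ruby
# Hypothetical HTML fragment standing in for a fetched page. The <li> tags
# are left unclosed on purpose, as a strict XML parser would reject them.
SAMPLE_HTML = <<~PAGE
  <ul>
    <li><a href="/posts/1">First post</a>
    <li><a href='/posts/2'>Second post</a>
  </ul>
PAGE

# Matches an anchor tag and captures the href value (single or double
# quoted) plus the link text. This is an illustrative pattern, not a
# general-purpose HTML parser.
LINK_PATTERN = /<a\s+href=["']([^"']+)["'][^>]*>(.*?)<\/a>/im

# Hypothetical helper: returns one hash per link found in the markup.
def extract_links(html)
  html.scan(LINK_PATTERN).map do |href, text|
    { href: href, text: text.strip }
  end
end

links = extract_links(SAMPLE_HTML)
links.each { |link| puts "#{link[:text]} -> #{link[:href]}" }
```

A regex like this tolerates sloppy markup but breaks as soon as the page layout shifts, which is presumably why the article moves on to proper parsers and WWW::Mechanize.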

June 15th, 2006 at 3:09 pm
Great timing! The New Haven Ruby Brigade just talked about RubyfulSoup and Mechanize at our monthly meeting last night.