Ruby gets a stylish HTML scraper – scrAPI
The indefatigable Assaf Arkin has done it again by developing a new Ruby HTML scraping toolkit, scrAPI. Peter Szinek recently wrote a popular article about scraping from Ruby using Manic Miner, RubyfulSoup, REXML, and WWW::Mechanize, but none of these are as immediately useful as scrAPI. So why?
scrAPI lets you scrape HTML using CSS selectors. Here's Assaf's example, which defines scraper objects that can scrape auctions from eBay:
    ebay_auction = Scraper.define do
      process "h3.ens>a", :description=>:text, :url=>"@href"
      process "td.ebcPr>span", :price=>:text
      process "div.ebPicture >a>img", :image=>"@src"
      result :description, :url, :price, :image
    end

    ebay = Scraper.define do
      array :auctions
      process "table.ebItemlist tr.single", :auctions => ebay_auction
      result :auctions
    end
With the scraper objects set up, you can put them into action like so:
    auctions = ebay.scrape(html)
    # No. of auctions found
    puts auctions.size
    # First auction:
    auction = auctions[0]
    puts auction.description
    puts auction.url
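If the selector and extractor pairs in the scraper definition look cryptic, the toy sketch below may help. It is not scrAPI's implementation and does not use scrAPI at all: `Node`, `node`, `select_nodes` and `extract` are hypothetical helpers written for this illustration, the element tree is built by hand rather than parsed from HTML, and the sample auction data is made up. It only shows how a rule like `"h3.ens>a", :description=>:text, :url=>"@href"` can be read: find each `a` that is a direct child of an `h3` with class `ens`, then take its text and its `href` attribute.

```ruby
# Toy illustration only: a hand-built element tree and a tiny matcher.
# None of these names come from scrAPI.
Node = Struct.new(:tag, :classes, :attrs, :text, :children)

# Hypothetical helper: build a node with sensible defaults.
def node(tag, classes: [], attrs: {}, text: "", children: [])
  Node.new(tag, classes, attrs, text, children)
end

# Does a node match one simple step such as "h3.ens" or "a"?
def step_matches?(n, step)
  tag, *classes = step.strip.split(".")
  n.tag == tag && (classes - n.classes).empty?
end

# Visit every node in the tree, depth first.
def each_node(root, &block)
  block.call(root)
  root.children.each { |child| each_node(child, &block) }
end

# Find nodes matching a chain of child combinators, e.g. "h3.ens>a".
# Only tag names, classes and ">" are supported in this sketch.
def select_nodes(root, selector)
  steps = selector.split(">")
  current = []
  each_node(root) { |n| current << n if step_matches?(n, steps.first) }
  steps.drop(1).each do |step|
    current = current.flat_map do |n|
      n.children.select { |child| step_matches?(child, step) }
    end
  end
  current
end

# Apply an extractor: :text returns the node's text,
# "@name" returns the attribute called name.
def extract(n, extractor)
  return n.text if extractor == :text
  n.attrs[extractor.sub(/\A@/, "")]
end

# A made-up auction row, standing in for one "tr.single" element.
row = node("tr", classes: ["single"], children: [
  node("h3", classes: ["ens"], children: [
    node("a", attrs: { "href" => "http://example.com/item/1" },
              text: "Vintage lamp")
  ]),
  node("td", classes: ["ebcPr"], children: [node("span", text: "$12.50")])
])

link = select_nodes(row, "h3.ens>a").first
auction = {
  description: extract(link, :text),
  url:         extract(link, "@href"),
  price:       extract(select_nodes(row, "td.ebcPr>span").first, :text)
}

puts auction[:description]
puts auction[:url]
puts auction[:price]
```

scrAPI handles all of this for you against real HTML, which is rather the point: the scraper definition stays as short as the selectors themselves.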
Simple example with serious power. Go get scrAPI and play.