Fast HTML parsing in Ruby with Hpricot
Ruby legend why the lucky stiff has developed a new HTML parser called Hpricot. It's easy to install and use, and it parses HTML liberally, so it copes with sloppy markup. Because its parser is written in C, however, installing it requires a compiler. That should be fine on Linux and Mac OS X, but not necessarily on Windows (yet).
Here's some demo code:
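    require 'hpricot'

    # Read the file and parse its contents (Hpricot.parse treats a string argument as HTML, not as a filename)
    doc = Hpricot.parse(File.read("index.html"))

    # Print the attributes of every <a> element found inside a <p> element
    (doc/:p/:a).each do |link|
      p link.attributes
    end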
Hpricot is a good alternative to RubyfulSoup if you find RubyfulSoup too slow, though RubyfulSoup is certainly still worth a try!