
Fast HTML parsing in Ruby with Hpricot

By Peter Cooper / July 5, 2006


Ruby legend why the lucky stiff has developed a new HTML parser called Hpricot. It's easy to install and use, and it parses HTML in a liberal fashion, so less-than-perfect markup shouldn't trip it up. It does, however, require a compiler to install (as it's written in C), so it should be okay on Linux and Mac OS X, though not necessarily on Windows (yet).

Here's some demo code:

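require 'hpricot'

# Read index.html and parse its contents
doc = Hpricot.parse(File.read("index.html"))

# Print the attributes of every link found inside a paragraph
(doc/:p/:a).each do |link|
  p link.attributes
end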

This is a good alternative to RubyfulSoup if you're finding RubyfulSoup too slow (though RubyfulSoup is certainly worth a try!).

Comments

  1. RSL says:

    Does Hpricot install under Cygwin? I'll probably find this out for myself in a few minutes/hours but maybe someone else would want to know as well so the question might help.
