Parse XML quickly and easily with Hpricot

Following on from Parsing XML with REXML using Expat, my post about using Expat to make REXML faster, Chris Wanstrath e-mailed me to let me know about his co-worker PJ's post, "Parse XML with Hpricot". Hpricot, covered previously in Fast HTML parsing in Ruby with Hpricot, is a fast HTML parser for Ruby written mostly in C by Ruby legend whytheluckystiff.

PJ says that because HTML and XML are so closely related, an HTML parser like Hpricot should work fine with raw XML, and it does:

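require 'rubygems'
require 'hpricot'

# Names of the child elements to copy from each <product>
FIELDS = %w[SKU ItemName CollectionNo Pages]

doc = Hpricot.parse(File.read("my.xml"))
(doc/:product).each do |xml_product|
  # Product is assumed to be defined elsewhere (e.g. an ActiveRecord-style model)
  product = Product.new
  FIELDS.each do |field|
    # Text of the first child element named after the field
    product[field] = xml_product.search("/#{field}").first.children.first.raw_string
  end
  product.save
end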

There are fewer hoops to jump through than with the REXML/Expat route, and it's still extremely fast.
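For comparison, here is a rough sketch of the same field extraction done with plain REXML (no Expat), so you can judge the hoops for yourself. It is not PJ's code: the sample XML, the extract_products helper, and the use of plain hashes in place of Product are all assumptions made for illustration, since the real my.xml isn't shown.

```ruby
require 'rexml/document'

# Hypothetical helper: returns one hash per <product> element,
# keyed by field name. Fields missing from the XML come back as nil.
def extract_products(xml, fields)
  doc = REXML::Document.new(xml)
  products = []
  doc.elements.each('//product') do |xml_product|
    row = {}
    fields.each do |field|
      element = xml_product.elements[field]  # first child with this name, or nil
      row[field] = element && element.text
    end
    products << row
  end
  products
end

# Made-up sample input standing in for my.xml
sample_xml = <<~XML
  <products>
    <product>
      <SKU>A-100</SKU>
      <ItemName>Widget</ItemName>
      <CollectionNo>7</CollectionNo>
      <Pages>32</Pages>
    </product>
  </products>
XML

fields = %w[SKU ItemName CollectionNo Pages]
extract_products(sample_xml, fields).each do |row|
  puts row.inspect
end
```

The shape is much the same as the Hpricot version; the difference is mostly in the lookup calls and in how fast the underlying parser is.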
