
Parse XML quickly and easily with Hpricot

By Peter Cooper / August 1, 2006

Following on from the post "Parsing XML with REXML using Expat", which covered using Expat to make REXML faster, Chris Wanstrath e-mailed me about his co-worker PJ's post, "Parse XML with Hpricot". Hpricot, covered previously in "Fast HTML parsing in Ruby with Hpricot", is a fast HTML parser for Ruby, written mostly in C by Ruby legend why the lucky stiff.

PJ says that, since HTML is so closely related to XML, Hpricot should work fine with raw XML, and it does:

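require 'hpricot'

# Names of the child elements to copy from each <product> element
FIELDS = %w[SKU ItemName CollectionNo Pages]

doc = Hpricot.parse(File.read("my.xml"))
(doc/:product).each do |xml_product|
  # Product is assumed to be an application model (e.g. ActiveRecord)
  product = Product.new
  FIELDS.each do |field|
    # Take the raw text of the first child element named after the field
    product[field] = xml_product.search("/#{field}").first.children.first.raw_string
  end
  product.save
end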

There are fewer hoops to jump through than with the REXML/Expat route, and it's still extremely fast. Learn more in PJ's post.
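For comparison, here is a rough sketch of the same field extraction using plain REXML (without Expat). The original post does not show my.xml, so the sample document below is an assumption: each product element is taken to have one child element per field. The `extract_products` helper is hypothetical, and a plain hash stands in for the Product model so the snippet runs on its own.

```ruby
require "rexml/document"

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Hypothetical sample input; the real my.xml layout is not shown in the post.
SAMPLE_XML = <<~XML
  <products>
    <product>
      <SKU>A-100</SKU>
      <ItemName>Widget</ItemName>
      <CollectionNo>7</CollectionNo>
      <Pages>32</Pages>
    </product>
  </products>
XML

# Hypothetical helper: returns one hash per <product>, keyed by field name.
def extract_products(xml)
  doc = REXML::Document.new(xml)
  products = []
  doc.elements.each("//product") do |xml_product|
    product = {}
    FIELDS.each do |field|
      # elements[name] returns the first matching child element, or nil
      element = xml_product.elements[field]
      product[field] = element && element.text
    end
    products << product
  end
  products
end

PRODUCTS = extract_products(SAMPLE_XML)
```

Swapping the hash for a real model would bring this closer to the Hpricot version above; the nil check is only there so a missing field does not raise.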
