Ruby XML Performance Shootout: Nokogiri vs LibXML vs Hpricot vs REXML

By Peter Cooper / March 16, 2009

Disclaimer: Every time we've run a piece about benchmarking or performance numbers on Ruby Inside, a retraction or significant correction has come out shortly thereafter. Benchmarking is hard, ugly, and quite often wrong or biased. It is not useless, but if you depend on the results in any way, you should certainly do your own benchmarking to confirm them.

Last week, libxml-ruby 1 was released - a significant achievement since it had been under development for seven years. I suspected that it might just pip Nokogiri to the "fastest way to parse XML in Ruby" post and invited people to benchmark them. Turns out.. it ain't so. Nokogiri is the fastest.

Aaron Patterson, the developer behind Nokogiri, decided to run some tests and he's published the results in a dossier called xml_truth. The benchmarking environment was Ruby 1.8.6 on OS X 10.5 with libxml2 2.7.3 installed. Hpricot 0.6.170 competes against Nokogiri 1.2.2, LibXML-Ruby 1.1.2, and the standard library's REXML.

Aaron put together a whole suite of benchmarks, but if you just want an overview, here's a chart showing the results for in-memory parsing of a 14-megabyte XML document. Note that the parsing time is in seconds and the Y-axis is logarithmic. Yes, Hpricot took over a minute, and REXML took over two minutes, while Nokogiri and libxml-ruby came in at a few seconds each:

[Chart: xmlresults.gif, in-memory parse time in seconds on a logarithmic Y-axis]
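
A rough idea of how an in-memory parsing comparison like this can be set up is sketched below. This is not Aaron's xml_truth suite: the synthetic document, the `build_xml`, `available_parsers`, and `time_parsers` helpers, and the do-nothing `baseline` entry are all assumptions made for illustration, and the input is far smaller than 14 megabytes. Any of the three libraries that is not installed is simply skipped.

```ruby
require 'benchmark'

# Build a synthetic XML string with `items` <item> elements.
# (Illustrative only; this is not the xml_truth fixture.)
def build_xml(items)
  body = (1..items).map { |i| %(<item id="#{i}"><name>item #{i}</name></item>) }.join
  "<?xml version=\"1.0\"?><items>#{body}</items>"
end

# Candidate parsers: [label, library to require, in-memory parse call].
CANDIDATES = [
  ['rexml',       'rexml/document', ->(xml) { REXML::Document.new(xml) }],
  ['nokogiri',    'nokogiri',       ->(xml) { Nokogiri::XML(xml) }],
  ['libxml-ruby', 'libxml',         ->(xml) { LibXML::XML::Parser.string(xml).parse }]
]

# Keep only the parsers whose library loads on this machine, plus a
# hypothetical baseline that touches the input without parsing it.
def available_parsers(candidates)
  parsers = { 'baseline' => ->(xml) { xml.bytesize } }
  candidates.each do |name, lib, parser|
    begin
      require lib
      parsers[name] = parser
    rescue LoadError
      warn "skipping #{name}: #{lib} is not installed"
    end
  end
  parsers
end

# Parse the same string `runs` times with each parser and return
# a hash of label => elapsed wall-clock seconds.
def time_parsers(xml, parsers, runs)
  parsers.each_with_object({}) do |(name, parser), results|
    results[name] = Benchmark.realtime { runs.times { parser.call(xml) } }
  end
end

xml = build_xml(200)
time_parsers(xml, available_parsers(CANDIDATES), 5).each do |name, seconds|
  puts format('%-12s %.4fs', name, seconds)
end
```

Raising the item count and the number of runs gives steadier numbers; as the disclaimer says, treat whatever it prints as a starting point for your own measurements.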

Want all the actual numbers? Want to see the actual tests? Want to run them for yourself? Head over to Aaron's project here. Have fun! I'll be putting on my flame-retardant jacket in preparation for all of the fallout about how inaccurate these tests are in 3.. 2.. 1.. :-)

Update: Hpricot 0.7 has been released and Patrick Tulskie has run some extra benchmarks. These results show that Hpricot beats libxml-ruby and Nokogiri under certain circumstances (quite significantly under an XPath test).

Update 2 (March 22, 2009): libxml-ruby's developer Charles Savage has found why libxml-ruby lagged behind Nokogiri and has resolved it.. :)

Comments

  1. Peter Cooper says:

    Just in case anyone's wondering, Y-axis is seconds ;-) Now I know why statistics was my least favorite part of math..

  2. Peter Weitzman says:

    I use hpricot to parse thousands of very small (< 1k) xml documents. I wonder what that benchmark would look like. (I am currently happy with the performance, so only a big boost would cause me to switch.)

  3. Aaron Patterson says:

    These benchmarks are not complete. I want to compare document parse times, xpath search, css search, document traversal, attribute accessing, and memory usage.

    I would also like to compare API for similar functions.

    I will post to my website when I feel it is "done".

  4. Ryan Davis says:

    I think just as important as the speed, if not more so, is the API:

    LibXML::XML::Parser.string(@xml).parse # in memory
    LibXML::XML::Parser.io(xml).parse # via file/socket/whatever

    vs:

    Nokogiri::XML(xml) # both

    No contest.

  5. flavio says:

    I tested REXML and libxml, and the benchmark results hold true.

  6. Hugo says:

    I'd like to see these tests for Ruby 1.9.1

  7. Peter Cooper says:

    If you can get Hpricot running on 1.9.1, you can run Aaron's code. Despite trying many times, I've never succeeded, although Nokogiri installs fine. Perhaps I'll install libxml-ruby on 1.9.1 now and just run the head to head tests...

  8. Peter Cooper says:

    Okay, tests are running... :) Commented out the Hpricot parts!

  9. Peter Cooper says:

    Ruby 1.9.1. Nokogiri: 1.2.2 vs LibXML: 1.1.2

    test_IO_parsing N=100
                      user     system      total        real   kBps
    null         10.250000   0.750000  11.000000 ( 10.990493) 64822.66
    nokogiri     44.870000   1.560000  46.430000 ( 46.380883) 15360.49
    libxml-ruby  53.670000   1.470000  55.140000 ( 55.097135) 12930.49

    So, nokogiri still wins. Including REXML and going to N=10:

    test_IO_parsing N=10
                      user     system      total        real   kBps
    null          0.990000   0.070000   1.060000 (  1.063771) 66972.40
    nokogiri      4.160000   0.240000   4.400000 (  4.400213) 16190.88
    libxml-ruby   5.270000   0.170000   5.440000 (  5.440546) 13094.88
    rexml       420.520000   1.530000 422.050000 (421.625631) 168.97

    So REXML is still a waste of time, even in 1.9.

  10. Michael Campbell says:

    Using a log scale with a linearly spaced axis is bad form, btw. Despite your footnote, it is counter-intuitive.

  11. Peter Cooper says:

    That's Numbers for you. It's really shitty at making graphs. Sadly the linear axis was even worse.

  12. Patrick Tulskie says:

    I think we need to modify the tests to take into consideration this little announcement here:

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/331411

  13. Patrick Tulskie says:

    Couldn't wait for someone to address Why's complaints so I forked the tests, updated a few things, and re-ran them. Here's my fork and I've submitted a pull request to Aaron: http://github.com/PatrickTulskie/xml_truth/

    Interesting to say the least. I'm a performance nut and I love knowing what each gem is best at.

  14. Jan Wedekind says:

    I haven't used XML with Ruby so far. In the past I've used Xalan-C with C++. It supports XML parsing, XPath queries, XSD verification, and XSLT transformations. In terms of functionality most other XML libraries are not even close. But then again you don't need to rely on XML so much if you are using Ruby.

  15. Charlie says:

    Finally had a chance to look into this. libxml-ruby and nokogiri should have equivalent performance, and now do with the libxml-ruby 1.1.3 release. I posted what was causing the difference on my blog at http://cfis.savagexi.com/2009/03/21/libxml-ruby-1-1-3-boosting-performance.

  16. Kris says:

    On Ubuntu Hardy:

    sudo apt-get update
    sudo apt-get install libxslt1-dev
    sudo apt-get install libxml2-dev
    sudo gem install nokogiri
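
The kBps column in the Ruby 1.9.1 timings (comment 9) looks like total data parsed divided by wall-clock time. That formula is an inference from the numbers, not something the benchmark output states, and `implied_doc_kb` and `slowdown` are hypothetical helper names. A quick sketch that checks the inference against the N=10 rows and turns the real times into relative slowdowns:

```ruby
# Wall-clock ("real") seconds and reported kBps, copied from the
# N=10 run on Ruby 1.9.1 in comment 9.
RUNS = 10
ROWS = {
  'null'        => { real: 1.063771,   kbps: 66972.40 },
  'nokogiri'    => { real: 4.400213,   kbps: 16190.88 },
  'libxml-ruby' => { real: 5.440546,   kbps: 13094.88 },
  'rexml'       => { real: 421.625631, kbps: 168.97 }
}

# Assumption: kBps = (runs * document size in kB) / real seconds.
# If that holds, every row should imply the same document size.
def implied_doc_kb(row, runs)
  row[:kbps] * row[:real] / runs
end

# How many times longer `name` took than `reference`, by real time.
def slowdown(rows, name, reference)
  rows[name][:real] / rows[reference][:real]
end

ROWS.each do |name, row|
  puts format('%-12s implies %8.2f kB per document, %6.2fx nokogiri',
              name, implied_doc_kb(row, RUNS), slowdown(ROWS, name, 'nokogiri'))
end
```

Every row implies the same document size of roughly 7,124 kB, which supports the reading above. On these figures REXML takes roughly 96 times as long as Nokogiri, while libxml-ruby takes about 1.24 times as long.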
