libxml-ruby 0.8.0 Released: Ruby Gets Fast, Reliable XML Processing At Last

By Peter Cooper / July 17, 2008

Ruby is not known for its deftness with XML. On RubyFlow, I considered calling the community to arms over it and solicited twenty responses on what the problem is and what we could do about it. Robert Fischer was lamenting the state of Ruby's libxml library and didn't seem to like REXML much either. Tim Bray has also had a few complaints about REXML. It seemed there was a problem to be fixed: a gap in the market, as it were, for a decent XML parser for Ruby. Hpricot, despite really being an HTML parser, would have to get us by in the meantime.

Today, however, libxml-ruby 0.8.0 has been released, and Charlie Savage explains why this is such a big deal. libxml-ruby now runs on Windows (thanks to Charlie), doesn't segfault all the time, and the bindings have all been fixed over the past year (thanks to Dan Janowski). You can get going with it right now with a simple gem install libxml-ruby.

libxml-ruby is known for its performance, and the latest release doesn't disappoint. For a range of simple tasks, libxml clocks in at ten times faster than Hpricot like-for-like and between 30 and 60 times faster than REXML. Charlie adds:

In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:

  • SAX
  • DOM
  • XMLReader (streaming interface)
  • XPath
  • XPointer
  • XML Schema
  • DTDs
  • XSLT (split into the libxslt-ruby bindings)
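
For anyone who wants to check speed figures like the ones quoted above against their own documents, here is a minimal timing sketch. It is not Charlie's benchmark: the sample data, the PARSERS table and the time_parsers helper are all made up for illustration, and it assumes a current libxml-ruby release (where LibXML::XML::Document.string is available) and that REXML can be required. If libxml-ruby is not installed, only REXML is timed.

```ruby
require 'benchmark'
require 'rexml/document'

# libxml-ruby is optional here: if the gem is missing we simply skip it.
begin
  require 'libxml'
  LIBXML_AVAILABLE = true
rescue LoadError
  LIBXML_AVAILABLE = false
end

# Hypothetical sample data: a flat list of 2,000 <item> elements.
SAMPLE_XML = '<items>' +
             (1..2000).map { |i| "<item id=\"#{i}\">value #{i}</item>" }.join +
             '</items>'

# Each entry parses the string and returns the number of <item> elements,
# so every parser does roughly comparable work.
PARSERS = {
  'REXML' => ->(xml) { REXML::Document.new(xml).root.elements.size }
}
if LIBXML_AVAILABLE
  PARSERS['libxml-ruby'] = ->(xml) { LibXML::XML::Document.string(xml).find('//item').length }
end

# Returns a hash of parser name => total seconds spent over `runs` parses.
def time_parsers(xml, runs = 5)
  PARSERS.each_with_object({}) do |(name, parser), timings|
    timings[name] = Benchmark.realtime { runs.times { parser.call(xml) } }
  end
end

time_parsers(SAMPLE_XML).each do |name, seconds|
  puts format('%-12s %.4fs', name, seconds)
end
```

Treat the numbers as a rough guide only. A document this small and regular says little about how the parsers behave on large or deeply nested files, so swap in data that looks like your own before drawing conclusions.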

Charlie is planning to write a proper tutorial in the next week, covering some of the key features, but suggests referring to the API documentation in the meantime. The test suite (located in the test directory that comes with libxml-ruby) also looks like a great resource for code examples; very clean and straightforward. If you have any libxml-ruby tutorials or resources of your own, please post them in the comments here.

Congratulations to all of those involved in libxml-ruby's long history, and especially to Charlie Savage for giving it the final push to this mature state. Ruby's XML woes are tempered, for now at least.

Comments

  1. Don says:

    You might want to hold off on the upgrade if you use aws-s3 gem, they don't play nice together.

  2. Charlie says:

    Hi Don,

    Hmm, thought we had got that fixed. If you get a chance can you post a bug on RubyForge (http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse)?

    I'll try and see what's up with aws-s3, but since we're not using it, not sure how easy it will be to test.

  3. Charlie says:

    I've verified the aws-s3 error and fixed the issue. Kudos to the aws-s3 team for including a nice test suite complete with mock objects that made it easy to track down.

    Fix is included in libxml 0.8.1 which was just pushed to RubyForge.

    Thanks for the report Don.

  4. Dieter says:

    the API documentation link is broken

  5. Emm says:

    The link to the API website is broken (it says http://.). Otherwise, thanks for the posting, very interesting.

  6. subbu says:

    I am finding it difficult to install libxml on my Ubuntu. However I was able to install it on my CentOS. Is there a forum to ask this question further? I couldn't find one.

    ----------this is what my console says------------------
    sudo gem install libxml-ruby
    Building native extensions. This could take a while...
    ERROR: Error installing libxml-ruby:
    ERROR: Failed to build gem native extension.

    /usr/bin/ruby1.8 extconf.rb install libxml-ruby
    checking for socket() in -lsocket... no
    checking for gethostbyname() in -lnsl... yes
    checking for atan() in -lm... no
    checking for atan() in -lm... yes
    checking for inflate() in -lz... yes
    checking for iconv_open() in -liconv... no
    checking for libiconv_open() in -liconv... no
    checking for libiconv_open() in -llibiconv... no
    checking for iconv_open() in -llibiconv... no
    checking for iconv_open() in -lc... yes
    checking for xmlParseDoc() in -lxml2... no
    checking for xmlParseDoc() in -llibxml2... no
    checking for xmlParseDoc() in -lxml2... no
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of
    necessary libraries and/or headers. Check the mkmf.log file for more
    details. You may need configuration options.
    -
    -
    -----------------------

  7. method says:

    subbu, make sure you install the libxml headers. If you're on Debian, sudo apt-get install libxml-dev, or else compile and install libxml.

  8. janfri says:

    @subbu: sudo apt-get install libxml2-dev

  9. Will Green says:

    @subbu Like the error message says, you're probably missing some libraries. Looks like you're missing libxml2

  10. Peter Cooper says:

    Fixed the link. Thanks!

  11. zerohalo says:

    "libxml-ruby now runs on Windows"

    is that relevant anymore? ;-)

    In all seriousness, this is good news. Thanks, Charlie.

  12. Peter Cooper says:

    Would it be possible to write some bindings between the new libxml-ruby and REXML so old REXML-based code can use it?

  13. Charlie says:

    P

  14. Charlie says:

    subbu - Report bugs at RubyForge:

    http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse

    Peter - REXML bindings are definitely possible. Anyone want to volunteer? If they pass the REXML test suite (is there one?), then I'd be happy to include them in the distribution.

  15. Sebastian says:

    Just installed libxml-ruby to get a feel for it, and one thing immediately stands out: there's no README file included!

    It may be nit-picky, but I've gotten so used to a friendly README file being included in a gem that I've kind of come to expect it. Libraries near and dear to my heart like Hpricot and Net::SSH both include these.

    Or, in lieu of a readme, I usually expect to be able to generate an RDoc for a library and have some good info right there in index.html.

    Just my 0.02c,
    S.

  16. Fred says:

    That is really good news.

    I have been working with libxml in the past with big xml files, 20mb to 100mb.
    To parse the 100mb xml file and save to database was taking like 30 minutes and huge amount of ram.

    Let's see how much faster it is now...

  17. rick says:

    Fred: I hope you're using a streaming API for that...

  18. Charlie says:

    Sebastian - How did you install? The gem package most definitely includes a readme file and it's the main page of the RDocs.

    Fred - Try out the xmlReader class for a streaming api.

  19. Sebastian says:

    @Charlie : I just did a regular "gem install libxml-ruby" - maybe I should have specified the version?

  20. Charlie says:

    Sebastian,

    What you did looks fine. What version got installed? And see if the gem directory has a README file and a doc directory with RDocs (both should exist). If not, could you submit a bug? Thanks.

  21. Sebastian says:

    @Charlie: hm...this is odd. It says version 0.8.1 was installed?

    I think I found the issue, sorry to raise such a fuss: I'm using the "gemdoc" shortcut that was posted on RubyInside a while back, and it looks like it's going into a different directory than where the "good" documentation lives. On my Mac, the correct directory that has the files you mentioned is in:

    file:///Library/Ruby/Gems/1.8/gems/libxml-ruby-0.8.1/doc/rdoc/index.html

    But "gemdoc" looks for rdocs in:

    file://localhost/Library/Ruby/Gems/1.8/doc/libxml-ruby-0.8.1/rdoc/index.html

    Sorry for not catching that the first time and raising such a stink!

  22. Sebastian says:

    s/on RubyInside/on RailsEnvy/

  23. Charlie says:

    Sebastian - Ah, didn't know about the gemdoc shortcut. Will have to look into it.

  24. Fred says:

    Rick: I did use streaming api. maybe not so optimized.

    I just discovered that I was wrong, the memory footprint and the long time to do the job was only due to libxml, but to loading the whole XML file and splitting the elements into arrays, then using libxml to parse each element one at a time and save to database...

    it is still a lot faster now even though my algorithm is the slow culprit. shit...

    hehe

    Thanks for the tips!
    and this awesome work on libXML.
    it rocks!

  25. Fred says:

    i mistyped previous post.
    replace "was only due to libxml"
    with "was not really due to libxml"

    sorry

  26. austin_web_developer says:

    Maybe this is just a windows issue, but I had to use
    require 'xml/libxml' to get it to work

    http://www.concept47.com/austin_web_developer_blog/ruby/fixing-the-libxml-ruby-gem-error-uninitialized-constant-xml-nameerror/
