Recursive Descent Parser for Ruby

By Peter Cooper / November 13, 2006

Sometimes strange things happen. I've been developing a small, basic recursive descent parser for Ruby called RDParse. Just before writing this post I decided to Google that name and, lo and behold, the first result is a Ruby recursive descent parser called RDParse, created by Dennis Ranke, that I posted to Code Snippets for posterity several months ago. Since the two libraries are unlikely to be used together, and Dennis doesn't seem to be maintaining his version, I've decided to stick with RDParse as the name of mine for now.

You can download my RDParse as rdparse.rb.txt; just rename it at your end if you want to use it. You'd use it something like this (no syntax coloring, as the Syntax gem doesn't appear to cope with the complexity):

require 'rdparser'

parser = RDParser.new do |g|
  g.main                'line(s)'
  g.line                'expression separator(?) comment(?)'
  g.comment             '"#" rest_of_line'
  g.rest_of_line        /.+$/
  g.separator           /;/
  g.expression          'term operation expression | term'
  g.term                'number | variable | string | brkt_expression'
  g.brkt_expression     '"(" expression ")"'
  g.number              /\d+(\.\d+)?/
  g.operation           /[\+\-\*\/]/
  g.variable            /[a-z][a-z0-9]*/
  g.string              %r(["'](.*?[^\\]|.*?)["'])
end

content = %q{
  (34 - 3) * 42;   # Comment here..
  "a" + "bcd"
}

syntax_tree = parser.parse(:main, content)
puts RDParser.text_syntax_tree(syntax_tree)

Here, a grammar is defined within the RDParser.new block, although it can also be passed in as a parameter in a hash. This grammar (for a nonsense language that can only perform basic expressions) is used to fuel a parser that generates the following syntax tree from the code in the content variable:

line
  expression
    term
      brkt_expression
        "(" => (
        expression
          term
            number => 34
          operation => -
          expression
            term
              number => 3
        ")" => )
    operation => *
    expression
      term
        number => 42
  separator => ;
  comment
    "#" => #
    rest_of_line => Comment here..
line
  expression
    term
      string => "a"
    operation => +
    expression
      term
        string => "bcd"
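
For anyone curious how a block such as the grammar definition above can collect its rules, here is a minimal sketch of one common Ruby approach: catching the rule names with method_missing. This is not taken from the RDParse source; the class name GrammarSketch and its rules accessor are hypothetical and exist only for illustration.

```ruby
# Hypothetical sketch: NOT the actual RDParse source.
# Collects grammar rules that are declared as method calls inside a block.
class GrammarSketch
  attr_reader :rules

  def initialize
    @rules = {} # rule name (Symbol) => String or Regexp definition
    yield self if block_given?
  end

  # An unknown method called with a single String or Regexp argument is
  # treated as a rule definition, e.g. `g.separator(/;/)` stores
  # { separator: /;/ }. Anything else falls through to the normal error.
  def method_missing(name, *args)
    if args.length == 1 && (args[0].is_a?(String) || args[0].is_a?(Regexp))
      @rules[name] = args[0]
    else
      super
    end
  end

  # Only claim to respond to rule names that have already been defined.
  def respond_to_missing?(name, include_private = false)
    @rules.key?(name) || super
  end
end

grammar = GrammarSketch.new do |g|
  g.expression 'term operation expression | term'
  g.number(/\d+(\.\d+)?/)
end

grammar.rules.each { |name, definition| puts "#{name}: #{definition.inspect}" }
```

Storing the rules in a plain hash keeps their declaration order (Ruby 1.9 and later), and a hash of this shape is presumably close to what the hash-parameter form of the constructor would accept, though that is a guess rather than something confirmed by the source.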

It's simple but effective. My initial version had callbacks and some other features that I discovered I didn't really need (and which I'd poorly implemented anyway). This version just does the basics: lexing, and enough parsing to get a tree.
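
To make "lexing and enough parsing to get a tree" concrete, here is a hand-written sketch of recursive descent over a cut-down version of the expression rules from the grammar above (number, operation, term, brkt_expression, expression). It is an illustration only, not how RDParse is implemented: the names TinyExpressionParser and render_tree are made up, the tree is a plain nested array, and the literal "(" and ")" nodes are left out of the tree for brevity. It uses StringScanner from Ruby's standard library.

```ruby
require 'strscan'

# Hypothetical, hand-written sketch; not the RDParse implementation.
# Each grammar rule becomes one method that either returns a tree node
# or returns nil after restoring the scanner position (backtracking).
class TinyExpressionParser
  NUMBER    = /\d+(\.\d+)?/
  OPERATION = /[\+\-\*\/]/

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Entry point: parse one expression and insist that all input is consumed.
  def parse
    tree = expression
    skip_space
    unless tree && @scanner.eos?
      raise ArgumentError, "unexpected input at position #{@scanner.pos}"
    end
    tree
  end

  private

  # expression: term operation expression | term
  def expression
    left = term
    return nil unless left

    start = @scanner.pos
    op = token(OPERATION)
    if op && (right = expression)
      [:expression, left, [:operation, op], right]
    else
      @scanner.pos = start # fall back to the "| term" alternative
      [:expression, left]
    end
  end

  # term: number | brkt_expression
  def term
    if (num = token(NUMBER))
      [:term, [:number, num]]
    elsif (inner = brkt_expression)
      [:term, inner]
    end
  end

  # brkt_expression: "(" expression ")"
  def brkt_expression
    start = @scanner.pos
    if token(/\(/) && (inner = expression) && token(/\)/)
      [:brkt_expression, inner]
    else
      @scanner.pos = start
      nil
    end
  end

  # The "lexing" step: skip whitespace, then match one token at the cursor.
  def token(pattern)
    skip_space
    @scanner.scan(pattern)
  end

  def skip_space
    @scanner.skip(/\s+/)
  end
end

# Render the nested-array tree as indented text, one node per line.
def render_tree(node, depth = 0)
  name, *children = node
  indent = '  ' * depth
  if children.length == 1 && children[0].is_a?(String)
    "#{indent}#{name} => #{children[0]}\n"
  else
    "#{indent}#{name}\n" + children.map { |child| render_tree(child, depth + 1) }.join
  end
end

puts render_tree(TinyExpressionParser.new('(34 - 3) * 42').parse)
```

Because the expression rule recurses on its right-hand side, an input such as 34 - 3 - 1 groups as 34 - (3 - 1), which matches the right-nested shape of the tree shown earlier. A grammar intended for real arithmetic would need to handle precedence and associativity separately.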

I'm not releasing it as a Gem or on RubyForge yet as it has a long way to go, but for anyone interested in this stuff, see what grammars you can knock up! Next step.. error reporting.

Comments

  1. jay says:

    Could RDParse also benefit from the so-called packrat parsing algorithm in some way?
    http://meta-meta.blogspot.com/2006/04/packrat-parsers.html

  2. Pingback: Treetop - Powerful But Easy Ruby Parser Library
