Using Ripper to See How Ruby Is Parsing Your Code

By Peter Cooper / August 5, 2011

In the past couple of months I've seen situations arise where developers aren't entirely sure how Ruby has chosen to interpret their code. Luckily, Ruby 1.9 comes with a built-in library called Ripper that can help solve the problem (there's a 1.8 version too; see later). Here, I give the 30-second rundown on what to do.

A Mystery To Solve

I've seen this confusion appear twice in the last month (the second time was what inspired me to write this post):

> puts {}.class

 => NilClass

Despite thinking that we should be seeing Hash appear, we don't. We get a blank line and NilClass in response. It doesn't get any better if you start prodding:

> puts { :x => 10 }.class
SyntaxError: (irb):19: syntax error, unexpected tASSOC, expecting '}'

What!?

Thankfully, you don't need to wonder any more. Even if you can make a good guess at what's going on, you can lean on Ripper to confirm it.

What is Ripper?

Ripper is a library introduced in MRI Ruby 1.9 that hooks directly into Ruby 1.9's parser. It can give you either an abstract syntax tree (as nested arrays, via Ripper.sexp) or a simple lexical analysis (a list of tokens, via Ripper.lex) of the code that you provide. This is useful for working out why Ruby is interpreting a given piece of code in a certain way, such as our troublemaker above.
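To make those two modes concrete, here is a minimal sketch that runs both over the troublesome line. The variable names are just for illustration, and the exact shape of each token entry can vary between Ruby versions, so only the leading fields are used.

```ruby
require 'ripper'

source = 'puts {}.class'

# Lexical analysis: Ripper.lex returns one entry per token. Each entry
# begins with [line, column], then the token type, then the token text
# (some Ruby versions append further fields, which are ignored here).
tokens = Ripper.lex(source).map { |token| token[1..2] }
tokens.each { |type, text| puts "#{type.to_s.ljust(12)} #{text.inspect}" }

# Syntax tree: Ripper.sexp returns nested arrays whose root is :program.
tree = Ripper.sexp(source)
puts tree.first.inspect
```

The token list tells you how the source was split up, but not how the pieces relate to each other. For that you need the tree, which is what the rest of this post leans on.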

The following tip isn't going to be a full breakdown of what Ripper does, of course, but there are already posts out there to help with that. Check out Using Ruby 1.9 Ripper by Sven Fuchs for starters.

How to Solve Our Mystery

So what is puts {}.class up to?

Getting Ripper to give us its interpretation of events (and therefore the interpretation of what Ruby 1.9 is seeing when it parses the code) is easy:

require 'ripper'
p Ripper.sexp("puts {}.class")

[:program, [[:call, [:method_add_block, [:method_add_arg, [:fcall, [:@ident, "puts", [1, 0]]], []], [:brace_block, nil, [[:void_stmt]]]], :".", [:@ident, "class", [1, 8]]]]]

Not exactly the easiest thing in the world to read, so I suggest you gem install awesome_print (see our writeup on why it's so awesome) and then try:

require 'ap'
ap Ripper.sexp("puts {}.class")

[
    [0] :program,
    [1] [
        [0] [
            [0] :call,
            [1] [
                [0] :method_add_block,
                [1] [
                    [0] :method_add_arg,
                    [1] [
                        [0] :fcall,
                        [1] [
                            [0] :@ident,
                            [1] "puts",
                            [2] [
                                [0] 1,
                                [1] 0
                            ]
                        ]
                    ],
                    [2] []
                ],
                [2] [
                    [0] :brace_block,
                    [1] nil,
                    [2] [
                        [0] [
                            [0] :void_stmt
                        ]
                    ]
                ]
            ],
            [2] :".",
            [3] [
                [0] :@ident,
                [1] "class",
                [2] [
                    [0] 1,
                    [1] 8
                ]
            ]
        ]
    ]
]

There are very few people who will be ready to confidently interpret this output out of the box, but if we pick through it we can get some significant clues as to what's going on. It is important to keep track of the nesting.

Note that the first nested section starts with :call, and that the thing being called upon is a :method_add_block section, which pairs a function call (:fcall) to puts with a :brace_block. Only after that does the . (to call a method/send a message) appear, followed by the "class" message, which is therefore sent to the result of the whole puts-plus-block call. So what's happening?

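It turns out that Ruby is interpreting the {} as a code block being passed to puts! It's not a hash, but a code block! puts ignores the block, prints a blank line, and returns nil, so calling class on the result gives us NilClass. If you rewrite the code a little to remove the ambiguity and try again, what happens?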
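If you'd rather not pick through the tree by eye, you can search it for the node types that matter. This is only a rough sketch: node_types is a hypothetical helper written for this post, not part of Ripper, and it assumes the source parses successfully.

```ruby
require 'ripper'

# Hypothetical helper: walk a Ripper s-expression and collect the symbol
# that starts each nested array (i.e. every node type in the tree).
def node_types(node, found = [])
  return found unless node.is_a?(Array)
  found << node.first if node.first.is_a?(Symbol)
  node.each { |child| node_types(child, found) }
  found
end

['puts {}.class', 'puts({}.class)'].each do |source|
  types = node_types(Ripper.sexp(source))
  kind = if types.include?(:brace_block)
           'block'
         elsif types.include?(:hash)
           'hash'
         else
           'neither'
         end
  puts "#{source.ljust(16)} {} parsed as: #{kind}"
end
```

A :brace_block node means the braces were read as a block, while a :hash node means they were read as a hash literal.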

ap Ripper.sexp("puts({}.class)")
[
    [0] :program,
    [1] [
        [0] [
            [0] :method_add_arg,
            [1] [
                [0] :fcall,
                [1] [
                    [0] :@ident,
                    [1] "puts",
                    [2] [
                        [0] 1,
                        [1] 0
                    ]
                ]
            ],
            [2] [
                [0] :arg_paren,
                [1] [
                    [0] :args_add_block,
                    [1] [
                        [0] [
                            [0] :call,
                            [1] [
                                [0] :hash,
                                [1] nil
                            ],
                            [2] :".",
                            [3] [
                                [0] :@ident,
                                [1] "class",
                                [2] [
                                    [0] 1,
                                    [1] 8
                                ]
                            ]
                        ]
                    ],
                    [2] false
                ]
            ]
        ]
    ]
]

It should be immediately apparent once you walk through this syntax tree array that things are now good. Inside the :arg_paren section you should be able to make out a :call section whose first argument is a :hash, then the ., and then the class message. This means we have a hash upon which the class method is being called - the behavior we wanted :-)
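The difference between the two trees comes down to what sits at the top of each one. Here's a small sketch that shows it, followed by a runtime check of the same reading. top_level_type is a hypothetical helper for illustration only, and it assumes the source is a single valid statement.

```ruby
require 'ripper'

# Hypothetical helper (not part of Ripper): the node type of the first
# top-level statement in the parsed source.
def top_level_type(source)
  program = Ripper.sexp(source) # [:program, [statement, ...]]
  program[1][0][0]
end

# Ambiguous form: the whole line is a method call (.class) on something.
puts top_level_type('puts {}.class').inspect
# Parenthesized form: the whole line is puts being given arguments.
puts top_level_type('puts({}.class)').inspect

# Runtime check: puts ignores the block it is handed and returns nil,
# whereas an unambiguous {} is a plain hash literal.
block_result = (puts {}).class
hash_result = {}.class
puts block_result.inspect
puts hash_result.inspect
```

In the first form class is the outermost operation, applied to whatever puts returns. In the second, puts is outermost and receives the hash's class as its argument.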

Needless to say, it can get more complicated than that, and syntax trees can get significantly deeper too, but if you get stuck on an obscure parsing error or simply want to dig into what Ruby is up to when it's parsing your code, give Ripper a quick spin.

But I'm on Ruby 1.8!

Despite being a Ruby 1.9ism, Ripper has been ported to Ruby 1.8 by Loren Segal. He has also written a blog post about it. I've not tried it (I'm a fully paid up 1.9 guy nowadays) but it's worth a try if you're still using 1.8.

Bonus Info

Sorcerer is a library by Jim Weirich that can turn the s-expressions above back into Ruby code. This could be useful for changing Ruby code in a structured way. I only had limited success with it on some simple examples, though. Still, worth looking into.

Comments

Bradly Feeley says:
August 5, 2011 at 11:04 pm
Very cool. It would be great to run the output through graphviz.
Peter Cooper says:
August 5, 2011 at 11:10 pm
Ooh, that sounds like an interesting idea. That would be pretty easy to conjure up too.. If no-one tries this in the next week, I'll be giving it a go ;-)
Giles says:
August 7, 2011 at 3:58 am
There's actually some missing history here. For 1.8, Ryan Davis had/has a small suite of similar gadgets called ParseTree, ruby_parser, and Ruby2Ruby. They had annoying indentation and syntax quirks but were basically amazing gems. A few years ago I built a method duplication detector with them. Somebody who forked it, though, ported it to run on a different Ruby AST extractor instead - can't remember the name, unfortunately.

Anyway, hoping to fiddle with Ripper soonish.
Rudkovsky says:
August 7, 2011 at 7:03 am
Really helpful thing, thank you Peter!
Mike says:
August 7, 2011 at 1:38 pm
Thanks for the write up. I hadn't heard of Ripper before.

Is puts {}.class all that confusing though? It seems pretty obvious that the {} in this context means it is a block.
Peter Cooper says:
August 7, 2011 at 8:15 pm
@Mike: It confused two people I consider to be smart, so I'm erring on the side of caution here :-)

@Giles: Oh totally. Posted about ParseTree before a couple of times but I didn't go into any history here. I sorta dropped those technologies off of my radar as soon as I saw some notes about them not being supported on 1.9.