3 Tools for DRYing Your Ruby Code

By Mike Gunderloy / November 7, 2008

We've all heard the admonition: "Don't Repeat Yourself!" But how do you avoid repetition when you're working on a Ruby codebase that stretches to thousands of lines and is maintained by multiple developers? One answer is to run a tool that looks for duplicate code. This is an area where good tools are tantalizingly close - there are at least three out there that are worth checking out:

Towelie

The first contender is Giles Bowkett's Towelie, which uses ParseTree and ruby2ruby to look through a set of files searching for duplicates. Unfortunately, Towelie in its current state was unable to handle my test case (the Active Record subtree of Rails), persistently erroring somewhere in ParseTree. Admittedly, Active Record is an extremely large and often arcane codebase - though yours may be too.

Giles has written an extensive blog post on Towelie, which includes a screenshot of successful output: clean and to the point. Give it a try and see if it works on your codebase - there's a lot of potential here.

Flay

Next I looked at Flay, which just showed up (instantly at version 1.0.0) on RubyForge. Written by Ryan Davis, Flay uses sexp_processor and ruby_parser to examine the structure of Ruby code. It's capable of detecting both exact and close matches, and did in fact find some spots in Active Record where patterns repeat. In its current state, Flay's output is very primitive: a list of repeated code nodes, each with a weight to rank it by and the file names and line numbers where it shows up.
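To make "examining the structure" a little more concrete, here is a minimal sketch of the idea. It is not Flay's implementation: it assumes Ruby's standard-library Ripper parser in place of ruby_parser, and the helper names structural_shape and same_structure? are made up for illustration. The sketch blanks out every identifier, drops source positions, and compares what is left of the two parse trees.

```ruby
require 'ripper'

# Hypothetical helper: reduce a Ripper parse tree to its "shape".
# Identifier names are blanked and [line, column] positions are dropped,
# so only the structure of the code remains.
def structural_shape(node)
  return node unless node.is_a?(Array)

  if node[0].is_a?(Symbol) && node[0].to_s.start_with?('@')
    # Scanner events look like [:@ident, "total", [1, 4]].
    type, text, _position = node
    return type == :@ident ? [type] : [type, text]
  end

  node.map { |child| structural_shape(child) }
end

# Hypothetical helper: true when two snippets parse to the same shape.
def same_structure?(source_a, source_b)
  tree_a = Ripper.sexp(source_a)
  tree_b = Ripper.sexp(source_b)
  return false if tree_a.nil? || tree_b.nil? # Ripper.sexp gives nil on a parse error

  structural_shape(tree_a) == structural_shape(tree_b)
end

first  = "def total(items)\n  items.inject(0) { |sum, item| sum + item.price }\nend\n"
second = "def tally(rows)\n  rows.inject(0) { |acc, row| acc + row.price }\nend\n"

puts same_structure?(first, second)
```

The two methods differ only in the names they use, so the sketch should treat them as the same code. Because it blanks every identifier, including the names of the methods being called, it is far too eager to serve as a real detector; it only illustrates why a parse tree is a better starting point than raw text.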

Just gem install flay, and then flay *.rb to get playing with Flay.

Simian - A more general approach

Turning away from pure Ruby tools, I grabbed a copy of Simian, a code similarity analyzer that's been around for quite a while. Written in Java, Simian can handle Ruby source code just fine - and indeed, it very quickly found a number of duplicate lines in the source I was looking at. For open source projects, Simian is free; others will pay $99 or more for a license. This is definitely a more mature and faster tool than either Towelie or Flay; the drawback is that it has no knowledge of Ruby code structures, and so can't search for duplicate intent the way the native tools promise to.
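For contrast, here is a rough sketch of what a purely text-based duplicate finder does. This is not Simian's algorithm, just an assumed, simplified stand-in: the duplicate_windows helper, the file names, and the three-line window are all hypothetical. It strips whitespace, skips blank and comment-only lines, and reports any run of consecutive lines that appears in more than one place.

```ruby
# Hypothetical sketch of line-based duplicate detection (not how Simian works
# internally). sources maps a file name to that file's text.
def duplicate_windows(sources, window = 3)
  seen = Hash.new { |hash, key| hash[key] = [] }

  sources.each do |name, text|
    # Keep [line_number, stripped_text] pairs, skipping blank and comment-only lines.
    lines = text.each_line.with_index(1)
                .map { |line, number| [number, line.strip] }
                .reject { |_, stripped| stripped.empty? || stripped.start_with?('#') }

    # Record where each run of `window` consecutive lines starts.
    lines.each_cons(window) do |chunk|
      key = chunk.map { |_, stripped| stripped }.join("\n")
      seen[key] << [name, chunk.first.first]
    end
  end

  # Only runs that occur in more than one place count as duplicates.
  seen.select { |_, places| places.size > 1 }
end

sources = {
  'a.rb' => "def total(items)\n  sum = 0\n  items.each { |i| sum += i }\n  sum\nend\n",
  'b.rb' => "# copied from a.rb\ndef total(items)\n  sum = 0\n  items.each { |i| sum += i }\n  sum\nend\n"
}

duplicate_windows(sources).each do |code, places|
  puts "Found at #{places.inspect}:"
  puts code
end
```

A copy with even one renamed variable no longer matches line for line, which is exactly the gap that the structure-aware tools are trying to close.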

Conclusion

The verdict? If I were coming into a new codebase of suspicious provenance, I'd run all three tools against it to get a sense of how bad the situation is. But I'd love to see the Ruby community push the two native tools along to the point where they have better output and can actually be used in a nightly build to watch for problems. We're not there yet, but could be reasonably soon - thoughts?

Comments

Eric Davis says:
November 7, 2008 at 9:58 pm
These should make finding refactorings a lot easier. I'm going to have to take some time to use them against some Open Source projects and send in some patches.
Simon Harris says:
November 7, 2008 at 9:59 pm
Mike,

Simian has limited knowledge of Ruby--it ignores comments and noisy keywords such as def and end, etc. But yes, unfortunately at present it has no real knowledge of structures per se. What kinds of features would you imagine it needing?

Cheers,

Simon
Giles Bowkett says:
November 7, 2008 at 10:21 pm
There's also a great project called reek. It's not a repetition detector but a code smell finder. Then again I might have first seen it on here.

The ParseTree errors - a lot of ParseTree's in C, iirc, so avoiding those errors is kind of a pain. I'm going to look at Flay's code and see if I can steal the sexp stuff.

The example stuff in the blog post you mention is models and controllers from Rails apps. I think I've got a decent hack to make it possible to throw Towelie at Rails views as well, but I haven't had time to find out for sure.
Mike Gunderloy says:
November 7, 2008 at 10:54 pm
Simon:

The seductive promise of Towelie and Flay is that they can find code that is duplicate in intent rather than just duplicate in syntax. For example, they should be able to spot two code sections that differ only in the name of an identifier used throughout the code, as long as the underlying structure is the same. That requires some knowledge of Ruby internals - as well as, I expect in the end, the ability to dial the similarity detector up or down.
Jens Himmelreich says:
November 10, 2008 at 10:58 am
The CopyPasteDetector of the PMD suite is a mature Java tool.
It is able to 'understand' different languages: Java, PHP, C++ and also Ruby:

http://pmd.sourceforge.net/cpd.html
Tom Copeland says:
November 11, 2008 at 5:07 am
I've worked on CPD a bit and unfortunately it's got a weak Ruby tokenizer. I've tried to write a JavaCC-based tokenizer for Ruby but have not yet succeeded... someday, perhaps. For Java source, CPD does have the capabilities that Mike mentions - that is, it can ignore identifiers and literals.
Jean-Michel Garnier says:
November 11, 2008 at 9:56 am
I have written an HTML report generator that uses Simian and also integrates with TextMate and NetBeans:

http://github.com/garnierjm/dry-report/wikis/home

I haven't tried Towelie and Flay, but I guess I could also use their data to generate the reports.