3 Tools for DRYing Your Ruby Code
We've all heard the admonition: "Don't Repeat Yourself!" But how do you avoid repetition if you're working on a Ruby codebase that stretches to thousands of lines, maintained by multiple developers? One answer is to run a tool that looks for duplicate code. This is an area where good tools are tantalizingly close - there are at least three out there that are worth checking out:
The first contender is Giles Bowkett's Towelie, which uses ParseTree and ruby2ruby to look through a set of files searching for duplicates. Unfortunately, Towelie in its current state was unable to handle my test case (the Active Record subtree of Rails), repeatedly raising an error somewhere in ParseTree. Admittedly, Active Record is an extremely large and often arcane codebase - though yours may be too.
Giles has written an extensive blog post on Towelie, which includes a screenshot of successful output: clean and to the point. Give it a try and see if it works on your codebase - there's a lot of potential here.
Next I looked at Flay, which just showed up (instantly at version 1.0.0) on RubyForge. Written by Ryan Davis, Flay uses sexp_processor and ruby_parser to examine the structure of Ruby code. It's capable of detecting both exact and close matches, and did in fact find some spots in Active Record where patterns repeat. In its current state, Flay's output is very primitive: a list of repeated code nodes, each with a weight to rank it by and the file names and line numbers where it shows up.
To start playing with Flay, run gem install flay and then flay *.rb.
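To get a feel for what "examining the structure" means, here is a minimal sketch of structural duplicate detection. It is not Flay's algorithm, and it uses Ripper from Ruby's standard library rather than sexp_processor and ruby_parser so that it runs without extra gems. The method names (shape, mass, structural_duplicates) and the min_mass threshold are made up for illustration. The idea: parse the source into a tree, throw away identifier names and literal values, and group subtrees that end up with the same shape.

```ruby
require "ripper"

# True for scanner tokens such as [:@ident, "name", [line, column]].
def token?(node)
  node.is_a?(Array) && node.first.is_a?(Symbol) && node.first.to_s.start_with?("@")
end

# Reduce a subtree to its shape: tokens keep only their kind, so two
# snippets that differ only in names or literal values look identical.
def shape(node)
  return node unless node.is_a?(Array)
  return node.first if token?(node)
  node.map { |child| shape(child) }
end

# Rough size of a subtree, used to ignore trivially small matches.
def mass(node)
  return 0 unless node.is_a?(Array)
  return 1 if token?(node)
  1 + node.sum { |child| mass(child) }
end

# Line number of the first token inside a subtree, or nil if it has none.
def first_line(node)
  return nil unless node.is_a?(Array)
  return node[2][0] if token?(node)
  node.each do |child|
    line = first_line(child)
    return line if line
  end
  nil
end

# Yield every parser node (an array headed by a symbol) in the tree.
def each_subtree(node, &block)
  return unless node.is_a?(Array) && !token?(node)
  block.call(node) if node.first.is_a?(Symbol)
  node.each { |child| each_subtree(child, &block) }
end

# Return groups of starting line numbers whose subtrees share a shape.
# min_mass is an arbitrary cutoff chosen for this sketch.
def structural_duplicates(source, min_mass: 8)
  tree = Ripper.sexp(source) or raise ArgumentError, "source does not parse"
  groups = Hash.new { |hash, key| hash[key] = [] }
  each_subtree(tree) do |node|
    next if mass(node) < min_mass
    groups[shape(node)] << first_line(node)
  end
  groups.values.select { |lines| lines.size > 1 }
end
```

Calling structural_duplicates(File.read("some_file.rb")) returns arrays of line numbers where similarly shaped code starts. Because nested subtrees are compared independently, one pair of duplicated methods shows up as several overlapping groups; a real tool would prune those and rank the rest.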
Simian - A more general approach
Turning away from pure Ruby tools, I grabbed a copy of Simian, a code similarity analyzer that's been around for quite a while. Written in Java, Simian can handle Ruby source code just fine - and indeed, it very quickly found a number of duplicate lines in the source I was looking at. For open source projects, Simian is free; others will pay $99 or more for a license. This is definitely a more mature and faster tool than either Towelie or Flay; the drawback is that it has no knowledge of Ruby code structures, and so can't do the kind of structural search for duplicated intent that the native tools promise.
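For contrast, a purely textual check can be sketched in a few lines of Ruby. This is not how Simian is implemented (I haven't looked inside it); it only illustrates the general line-based approach, and the method name duplicate_line_runs and the window size are invented for the example. It normalizes whitespace, skips blank lines, and reports any run of consecutive lines that occurs more than once.

```ruby
# Find runs of `window` consecutive non-blank lines that appear more than once.
# Returns a hash mapping each repeated run to the line numbers where it starts.
def duplicate_line_runs(source, window: 3)
  # Normalize whitespace and drop blank lines, keeping original line numbers.
  lines = source.each_line.with_index(1)
                .map { |text, number| [text.strip.gsub(/\s+/, " "), number] }
                .reject { |text, _number| text.empty? }

  seen = Hash.new { |hash, key| hash[key] = [] }
  lines.each_cons(window) do |run|
    seen[run.map(&:first)] << run.first.last
  end

  seen.select { |_run, starts| starts.size > 1 }
end
```

Because it compares text rather than structure, renaming a single variable inside a copied block is enough to hide the duplication from a check like this, which is exactly the gap the parser-based tools are trying to close.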
The verdict? If I were coming into a new codebase with suspicious provenance, I'd run all three tools against it to get a sense of how bad the situation is. But I'd love to see the Ruby community push along the two native tools to a point where they have better output and can actually be used in a nightly build to watch for problems. We're not there yet, but could be reasonably soon - thoughts?