Ruby Linguistics: A framework for playing with English in Ruby

By Peter Cooper / July 10, 2006

This library is so amazingly cool that it needs no description beyond these code examples:

"runs".en.present_participle
# => "running"

2004.en.numwords
# => "two thousand and four"

"cow".en.quantify( 20_432_123_000_000 )
# => "tens of trillions of cows"

"ruby".en.plural
# => "rubies"

Or what about this?

allobjs = []
ObjectSpace::each_object {|obj| allobjs << obj.class.name}

puts "The current Ruby objectspace contains: " +
  allobjs.en.conjunction( :generalize => true )

The current Ruby objectspace contains: thousands of Strings,
thousands of Arrays, hundreds of Hashes, hundreds of
Classes, many Regexps, a number of Ranges, a number of
Modules, several Floats, several Procs, several MatchDatas,
several Objects, several IOS, several Files, a Binding, a
NoMemoryError, a SystemStackError, a fatal, a ThreadGroup,
and a Thread
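
To give a feel for what the :generalize option is doing in the output above, here is a rough plain-Ruby sketch of the idea: count duplicate names, then swap each exact count for a vague quantity phrase. This is not the library's implementation. The helper names (vague_quantity, with_article, generalize), the thresholds, and the naive "add an s" plural are all assumptions made for illustration.

```ruby
# Hypothetical thresholds; the library's real cut-offs may well differ.
def vague_quantity(count)
  if    count >= 1000 then "thousands of"
  elsif count >= 100  then "hundreds of"
  elsif count >= 20   then "many"
  elsif count >= 6    then "a number of"
  else                     "several" # counts from 2 to 5
  end
end

# Naive article choice: "an" before a vowel letter, otherwise "a".
def with_article(word)
  (word =~ /\A[aeiou]/i ? "an " : "a ") + word
end

# Counts duplicate names, then describes each group, most common first.
def generalize(names)
  counts = names.each_with_object(Hash.new(0)) { |name, h| h[name] += 1 }
  phrases = counts.sort_by { |_, n| -n }.map do |name, n|
    n == 1 ? with_article(name) : "#{vague_quantity(n)} #{name}s" # naive plural
  end
  return phrases.join(" and ") if phrases.size < 3
  phrases[0..-2].join(", ") + ", and " + phrases[-1]
end

names = ["String"] * 1200 + ["Array"] * 150 + ["Range"] * 7 +
        ["Float"] * 3 + ["Thread"]
puts generalize(names)
```

The real library also handles irregular plurals and smarter article selection, which this sketch deliberately skips.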

Go learn more.

Comments

  1. RSL says:

    Thanks a friggin' million on this find. I'm working on an art project Rails app that this will/might help sooooo much.

  2. Tobin says:

    Wow, awesome. Thanks for the heads up.

  3. Steve Koppelman says:

    Yeah, but isn't 2004 "two thousand four" without the "and"? Sort of how the String helper to_sentence() puts a comma immediately before the "and" by default, when at least here in North America it shouldn't.

    Are these correct grammar in whatever country the lead developer of the module calls home, or does Ruby Linguistics need a quick once-over by a linguist? ;)

  4. Peter Cooper says:

    Steve: The "Oxford comma" (as I know it, but I think it is called a Harvard comma in the US) is a common point of discussion regarding the English language. Using a comma before 'and' in a list is not particularly a trait of any English dialect and is similarly common, in my experience, in both British and American English. In fact, it tends to be a difference between formal and informal use. The US government style guide recommends it, for example, along with most academic style guides on either side of the pond.

    Regarding the 'and' in numbers, the form with the 'and' is legitimate in both British and American English, whereas the form without the 'and' is dubious in British English (though not unseen). I think this library is, as you suggest, erring towards the creator's dialect, but appears to be taking the most 'formal' route, if not the most pragmatic one.

  5. Michael Granger says:

    Steve:

    The #numwords method includes a setting ':and' which lets you specify how you want that situation handled. Unfortunately, I noticed that it wasn't working in the latest release, but I've since fixed it. Look for a new release that'll let you have 'two thousand four' in the next few days.

    I'm not sure what 'helper' you're referring to, as there is no 'to_sentence()' in the Linguistics library. I assume you're referring to #conjunction, but the decision to make the default put a comma before the final clause of a list of three or more was based on Strunk and White's "Elements of Style", not on any particular deference to any particular local grammar conventions:

    In a series of three or more terms with a single conjunction, use a comma after each term except the last.

    Thus write:

    red, white, and blue
    honest, energetic, but headstrong
    He opened the letter, read it, and made a note of its contents.

    That said, if you'd rather have the final comma omitted, you can use the :penultimate option:

    irb(main):001:0> %w{duck cow rabbit dog}.conjunction
    ==> "a duck, a cow, a rabbit, and a dog"

    irb(main):002:0> %w{duck cow rabbit dog}.conjunction( :penultimate => false )
    ==> "a duck, a cow, a rabbit and a dog"
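
For readers curious how such a switch might work, here is a minimal plain-Ruby sketch of a list joiner with an optional comma before the final conjunction. It is not the library's #conjunction code; the join_list helper and its penultimate: and conjunction: keyword arguments are hypothetical names chosen to mirror the option shown above.

```ruby
# Hypothetical helper, not part of the Linguistics library.
# Joins items into an English list; `penultimate` controls whether a
# comma is placed before the final conjunction in lists of three or more.
def join_list(items, penultimate: true, conjunction: "and")
  case items.size
  when 0 then ""
  when 1 then items[0]
  when 2 then "#{items[0]} #{conjunction} #{items[1]}"
  else
    head  = items[0..-2].join(", ")
    comma = penultimate ? "," : ""
    "#{head}#{comma} #{conjunction} #{items[-1]}"
  end
end

animals = ["a duck", "a cow", "a rabbit", "a dog"]
puts join_list(animals)                     # comma before "and"
puts join_list(animals, penultimate: false) # no comma before "and"
```

Two-item lists never take the comma in this sketch, whichever setting is used.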

    As to your asking if the library needs a "once-over by a linguist", are you saying that it's inconceivable that the lead developer was himself a linguist? While I do not have a formal degree asserting such, linguistics and grammar are nonetheless very dear to me. I can't claim to have provided a perfect implementation of English for all dialects and locales, but I did make a concerted effort to make the English bits of Ruby-Linguistics correct for most uses while providing some flexibility for those who wanted alternate behavior.

    I would of course welcome suggestions, corrections, and any other comments linguists or anyone else might care to make. :)

    [Apologies if this isn't formatted correctly, as I can't tell what format comments support.]

  6. Peter Cooper says:

    Your comment seems to have come out perfectly. :)

  7. stipes says:

    As far as using "and" in numbers, it is generally left out when using numbers in a mathematical sense. At least in American English, in mathematics, the "and" is used to signify the decimal place, separating the whole number from the fraction. In a plain whole number, the "and" is left out:

    2004 = two thousand four
    2004.5 = two thousand four and one half
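
The two whole-number conventions discussed in this thread can be sketched in plain Ruby as follows. This is an illustration only, not the library's #numwords: the number_words helper, its use_and: keyword, and the 0..999_999 range limit are all assumptions, and fractions such as 2004.5 are not handled.

```ruby
ONES = %w[zero one two three four five six seven eight nine ten eleven twelve
          thirteen fourteen fifteen sixteen seventeen eighteen nineteen].freeze
TENS = %w[_ _ twenty thirty forty fifty sixty seventy eighty ninety].freeze

# Hypothetical helper covering whole numbers from 0 to 999_999 only.
# use_and: true gives "two thousand and four"; false gives "two thousand four".
def number_words(n, use_and: true)
  raise ArgumentError, "out of range" unless (0..999_999).cover?(n)
  joiner = use_and ? " and " : " "
  if n < 20
    ONES[n]
  elsif n < 100
    tens, ones = n.divmod(10)
    ones.zero? ? TENS[tens] : "#{TENS[tens]}-#{ONES[ones]}"
  elsif n < 1000
    hundreds, rest = n.divmod(100)
    head = "#{ONES[hundreds]} hundred"
    rest.zero? ? head : head + joiner + number_words(rest, use_and: use_and)
  else
    thousands, rest = n.divmod(1000)
    head = "#{number_words(thousands, use_and: use_and)} thousand"
    return head if rest.zero?
    # The "and" only follows "thousand" when the remainder is below one hundred.
    sep = rest < 100 ? joiner : " "
    head + sep + number_words(rest, use_and: use_and)
  end
end

puts number_words(2004)
puts number_words(2004, use_and: false)
```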

    In general, that's getting pretty picky... The framework looks really cool, and I can definitely see it having lots of future applications.
