Classifier Gem: Bayesian and LSI Classification for Ruby

By Hendy Irawan / May 24, 2007

Classifier is a Ruby gem developed by Lucas Carlson and David Fayram II that provides Bayesian and other kinds of classification, including Latent Semantic Indexing (LSI).

A Bayes classifier is a probabilistic algorithm that applies Bayes’ theorem to learn the underlying probability distribution of the data. Its best-known use is spam filtering, where most filtering packages implement it.
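
To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier in plain Ruby. TinyBayes and its methods are hypothetical names made up for this illustration; it is not the Classifier gem's implementation, and it assumes simple word counts with add-one smoothing.

```ruby
# Minimal naive Bayes sketch (hypothetical class, not the Classifier gem).
class TinyBayes
  def initialize(*categories)
    # Per-category word counts, e.g. { "spam" => { "free" => 2 } }
    @word_counts = categories.each_with_object({}) { |c, h| h[c] = Hash.new(0) }
    @doc_counts  = Hash.new(0) # training documents seen per category
    @vocabulary  = {}          # every distinct word seen during training
  end

  # Record one training document for a known category.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # log P(category) + sum of log P(word | category), with add-one smoothing
  # so that unseen words do not zero out the whole score.
  def score(category, text)
    total_docs  = @doc_counts.values.sum
    total_words = @word_counts[category].values.sum
    prior = Math.log(@doc_counts[category].to_f / total_docs)
    tokenize(text).reduce(prior) do |sum, word|
      count = @word_counts[category][word]
      sum + Math.log((count + 1.0) / (total_words + @vocabulary.size))
    end
  end

  # Pick the category with the highest score.
  def classify(text)
    @word_counts.keys.max_by { |category| score(category, text) }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new("spam", "ham")
bayes.train("spam", "win free money now")
bayes.train("spam", "free prize claim now")
bayes.train("ham",  "lunch meeting at noon")
bayes.train("ham",  "see you at the meeting")
puts bayes.classify("claim your free money") # prints "spam"
```

Working with log probabilities instead of multiplying raw probabilities keeps the numbers from underflowing on longer texts.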

It can also be applied to many other machine learning tasks to make your Ruby application more intelligent (thankfully, the gem handles the complicated implementation for you). Ilya Grigorik recently posted an interesting tutorial on Bayes classification, with an easy-to-follow demonstration of how to use it to distinguish funny quotes from unfunny ones:

require 'rubygems'
require 'yaml'
require 'stemmer'
require 'classifier'

# Load previously classified quotes
# (each file is assumed to hold a YAML list of strings)
funny     = YAML.load_file('funny.yml')
not_funny = YAML.load_file('not_funny.yml')

# Create a Bayes classifier with two categories
classifier = Classifier::Bayes.new('Funny', 'Not Funny')

# Train the classifier; train_funny and train_not_funny
# correspond to the 'Funny' and 'Not Funny' categories
not_funny.each { |boo| classifier.train_not_funny boo }
funny.each { |good_one| classifier.train_funny good_one }

# Let's classify some new quotes
puts classifier.classify "Peter: A boat's a boat but a box could be anything! It could even be a boat!"
puts classifier.classify "Stewie: Damn you ice cream, come to my mouth! How dare you disobey me!"
puts classifier.classify "Brian: I could take my sweater off too, but I think it's attached to my skin. "
puts classifier.classify "Peter: Hey, anybody got a quarter? Bill Gates: What's a quarter? "
puts classifier.classify "Peter: I had such a crush on her. Until I met you Lois. You're my silver medal. "
puts classifier.classify "Meg: Excuse me, Mayor West? Adam West: How do you know my language? "
puts classifier.classify "Meg: You could kill all the girls who are prettier than me. Death: Well, that would just leave England. "
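
The train_funny and train_not_funny calls in the listing are not defined explicitly anywhere; they are derived from the category names. One common Ruby way to get that behaviour is method_missing, and the sketch below shows the idea with a hypothetical DynamicTrainer class. It is an illustration only, not the gem's actual source, and it simply stores the training text rather than doing any statistics.

```ruby
# Hypothetical stand-in for a classifier with dynamic train_* methods.
class DynamicTrainer
  attr_reader :examples

  def initialize(*categories)
    # Store categories under normalized keys: "Not Funny" -> "not_funny"
    @examples = categories.each_with_object({}) do |name, store|
      store[normalize(name)] = []
    end
  end

  # Explicit form: train("Not Funny", "some text")
  def train(category, text)
    @examples.fetch(normalize(category)) << text
  end

  # Route train_<category>(text) to train(<category>, text).
  def method_missing(name, *args)
    category = name.to_s[/\Atrain_(\w+)\z/, 1]
    return super unless category && @examples.key?(category)
    args.each { |text| train(category, text) }
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    category = name.to_s[/\Atrain_(\w+)\z/, 1]
    (!category.nil? && @examples.key?(category)) || super
  end

  private

  def normalize(name)
    name.to_s.downcase.strip.gsub(/\s+/, "_")
  end
end

trainer = DynamicTrainer.new("Funny", "Not Funny")
trainer.train_funny "A boat's a boat but a box could be anything!"
trainer.train_not_funny "The meeting is at noon."
p trainer.examples.keys # prints ["funny", "not_funny"]
```

Calling a train_* method for a category that was never declared falls through to super and raises NoMethodError, which catches typos in category names early.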

Alternatives and other useful resources: bn4r (article), Bishop, Microsoft Belief Network

About Hendy Irawan

Ruby programmer from Indonesia.

Comments

  1. Helder says:

    bn4r (Bayesian Networks for Ruby) recently joined forces with sbn (Simple Bayesian Networks), as they are very similar projects. The result will be hosted at http://rubyforge.org/projects/sbn4r/.

  2. Hendy Irawan says:

    Dear Helder Ribeiro,

    Thank you so much for your update.

    And good luck with your Firewatir-Gen Google Summer of Code challenge! :-) Keep rocking!

  3. Surendra Singhi says:

    There is also the clusterer gem (http://rubyforge.org/projects/clusterer/), which offers various types of Bayesian classifiers, clustering algorithms, LSI, and many different stemming alternatives, though I admit it's currently not very well documented. See http://cuttingtheredtape.blogspot.com/2007/03/clusterer-other-plugins.html.

  4. Hendy Irawan says:

    Dear Surendra Singhi,

    Thank you for the information. It is certainly very useful!
