Building a Search Engine in 200ish Lines of Ruby

Sau Sheong Chang works at Yahoo!'s Singapore office. Yahoo! isn't implemented in Ruby, of course, but Sau has made an attempt at implementing a basic search engine in Ruby and has written a pretty interesting, in-depth article about the whole process. Sau's search engine is made up of a crawler, an indexer, and a query system, and uses Hpricot, DataMapper, and Sinatra to get things done. Lots of code, lots of explanations - go read it.
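
To give a rough flavour of what the indexer and query parts of such an engine do, here is a minimal sketch. It is not Sau's code: his engine uses Hpricot, DataMapper, and Sinatra, while this toy keeps everything in memory and uses only Ruby's standard library. The class name `TinyIndex`, its methods, and the example.com URLs are all made up for illustration, and there is no crawler or ranking here.

```ruby
# Toy in-memory indexer and query step (hypothetical, not from Sau's article).
require "set"

class TinyIndex
  def initialize
    # word => Set of page URLs containing that word (an inverted index)
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Indexer: split the page text into lowercase words and
  # record the URL under each word.
  def add(url, text)
    tokenize(text).each { |word| @postings[word] << url }
  end

  # Query: return the URLs that contain every word in the query, sorted.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    words.map { |word| @postings.fetch(word, Set.new) }
         .reduce { |a, b| a & b }
         .to_a
         .sort
  end

  private

  # Very crude tokenizer: runs of ASCII letters and digits only.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyIndex.new
index.add("http://example.com/a", "Building a search engine in Ruby")
index.add("http://example.com/b", "Ruby on Rails hosting")
puts index.search("ruby search").inspect
```

A real engine would persist the index (Sau uses DataMapper for that) and rank the results rather than just intersecting sets, so treat this only as an outline of the idea.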

If you want to grab Sau's code for yourself, check out the saushengine repository on GitHub. You can also try a live version of the engine at http://saushengine.saush.net/ - it's down at the time of writing, though, and Sau warns that its availability will be poor.

5 Comment Responses to “Building a Search Engine in 200ish Lines of Ruby”

  1. #1
    Guillaume Noireaux Says:

    What would be the location of these Yahoo! offices again? (Sorry, I've worked as a proof reader. Can't control myself now.)
    Thanks for the link and your great service to the spreading of knowledge in the community.

  2. #2
    Guillaume Noireaux Says:

    ...and yes, I have a funny way of writing proofreader.

  3. #3
    Peter Cooper Says:

    Good catch - thanks :)

  4. #4
    Hubert Łępicki Says:

    Nice. It's a nice article. I wish I had read this a couple of months ago when I built my own crawler for one web application... it would have saved me lots of time!

  5. #5
    feedbackmine Says:

    My tweetjobsearch ( http://github.com/feedbackmine/tweetjobsearch/tree/master ) is an open source Twitter job search engine. It is a good example of building a search engine in Ruby. To make it more interesting, it uses libsvm to classify text.