
Ariel: A Ruby Information Extraction Library

By Peter Cooper / August 22, 2006

Alex Bradbury has developed Ariel, a library that learns from labeled examples how to extract information from other documents. It was a Google Summer of Code project mentored by Austin Ziegler. In Alex's own words:

Ariel is a library that allows you to extract information from semi-structured documents (such as websites). It is different to existing tools because rather than expecting the developer to write rules to extract the desired information, Ariel will use a small number of labeled examples to generate and learn effective extraction rules. It is developed by Alex Bradbury and released under the MIT license. Ariel was started as a Google Summer of Code project mentored by Austin Ziegler in 2006.

You can learn more at http://ariel.rubyforge.org/. In general, you first define a structure for the information you wish to extract, then select some example documents and mark them up in a way that Ariel can understand. Powerful!
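
To give a feel for what "learning extraction rules from labeled examples" can mean, here is a toy sketch in plain Ruby. It is not Ariel's API and not its learning algorithm; the method names (`learn_rule`, `extract`) and the sample documents are hypothetical, made up for illustration. It assumes each example is a document paired with the value a human labeled in it, and it learns the text landmarks that consistently surround that value.

```ruby
# Toy illustration only: this is NOT Ariel's API or its learning algorithm.

# Longest prefix shared by every string in the list.
def common_prefix(strings)
  first = strings.first
  len = (0...first.length).find { |i| strings.any? { |s| s[i] != first[i] } } || first.length
  first[0, len]
end

# Longest suffix shared by every string in the list.
def common_suffix(strings)
  common_prefix(strings.map(&:reverse)).reverse
end

# examples: array of [document, labeled_value] pairs.
# Assumes each labeled value actually occurs in its document.
def learn_rule(examples)
  befores, afters = examples.map { |doc, value|
    start = doc.index(value)
    [doc[0, start], doc[(start + value.length)..-1]]
  }.transpose
  # The rule is the pair of landmarks found around every labeled value.
  { start: common_suffix(befores), stop: common_prefix(afters) }
end

# Apply a learned rule to an unseen document; returns nil if it does not match.
def extract(rule, doc)
  from = doc.index(rule[:start])
  return nil unless from
  from += rule[:start].length
  to = doc.index(rule[:stop], from)
  to && doc[from...to]
end

examples = [
  ["<h1>Ruby 1.8.5</h1><p>Released</p>", "Ruby 1.8.5"],
  ["<h1>Rails 1.1</h1><p>Out now</p>", "Rails 1.1"]
]
rule = learn_rule(examples)
puts extract(rule, "<h1>Ariel 0.1</h1><p>New</p>")
```

With these two examples the learned landmarks are `<h1>` and `</h1><p>`, so the rule pulls the heading text out of a third document it has never seen. A real library has to cope with far messier input than this, but the workflow is the same: label a few examples, learn a rule, apply it elsewhere.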
