How To Build A 5,000,000 Image Search Engine With Ruby, Alexa and S3
Note: This story actually dates back to May 2006, but I don't remember hearing about it then, and it's tucked away inside Alexa's Developer's Corner.
Using the Alexa Web Search Platform (AWSP) as a source of data, Derrick Pallas put together a search engine using Ruby, RMagick and Amazon S3. With AWSP's data set and CPU cluster, he fetched 5,000,000 photos from the Web, analyzed their EXIF info with RMagick, and uploaded them to Amazon S3. The result was Camera Image Search, a search engine that can show you pictures taken with certain cameras or with certain exposure times, focal lengths, and so on (for some reason the Manufacturer drop-down does not appear to work, but the other fields do).
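To give a rough feel for the search side of such a project, here is a minimal Ruby sketch of filtering photos by EXIF fields. It is not Derrick Pallas's code, and it skips the RMagick and S3 steps entirely: it assumes the EXIF data has already been extracted into plain hashes keyed by standard EXIF tag names (Make, ExposureTime, FocalLength). The PhotoRecord struct, the parse_rational and search_photos helpers, and the example.com URLs are all hypothetical names made up for illustration.

```ruby
# Hypothetical in-memory stand-in for a photo index. Each record pairs a
# photo URL with a hash of already-extracted EXIF fields (an assumption;
# the original project read these with RMagick).
PhotoRecord = Struct.new(:url, :exif)

# Parse an EXIF rational string such as "1/250" or "50/1".
# Returns nil when the value is missing or malformed.
def parse_rational(value)
  text = value.to_s.strip
  return nil if text.empty?
  Rational(text)
rescue ArgumentError, ZeroDivisionError
  nil
end

# Select the records matching every criterion that was given.
# make:             case-insensitive manufacturer name
# max_exposure:     longest allowed exposure time, in seconds
# min_focal_length: shortest allowed focal length, in millimetres
def search_photos(records, make: nil, max_exposure: nil, min_focal_length: nil)
  records.select do |record|
    exif = record.exif
    next false if make && exif["Make"].to_s.downcase != make.downcase

    if max_exposure
      exposure = parse_rational(exif["ExposureTime"])
      next false if exposure.nil? || exposure > max_exposure
    end

    if min_focal_length
      focal = parse_rational(exif["FocalLength"])
      next false if focal.nil? || focal < min_focal_length
    end

    true
  end
end

# Made-up sample data for illustration only.
photos = [
  PhotoRecord.new("http://example.com/a.jpg",
                  { "Make" => "Canon", "ExposureTime" => "1/250", "FocalLength" => "50/1" }),
  PhotoRecord.new("http://example.com/b.jpg",
                  { "Make" => "NIKON", "ExposureTime" => "1/30", "FocalLength" => "18/1" }),
  PhotoRecord.new("http://example.com/c.jpg",
                  { "Make" => "Canon", "ExposureTime" => "2/1", "FocalLength" => "200/1" })
]

# Canon photos taken at 1/100 s or faster.
fast_canon = search_photos(photos, make: "canon", max_exposure: Rational(1, 100))
puts fast_canon.map(&:url)
```

A linear scan like this is fine for a handful of records; at 5,000,000 photos you would presumably want a database or a prebuilt index instead, which this sketch does not attempt.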
This is an impressive demonstration of using the CPU power and mammoth data sets provided by Alexa and Amazon to put together something that would otherwise be out of reach for an independent developer, and having the full instructions for how he did it is great too.