We are moving the Knetwit index to Solr to improve our document search. At the same time, we are migrating from a single monolithic storage device to the Hadoop Distributed File System (HDFS). During the planning we began to ask ourselves what else in our organization could benefit from Hadoop and Solr. For instance, Rackspace uses Hadoop, Solr, and Lucene to parse its log files: http://blog.racklabs.com/?p=66
I plan to document our move to Solr and Hadoop here. I will discuss setting up Hadoop and Solr on Amazon EC2, as well as building the interfaces that connect them to our Ruby on Rails application.