ColdFusion and Lucene
It seems that every couple of years someone needs an information retrieval (IR) feature that ColdFusion's bundled Verity engine doesn't have (see cflucene and Aaron Johnson's blog). I too ran into a situation that called for investigating alternatives to Verity.
Our Special Collections Research Center has used 3x5 index cards to catalog its archives and manuscript collections. There are myriad problems with a hard-copy index catalog (that's why we use computers, right?), so we started scanning the cards and running them through OCR software (we're using ABBYY). My original thought was just to dump the output files in one location and index them with Verity. All was going well until I realized we had a little more than 110,000 of these individual files, which was pushing the 150,000 document limit for our version of ColdFusion. I also knew of other projects, such as digitizing back issues of student newspapers, that would push the document count higher.
We had a few choices: upgrade our current license to the Enterprise Edition, which has a 250,000 document limit; purchase a Google Search Appliance; or find a relatively easy-to-implement IR engine. Each had its own pros and cons, and we initially leaned toward purchasing the Enterprise license. I also have a forthcoming project (hopefully) that will pair traditional structured data with unstructured data reports. For that, I need something capable of indexing pretty much anything. However, since funding in academic libraries is challenging at times, I started poking around with Lucene.
I remembered that a few years ago Macromedia had released some code called lindex in its DRK 3. I found the CD and tried it out. I noticed that it was built on the 1.2 release of Lucene and used an old version of PDFBox to extract text from PDFs. Since there have been significant improvements to Lucene since version 1.2 (the most current version is 2.1), I tried replacing the lucene-core jar file with the more recent one, but that just led to a heap of problems. I kept looking around, but most of the projects out there haven't kept their code up to date with Lucene, so I figured I'd start working with Lucene directly.
I'll be doing a series of posts on indexing with Lucene and some of its contributed modules, on third-party software that helps create thematic categories in the search results (http://demo.carrot2.org/demo-stable/main), and on index management using ColdFusion.
For the time being, here's a demo of the search engine I'm working on. It is an index of William and Mary's student newspaper, The Flat Hat, from 1939 to 1950.

Yes (I even linked to it above). It's actually quite nice, and I initially thought it would be what I used, but it hasn't been worked on in over a year and is based on the 1.4.3 branch. Lucene's up to 2.1, and without a lot of refactoring, the code in cflucene just won't work. I also wanted to provide "more like this..." and "did you mean..." searches, while also being able to index PDF, Word, PowerPoint, RTF, HTML, and XML files.
I don't like the CF_custome restriction of Verity and would like to implement Lucene on our ColdFusion platform, but resources on ColdFusion + Lucene are limited.
I have been looking into lindex and cflucene, but both appear to be quite old and not kept up to date.
There is a link on Ray's blog from a user called Simon, who said he is using the latest Lucene with ColdFusion.