JVM Settings

I ran across this great compilation of JVM settings for the different versions of Java. Very useful for Java tuning!

ColdFusion and Lucene

It seems that every couple of years someone has need for some aspect of an information retrieval (IR) system with features that ColdFusion's bundled Verity IR doesn't have (see cflucene and Aaron Johnson's blog). I too ran into a situation that called for investigating alternatives to Verity.

Our Special Collections Research Center has used 3x5 index cards to catalog their archives and manuscript collections. There are myriad problems with a hard-copy index catalog (that's why we use computers, right?), so we started a process of scanning the cards and running them through OCR software (we're using ABBYY). My original thought was just to dump these all in one location and index them with Verity. All was going well until I realized we had a little more than 110,000 of these individual files, which was pushing the 150,000 document limit for our version of ColdFusion. I also knew that there were some other projects to digitize back issues of student newspapers that would push this document count higher.

We had a few choices: upgrade our current license to the Enterprise Edition, which has a 250,000 document limit; purchase a Google Search Appliance; or find a relatively easy-to-implement IR engine. Each had its own pros and cons, and we initially leaned toward purchasing the Enterprise license. I also have a forthcoming project (hopefully) that will take traditional structured data and pair it with unstructured data reports. For that, I need something that is capable of indexing pretty much anything. However, as funding in academic libraries is challenging at times, I started poking around with Lucene.

I remembered that a few years ago, Macromedia had released some code called lindex in its DRK 3. I found the CD and tried it out. I noticed that it was built on the 1.2 release of Lucene and used an old version of PDFBox to extract text from PDFs. Since there have been some significant improvements to Lucene since version 1.2 (the most current version is 2.1), I thought I would try replacing the lucene-core jar file with the more recent one, but that just led to a heap of problems. I kept looking around, but most of the projects out there haven't kept their code up to date with Lucene, so I figured I'd start playing around with it.

I'll be doing a series of posts on indexing, covering not only Lucene and some of its contributed modules but also some third-party software that helps create thematic categories in the search results (http://demo.carrot2.org/demo-stable/main), as well as index management using ColdFusion.

For the time being, here's a demo of the search engine I'm working on. It is an index of William and Mary's student newspaper, The Flat Hat, from 1939 to 1950.

MS Access via JDBC

We recently made the move from an IIS Windows web server to an Apache *nix based web server as part of our efforts to consolidate our library's server infrastructure. And for reasons I won't expound upon, we had one MS Access DSN that didn't get migrated to MSSQL and that still needed to be used. Since ColdFusion uses a Windows-only driver for MS Access, I needed to figure out a way around this. I found a couple of JDBC drivers for Access (Easysoft's JDBC-ODBC Bridge and HXTT's Access Pure Java JDBC Drivers), but these seemed to be a bit on the expensive side for the short amount of time that I'd need to keep Access in production.

I did notice on Easysoft's website that they were using the JdbcOdbc bridge, so after a little more digging, I found the syntax to configure ColdFusion to use MS Access through the JdbcOdbc Bridge. The JDBC URL is

jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=/path/to/datasource.mdb;DriverID=22;

and the Driver Class

sun.jdbc.odbc.JdbcOdbcDriver
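
If you want to try the same pair of settings outside of ColdFusion, here is a minimal sketch in plain Java. The class name AccessBridgeCheck and the buildUrl helper are made up for illustration, and the URL assumes the usual DriverID=22 form. Note that the JdbcOdbc bridge was removed in Java 8, so the driver class will only load on older JVMs, and the ODBC driver named in the URL still has to exist on the machine.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class AccessBridgeCheck {
    // Driver class from the post; only present in JVMs that still ship the bridge.
    static final String DRIVER_CLASS = "sun.jdbc.odbc.JdbcOdbcDriver";

    // Hypothetical helper: builds the DSN-less URL for a given .mdb path.
    static String buildUrl(String mdbPath) {
        return "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ="
                + mdbPath + ";DriverID=22;";
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/path/to/datasource.mdb";
        String url = buildUrl(path);
        System.out.println("JDBC URL: " + url);

        try {
            // Explicitly load the bridge driver so a missing class is reported clearly.
            Class.forName(DRIVER_CLASS);
        } catch (ClassNotFoundException e) {
            System.err.println("Bridge driver not available in this JVM: " + e.getMessage());
            return;
        }

        // Open and immediately close a connection just to prove the settings work.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

Running it with the path to the .mdb file as the only argument either prints the database product name or tells you which step failed.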

For the very basic inserting of data from a seldom-used web form into a single table, this band-aid fix has been working pretty well!

Tweaking Eclipse

I finally got fed up yesterday with the slow speed at which Eclipse was launching on my Windows box. On my Linux box, it doesn't take that long to launch (maybe five or six seconds) compared to my Windows box (around 15 to 20 seconds). I know I have a lot of plugins, but it was getting a little ridiculous. I started poking around and I noticed in the configuration details (Help / About Eclipse SDK / Configuration Details) that the VM that was launching was 1.4. OK, so there's one problem. I also noticed that the max memory setting was 256MB (-Xmx256M). Since this box has 2GB of RAM, I figured 256MB is a little on the low side (and note, mucking around with the heap sizes won't help load times).

The first thing I did was change the shortcut target for Eclipse to be

C:\eclipse\eclipse.exe -vm "C:\Program Files\Java\jdk1.5.0_08\bin\javaw.exe"

Now I'm sure that Eclipse will launch with Java 5 and (hopefully) speed things up a bit. Double-click on the shortcut, and sure enough, we're down to about 8 seconds.

The next thing I wanted to do is increase the default min and max memory settings. In c:\eclipse\eclipse.ini, you'll see

-vmargs
-Xms40M
-Xmx256M

I changed these to

-vmargs
-Xms256m
-Xmx768m

This increases the minimum memory space to 256M and the maximum to 768M. You can also do this by adding these vmargs to the shortcut target:

C:\eclipse\eclipse.exe -vm "C:\Program Files\Java\jdk1.5.0_08\bin\javaw.exe" -vmargs -Xms256M -Xmx768M

If you have a multi-processor computer (and I believe this includes dual-core systems, though I haven't read the docs on this), you can use some of the new VM ergonomics to self-tune garbage collection by adding this switch:

-XX:+UseParallelGC
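
If you want to double-check that switches like these actually took effect, a small sketch along the following lines can help. The class name JvmSettingsCheck is made up for illustration; it only reads values the JVM already exposes, and the max heap it reports may come out slightly below the -Xmx value depending on the collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmSettingsCheck {
    // Converts a byte count to whole megabytes (rounded down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Which VM is actually running (the thing the -vm argument controls).
        System.out.println("Java version:      " + System.getProperty("java.version"));

        // Heap limits, influenced by -Xms and -Xmx.
        System.out.println("Max heap (MB):     " + toMegabytes(rt.maxMemory()));
        System.out.println("Current heap (MB): " + toMegabytes(rt.totalMemory()));

        // The processor count is what makes the parallel collector worthwhile.
        System.out.println("Processors:        " + rt.availableProcessors());

        // Names of the collectors the JVM selected for this run.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector:         " + gc.getName());
        }
    }
}
```

Launching it as "java -Xms256m -Xmx768m -XX:+UseParallelGC JvmSettingsCheck" and then again without the switches makes the difference easy to see.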

One last handy parameter is "-showlocation" which shows the current location of your workspace. If you have different workspaces, this is handy.

If you want to read more about JVM garbage collection (and who doesn't?), here are some helpful links:

Java Black Belt

I caught this new site late last week and got around to checking it out today. JavaBlackBelt is a pretty cool way of doing skills assessment and learning. Using the whole Web 2.0 idea of communities, the site is a community for Java and open source skills assessment. The way it works is that there are a series of exams available for different "belts." Every time you pass an exam (the minimum passing score is 80%), you get points based on the strength of the core exam, and every belt has a required exam. For example, for a yellow belt, you must pass an OO and basic JRE exam. This gives you nine points. And this, I think, is where JavaBlackBelt starts to shine: you must then start earning "points" to take other exams. You earn half a point for commenting on a question (this is good/sucks/etc.) and two points for actually writing a question. More difficult exams cost more points (between 4 and 60), so you have to contribute more questions.

If you don't know much about a particular subject (Spring and Hibernate are the two most costly exams), you can go in and start reading the beta questions and start thinking of your own. There's no better way to really learn a language than to get in and start trying to write questions that not only show a breadth, but a depth of knowledge in a given area.

If you fail an exam, they require you to wait two weeks before taking it again, but they give you immediate feedback on what questions you got wrong, and why that answer was incorrect.

This could be a great model for other languages...cfBlackBelt? flexBlackBelt? asBlackBelt? Or for that matter, it pretty much works well for any type of software package!

If you know a little Java, I encourage you to check it out.

Java Webstart

As I was finishing up my computer science project (a solitaire game framework), I decided I'd give a look at implementing it via the web. What I wasn't prepared for was how easy it is to deliver actual Java via the web. Essentially, you need a JAR file and an XML file (JNLP or Java Network Launching Protocol).

Since the professor didn't want us using Java packages, I created a new package directory in Eclipse and copied all my files (including the card images) into the package structure. I then packaged up the thing and put it in a web accessible directory (here).
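
One detail that matters once everything lives inside a JAR: images can no longer be opened as plain files on disk, so they have to be looked up through the class loader. Here is a rough sketch of that idea, not the code from my project. The class name CardImages, the "cards" folder, and the rank_suit.gif naming scheme are all made up for illustration.

```java
import java.net.URL;

public class CardImages {
    // Hypothetical naming scheme: cards/<rank>_<suit>.gif, stored next to the class files.
    static String resourceName(String rank, String suit) {
        return "cards/" + rank + "_" + suit + ".gif";
    }

    // Resolves the image relative to this class, whether it sits in a directory or a JAR.
    // Returns null when the resource is not on the classpath.
    static URL find(String rank, String suit) {
        return CardImages.class.getResource(resourceName(rank, suit));
    }

    public static void main(String[] args) {
        URL url = find("ace", "spades");
        if (url == null) {
            System.out.println("Image not found on the classpath: " + resourceName("ace", "spades"));
        } else {
            // A Swing application could pass this URL straight to new ImageIcon(url).
            System.out.println("Image found at: " + url);
        }
    }
}
```

Because the lookup goes through the classpath, the same code works when run from the IDE and when launched from the packaged JAR.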

The JNLP file was what I thought would take for freakin' ever, but it took about as long to edit as it did to jar the source. Here's my JNLP:

<?xml version="1.0" encoding="utf-8" ?>
<!-- JNLP File for SwingSet2 Demo Application -->

<jnlp spec="1.0+" codebase="http://support.swem.wm.edu/wayne"
href="bristol.jnlp">
   <information>
      <title>Bristol Game</title>
      <vendor>Wayne Graham</vendor>
      <homepage href="index.cfm" />
      <description>Bristol Game</description>
      <description kind="short">Bristol solitaire game in
      Java</description>
      <icon href="bristol.gif" />
      <icon kind="splash" href="bristol.gif" />
      <offline-allowed />
   </information>
   <security>
      <all-permissions />
   </security>
   <resources>
      <j2se version="1.4+" />
      <jar href="Bristol.jar" />
   </resources>
   <application-desc main-class="edu.wm.swem.Bristol" />
</jnlp>

That's it! I just put this in the same directory and then clicked on the link (here), and Java WebStart did the rest. Very cool!