Saturday, January 22, 2005

Levenshtein and other stuff

Another week on the contract, and a busy one at that. We had a big meeting with their larger customers and the concept of 'closely matching' text strings was raised.

The original consensus was that it wasn't 'real world possible'... well, those of you who know me know that's a challenge I couldn't resist!

Long story short, I wrote a Visual FoxPro implementation (as well as a VB.Net one) of the Levenshtein algorithm and modified the return value so that it returns what I've been terming the 'Percentage of Likeness' (POL). The user sees that number and, if it meets the level the company has set as acceptable, can simply accept the string as 'close enough'!
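For anyone curious what that looks like, here is a minimal sketch in Python (not the FoxPro or VB.Net code itself). The function names are made up for illustration, and the POL formula is an assumption: it scales the edit distance by the length of the longer string, which is one common way to turn Levenshtein distance into a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    # Keep the shorter string in the inner loop so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def percentage_of_likeness(a: str, b: str) -> float:
    """Assumed POL formula: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return (1 - levenshtein(a, b) / longest) * 100


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(percentage_of_likeness("abcd", "abcf"))
```

The acceptable POL level would then just be a number the company picks, with anything at or above it treated as 'close enough'.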

Since I delivered it the morning after the meeting, I got a lot of 'atta-boys' on Friday :) It made for a nice end to a very hectic week.

I'm going to miss this client when the project is finished (end of February at this point). It's been very challenging, yet they leave me alone to actually produce what they've requested, which is not at all common these days. Most projects remind me of the show 'American Hot Rod', where too much is never enough!! This one has high expectations, but provides the time and resources to actually make it happen!

Next week I'll be sifting through about a quarter of a million customer records using the new algorithm to find items that are potential duplicate entries based on the POL of the address strings... should be very interesting stuff!
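A rough Python sketch of that kind of scan, under a few assumptions: the helper names, the 90 percent cutoff, and the lower-casing and whitespace cleanup are all hypothetical, and the POL formula is the same assumed one (distance scaled by the longer string). It compares every pair, so it only illustrates the idea on a small list.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def likeness(a: str, b: str) -> float:
    """Assumed POL formula: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else (1 - levenshtein(a, b) / longest) * 100


def normalize(address: str) -> str:
    """Hypothetical cleanup step: lower-case and collapse runs of whitespace."""
    return " ".join(address.lower().split())


def potential_duplicates(addresses, threshold=90.0):
    """Return (index_a, index_b, POL) for each pair at or above the threshold."""
    cleaned = [normalize(address) for address in addresses]
    matches = []
    for (i, a), (j, b) in combinations(enumerate(cleaned), 2):
        pol = likeness(a, b)
        if pol >= threshold:
            matches.append((i, j, round(pol, 1)))
    return matches


if __name__ == "__main__":
    sample = ["123 Main Street", "123 Main  Stret", "45 Oak Avenue"]
    for i, j, pol in potential_duplicates(sample):
        print(sample[i], "<->", sample[j], pol)
```

Comparing every pair in a quarter of a million records works out to roughly 31 billion comparisons, so a real run would probably need to narrow the candidates first, for example by only comparing addresses that share a postal code. That grouping step is a suggestion here, not something already built.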

-Bill
