BLU Discuss list archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Any computer science geeks?

Subject: Any computer science geeks?
From: markw at mohawksoft.com (markw at mohawksoft.com)
Date: Fri, 19 May 2006 22:38:17 -0400 (EDT)
In-reply-to: <79A9C3B8-CFC1-4C96-ABCE-CAFE2B8E6CAD@offthehill.org>
References: <18233.24.91.171.78.1147957561.squirrel@mail.mohawksoft.com> <79A9C3B8-CFC1-4C96-ABCE-CAFE2B8E6CAD@offthehill.org>

> Here is what you want (Dynamic Programming Algorithm (DPA) for Edit-
> Distance):
>
> http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/
>
> We used this technique at my previous job for validating OCR
> accuracy. Its very fast.

That seems to do what Levensthein distence does, am I missing something?
>
> -Josh
>
> On May 18, 2006, at 9:06 AM, markw at mohawksoft.com wrote:
>
>> While I like to think I'm no slouch in this department, I'd like to
>> see if
>> anyone has a better idea:
>>
>> You have two strings and you want to quantify how similar they are.
>>
>> Levenshtein Distance is not a good choice for two reasons:
>>
>> (1) it doesn't really show how similar strings are -- really, for
>> instance:
>>
>> "Tom Lehrer - The Was The Year That Was" and "That Was The Year
>> That Was -
>> Tom Lehrer"
>>
>> Are, to you and me, almost identical, but difficult to detect,
>> Levenshtein
>> only shows how many steps are required to convert one to another.
>>
>> (2) Levenshtein Distance is an N^2 algorithm and, well, yuck.
>>
>> Anyone have any ideas? I currently using an n^2 algorithm that scans
>> through both strings and used run length matching, but if you know of
>> something better, I'd love to hear it.
>>
>> _______________________________________________
>> Discuss mailing list
>> Discuss at blu.org
>> http://olduvai.blu.org/mailman/listinfo/discuss
>

References:
- Any computer science geeks?
  - From: markw at mohawksoft.com (markw at mohawksoft.com)
- Any computer science geeks?
  - From: josh at offthehill.org (Josh ChaitinPollak)

Prev by Date: Any computer science geeks?
Next by Date: [OT] Retirement, 401k, Subcontracting
Previous by thread: Any computer science geeks?
Next by thread: Volunteer coverage for Saturday's installfest
Index(es):
- Date
- Thread


BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org