Rebol Talk Forum  |  Getting Started  |  Code Examples, Tips & Advice  |  Topic: Levenshtein distance function
Pages: [1] Print
Author Topic: Levenshtein distance function  (Read 183 times)
fhein
Newbie
*
Offline Offline

Posts: 47


View Profile
Levenshtein distance function
« on: March 20, 2008, 02:50:36 PM »

Rebol function to calculate Levenshtein distance between two series (Ie. the number of operations required to turn one string into another, where operations consist of either inserting, deleting or replacing a character). This code is far from optimized and it's basically a Rebol translation of what's on Wikipedia.

Try it with:
levendist "phoenix" "fenix"

Code:
levendist: function [
"Calculate the Levenshtein distance between two series"
s [series!]
t [series!]
/with "Specify costs"
costdelete [number!] "The cost of deleting an item"
costinsert [number!] "The cost of inserting an item"
:costsubs [function!] "Function that calculates the cost of translating from one item to another"
][
d m n
][
m: 1 + length? s
n: 1 + length? t
d: array reduce [ m n ]

unless with [
costsubs: func [a b][1]
costdelete: costinsert: 1
]

repeat i m [
d/:i/1: i - 1
]
repeat j n [
d/1/:j: j - 1
]

for i 2 m 1 [
for j 2 n 1 [
d/:i/:j: first minimum-of reduce [
d/(i - 1)/:j + costdelete ; Deletion
d/:i/(j - 1) + costinsert ; Insertion
d/(i - 1)/(j - 1) + ; Substitution
either (pick s i - 1) = (pick t j - 1) [ 0 ][ costsubs (pick s i - 1) (pick t j - 1) ]
]
]
]

d/:m/:n
]

My own code checks so that costsubs takes exactly two arguments, but that was relying on another helper function I wrote so I removed that part Smiley
« Last Edit: March 20, 2008, 02:53:49 PM by fhein » Logged
fhein
Newbie
*
Offline Offline

Posts: 47


View Profile
Re: Levenshtein distance function
« Reply #1 on: March 20, 2008, 02:57:10 PM »

Here's a test function that can be used if you think replacing similar letters with eachother should be "cheaper" (ie. give less penalty). It's ofcourse much slower than using a static weight.

Try it with:
levendist/with "phoenix" "fenix" 1 1 ldcosttest

Code:
ldcosttest: function [
a [char!]
b [char!]
][
lgs
][
lgs: [ {aouå} {eiyäö} {,.¡!¿?'"} {qwrtpsdfghjklzxcvbnmñ} ]
forall lgs [
if all [
found? find first lgs a
found? find first lgs b
][
return 0.5
]
]
1
]
Logged
fhein
Newbie
*
Offline Offline

Posts: 47


View Profile
Re: Levenshtein distance function
« Reply #2 on: March 20, 2008, 02:59:58 PM »

And here's another cost function, that I'm actually using in an application. It's for a Spanish word exercise program, and I want to be able to write ? where there's supposed to be an upside down ? (since I can't easily type those Smiley)

Code:
my-cost-func: func [
a
b
][
any [
if equal? lowercase a lowercase b [0] ; Not case sensitive
if both-in? {¿?} a b [0] ; Don't enforce upside down ?
if both-in? {¡!} a b [0] ; Don't enforce upside down !
(1) ; Default cost
]
]
Logged
Sunanda
Jr. Member
**
Offline Offline

Posts: 90


View Profile
Re: Levenshtein distance function
« Reply #3 on: March 20, 2008, 07:08:32 PM »

If you want to compare several string similarity algorithms, check out Francois' excellent script in the REBOL Library:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=simetrics.r
Logged
btiffin
Jr. Member
**
Offline Offline

Posts: 51


View Profile
Re: Levenshtein distance function
« Reply #4 on: March 20, 2008, 10:08:24 PM »

fhein; thanks.  I think I have the same algorithm coded, but your's looks nicer.  Smiley

Sunanda; yeah simetrics is pretty cool.
Logged
Pages: [1] Print 
Rebol Talk Forum  |  Getting Started  |  Code Examples, Tips & Advice  |  Topic: Levenshtein distance function
Jump to:  

  
Quick Search...

Advanced search
  
Welcome, Guest. Please login or register.
Did you miss your activation email?
May 17, 2008, 05:47:15 PM
Username: Password: Session Length:
  

News: 01-09-08

Alpha version of REBOL 3 has been released!


  
2169 Posts in 562 Topics by 1224 Members
Latest Member: thyptoste

  Rebol Talk Forum | Powered by SMF 1.0.9.
© 2001-2005, Lewis Media. All Rights Reserved.

RT design by Defiant Pc