|
Pages: [1]
|
 |
|
Author
|
Topic: Levenshtein distance function (Read 184 times)
|
|
fhein
|
Rebol function to calculate Levenshtein distance between two series (Ie. the number of operations required to turn one string into another, where operations consist of either inserting, deleting or replacing a character). This code is far from optimized and it's basically a Rebol translation of what's on Wikipedia. Try it with: levendist "phoenix" "fenix" levendist: function [ "Calculate the Levenshtein distance between two series" s [series!] t [series!] /with "Specify costs" costdelete [number!] "The cost of deleting an item" costinsert [number!] "The cost of inserting an item" :costsubs [function!] "Function that calculates the cost of translating from one item to another" ][ d m n ][ m: 1 + length? s n: 1 + length? t d: array reduce [ m n ]
unless with [ costsubs: func [a b][1] costdelete: costinsert: 1 ] repeat i m [ d/:i/1: i - 1 ] repeat j n [ d/1/:j: j - 1 ] for i 2 m 1 [ for j 2 n 1 [ d/:i/:j: first minimum-of reduce [ d/(i - 1)/:j + costdelete ; Deletion d/:i/(j - 1) + costinsert ; Insertion d/(i - 1)/(j - 1) + ; Substitution either (pick s i - 1) = (pick t j - 1) [ 0 ][ costsubs (pick s i - 1) (pick t j - 1) ] ] ] ] d/:m/:n ]
My own code checks so that costsubs takes exactly two arguments, but that was relying on another helper function I wrote so I removed that part 
|
|
|
|
« Last Edit: March 20, 2008, 02:53:49 PM by fhein »
|
Logged
|
|
|
|
|
fhein
|
Here's a test function that can be used if you think replacing similar letters with eachother should be "cheaper" (ie. give less penalty). It's ofcourse much slower than using a static weight. Try it with: levendist/with "phoenix" "fenix" 1 1 ldcosttest ldcosttest: function [ a [char!] b [char!] ][ lgs ][ lgs: [ {aouå} {eiyäö} {,.¡!¿?'"} {qwrtpsdfghjklzxcvbnmñ} ] forall lgs [ if all [ found? find first lgs a found? find first lgs b ][ return 0.5 ] ] 1 ]
|
|
|
|
|
Logged
|
|
|
|
|
fhein
|
And here's another cost function, that I'm actually using in an application. It's for a Spanish word exercise program, and I want to be able to write ? where there's supposed to be an upside down ? (since I can't easily type those  ) my-cost-func: func [ a b ][ any [ if equal? lowercase a lowercase b [0] ; Not case sensitive if both-in? {¿?} a b [0] ; Don't enforce upside down ? if both-in? {¡!} a b [0] ; Don't enforce upside down ! (1) ; Default cost ] ]
|
|
|
|
|
Logged
|
|
|
|
|
|
|
btiffin
|
fhein; thanks. I think I have the same algorithm coded, but your's looks nicer.  Sunanda; yeah simetrics is pretty cool.
|
|
|
|
|
Logged
|
|
|
|
|
|
Pages: [1]
|
|
|
 |
News: 01-09-08 Alpha version of REBOL 3 has been released!
2169 Posts in 562 Topics by 1224 Members
Latest Member: thyptoste
|