General Programming | Compare text

Author

Message

Time

Spht

I haven't thought about how I'm going to do it yet, but I'd like to compare two strings (no larger than 256 bytes) and return a number between 0 and 100 for how closely they match.

For example, "test" and "candy" would return 0. "test" and "a test" would probably return about 95. "bob" and "bill" is maybe 5. "Listen carefully: Do not touch" and "Do not touch" is maybe 60.

It's in C++, and it needs to perform well with 100,000 simultaneous calls.

It shouldn't be too difficult to do. I haven't put any thought into the best way to do it yet, so I figured I'd just post here and see what you smart people come up with. Maybe count how many times each letter appears and compare the counts against the other string?

July 13, 2007, 3:17 PM

rabbit

I'm not sure it's exactly what you want, but personally I'd tally the frequency of each character in both strings and then compare the results.
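A minimal sketch of the frequency-tally idea, offered as one possible reading of it rather than a finished answer. The function name `frequencySimilarity` and the scoring formula (twice the shared byte count divided by the combined length) are assumptions for illustration; nobody in the thread specified them.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: scores two strings from 0 to 100 by comparing
// how many times each byte value occurs in each of them.
int frequencySimilarity(const std::string& a, const std::string& b) {
    if (a.empty() && b.empty()) return 100;  // assumption: two empty strings match fully

    std::array<int, 256> countA{};  // zero-initialized tallies, one slot per byte value
    std::array<int, 256> countB{};
    for (unsigned char c : a) ++countA[c];
    for (unsigned char c : b) ++countB[c];

    // A byte counts as shared as many times as it occurs in BOTH strings.
    std::size_t shared = 0;
    for (std::size_t i = 0; i < 256; ++i) {
        shared += static_cast<std::size_t>(std::min(countA[i], countB[i]));
    }

    // Assumed scoring: 2 * shared / (len(a) + len(b)), scaled to 0..100.
    return static_cast<int>(200 * shared / (a.size() + b.size()));
}
```

With this formula, "test" vs. "candy" scores 0, "test" vs. "a test" scores 80, and "bob" vs. "bill" scores 28, so it would not reproduce the rough numbers from the first post. It also ignores order entirely, so "abc" and "cba" score 100. The function keeps no shared state and makes one pass over each string, so concurrent calls do not interfere with each other.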

July 13, 2007, 3:58 PM

iago

Perhaps rabbit's solution, with some kind of weighting for how far apart the letters are? So when letters are in nearly the same position (at the beginning, end, middle, etc.), the words are considered more similar. Also, you might want to ignore vowels (or count them for less), since they can often be interchanged without changing the meaning as much as most consonants.
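One way the position weighting and the vowel discount might be combined, purely as a sketch. Everything specific here is an assumption: the names `charWeight` and `positionalSimilarity`, the vowel weight of 0.5, measuring "position" as the absolute index in the string, and the greedy nearest-match pairing.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Assumed weights: vowels count half as much as every other character.
double charWeight(unsigned char c) {
    switch (std::tolower(c)) {
        case 'a': case 'e': case 'i': case 'o': case 'u': return 0.5;
        default: return 1.0;
    }
}

// Hypothetical helper: each character of `a` is paired with the nearest
// unused identical character of `b`; the further apart they sit, the
// less credit the pair earns. Result is scaled to 0..100.
int positionalSimilarity(const std::string& a, const std::string& b) {
    if (a.empty() && b.empty()) return 100;
    const std::size_t maxLen = std::max(a.size(), b.size());

    // Normalize by the heavier string so extra characters lower the score.
    double totalA = 0.0, totalB = 0.0;
    for (unsigned char c : a) totalA += charWeight(c);
    for (unsigned char c : b) totalB += charWeight(c);
    const double total = std::max(totalA, totalB);

    std::vector<bool> used(b.size(), false);
    double score = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t best = b.size();    // sentinel: no match found yet
        std::size_t bestDist = maxLen;  // every real distance is below maxLen
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (used[j] || b[j] != a[i]) continue;
            const std::size_t dist = i > j ? i - j : j - i;
            if (dist < bestDist) { best = j; bestDist = dist; }
        }
        if (best == b.size()) continue;  // character does not occur in b
        used[best] = true;
        score += charWeight(static_cast<unsigned char>(a[i])) *
                 (1.0 - static_cast<double>(bestDist) / static_cast<double>(maxLen));
    }
    return static_cast<int>(std::lround(100.0 * score / total));
}
```

Identical strings score 100 and strings with no characters in common score 0, while "ab" vs. "ba" scores 50 because both letters are present but displaced. The inner search makes this quadratic in the string length, which is at most about 65,000 comparisons per call for 256-byte inputs. The greedy pairing is not guaranteed to give the same score when the arguments are swapped.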

It probably wouldn't help, but there is a "sounds like" operator in MySQL, which returns true or false depending on whether the two entries sound like each other. I believe there's a formula for how this is done on MySQL's Web site.
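MySQL's SOUNDS LIKE compares the Soundex codes of its two operands. The sketch below implements the classic four-character American Soundex for a single word, which is an assumption on my part: MySQL's own SOUNDEX() differs in details (it can return longer codes, for instance), so this is not a reproduction of what MySQL does. The names `soundexDigit`, `soundex`, and `soundsAlike` are made up for the example.

```cpp
#include <cctype>
#include <string>

// Maps an uppercase letter to its Soundex digit; '0' means "not coded"
// (vowels, H, W, Y).
char soundexDigit(char upper) {
    switch (upper) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        default: return '0';
    }
}

// Classic Soundex: first letter plus three digits, zero-padded.
// Non-letters are skipped; input with no letters yields an empty string.
std::string soundex(const std::string& word) {
    std::string code;
    char prev = '0';
    for (unsigned char raw : word) {
        if (!std::isalpha(raw)) continue;
        const char c = static_cast<char>(std::toupper(raw));
        const char d = soundexDigit(c);
        if (code.empty()) {  // keep the first letter as-is
            code += c;
            prev = d;
            continue;
        }
        if (c == 'H' || c == 'W') continue;       // H and W do not separate equal codes
        if (d != '0' && d != prev) code += d;     // collapse runs of the same code
        prev = d;                                 // a vowel resets prev to '0'
        if (code.size() == 4) break;
    }
    if (!code.empty()) code.resize(4, '0');
    return code;
}

// True when both words have letters and share the same Soundex code.
bool soundsAlike(const std::string& a, const std::string& b) {
    const std::string codeA = soundex(a);
    return !codeA.empty() && codeA == soundex(b);
}
```

"Robert" and "Rupert" both encode to R163, so `soundsAlike` returns true for them, while "bob" (B100) and "bill" (B400) do not match. Like the MySQL operator, this gives only a yes/no answer for one word, so for a 0 to 100 score over whole phrases it would need to be combined with something else, for example by comparing the strings word by word.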

July 13, 2007, 4:48 PM

Valhalla Legends Forums Archive | General Programming | Compare text