path: root/contrib/fuzzystrmatch/fuzzystrmatch.h
Age  Commit message  (Author)
2008-04-03  Add a variant of the Levenshtein string-distance function that lets the user specify the cost values to use, instead of always using 1's.  (Tom Lane)
    Volkan Yazici
    In passing, remove fuzzystrmatch.h, which contained a bunch of stuff that had no business being in a .h file; fold it into its only user, fuzzystrmatch.c.
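The user-specified costs described in the 2008-04-03 entry can be illustrated with a minimal sketch. This is not the code from fuzzystrmatch.c: the function name `levenshtein_with_costs`, its parameter order, and the byte-wise comparison are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three ints. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Hypothetical helper (not the contrib module's function): byte-wise edit
 * distance for turning s into t, with caller-supplied costs for insertion,
 * deletion, and substitution. Returns -1 if memory allocation fails.
 */
int levenshtein_with_costs(const char *s, const char *t,
                           int ins_cost, int del_cost, int sub_cost)
{
    size_t m, n, i, j;
    int *prev, *curr, *tmp, result;

    assert(s != NULL && t != NULL);
    m = strlen(s);
    n = strlen(t);

    /* Two rows of the dynamic-programming table are enough. */
    prev = malloc((n + 1) * sizeof(int));
    curr = malloc((n + 1) * sizeof(int));
    if (prev == NULL || curr == NULL)
    {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: reach the first j bytes of t from an empty prefix of s. */
    for (j = 0; j <= n; j++)
        prev[j] = (int) j * ins_cost;

    for (i = 1; i <= m; i++)
    {
        /* Column 0: delete the first i bytes of s. */
        curr[0] = (int) i * del_cost;
        for (j = 1; j <= n; j++)
        {
            int sub = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : sub_cost);

            curr[j] = min3(prev[j] + del_cost,      /* drop s[i-1] */
                           curr[j - 1] + ins_cost,  /* add t[j-1] */
                           sub);                    /* match or replace */
        }
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

With all three costs set to 1 this reduces to the unit-cost distance, so `levenshtein_with_costs("kitten", "sitting", 1, 1, 1)` returns 3. Raising `sub_cost` to 2 makes a substitution no cheaper than a deletion plus an insertion.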
2008-03-25  Simplify and standardize conversions between TEXT datums and ordinary C strings.  (Tom Lane)
    This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though).
    This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done.
    Brendan Jurd and Tom Lane
2008-01-01  Update copyrights in source tree to 2008.  (Bruce Momjian)
2007-01-05  Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.  (Bruce Momjian)
2006-07-10  Remove a few baby-C macros in fuzzystrmatch. Add a few missing includes.  (Bruce Momjian)
2006-03-11  Add CVS tag lines to files that were lacking them.  (Bruce Momjian)
2006-03-05  Update copyright for 2006. Update scripts.  (Bruce Momjian)
2005-01-26  The attached patch implements the soundex difference function which compares two strings' soundex values for similarity, from Kris Jurka.  (Neil Conway)
    Also mark the text_soundex() function as STRICT, to avoid crashing on NULL input.
2005-01-01  Update copyrights that were missed.  (Bruce Momjian)
2004-08-29  Update copyright to 2004.  (Bruce Momjian)
2003-08-04  Fix some copyright notices that weren't updated. Improve copyright tool so it won't miss 'em again.  (Tom Lane)
2003-06-24  Jim C. Nasby wrote:  (Bruce Momjian)
    > Second argument to metaphone is suposed to set the limit on the
    > number of characters to return, but it breaks on some phrases:
    >
    > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
    > (select 'Hello world'::varchar AS a) a;
    > HLW | HLWR | HLWRLT
    >
    > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
    > (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
    > AKM | AKMKS | AKMKSMMRL
    >
    > In every case I've found that does this, the 4th and 5th letters are
    > always 'KS'.
    Nice catch.
    There was a bug in the original metaphone algorithm from CPAN. Patch attached (while I was at it I updated my email address, changed the copyright to PGDG, and removed an unnecessary palloc). Here's how it looks now:
    regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
     metaphone
    -----------
     AKMK
    (1 row)
    regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
     metaphone
    -----------
     AKMKS
    (1 row)
    Joe Conway
2002-09-05  Be careful to include postgres.h *before* any system headers, to ensure that the right flavors of largefile-related definitions are seen.  (Tom Lane)
    Most of these changes are probably unnecessary, but better safe than sorry.
2001-12-30  Make sure that all <ctype.h> routines are called with unsigned char values; it's not portable to call them with signed chars.  (Tom Lane)
    I recall doing this for the last release, but a few more uncasted calls have snuck in.
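The portability rule in the 2001-12-30 entry can be shown with a small sketch. The helper names `count_alpha` and `upcase_in_place` are hypothetical and do not come from fuzzystrmatch.c. The point is the cast: where plain `char` is signed, a byte above 0x7F becomes a negative `int`, and passing a negative value other than `EOF` to a `<ctype.h>` routine is undefined behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count alphabetic bytes in a string.
 * Each byte is cast to unsigned char before it reaches isalpha().
 */
size_t count_alpha(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (isalpha((unsigned char) *s))
            count++;
    }
    return count;
}

/*
 * Hypothetical helper: uppercase a string in place, using the same cast
 * before calling toupper().
 */
void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

Writing `isalpha(*s)` without the cast compiles cleanly, which is presumably why uncasted calls can slip back in unnoticed.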
2001-11-05  New pgindent run with fixes suggested by Tom. Patch manually reviewed, initdb/regression tests pass.  (Bruce Momjian)
2001-10-25  pgindent run on all C files. Java run to follow. initdb/regression tests pass.  (Bruce Momjian)
2001-08-07  Sorry - I should have gotten to this sooner. Here's a patch which you should be able to apply against what you just committed. It rolls soundex into fuzzystrmatch.  (Bruce Momjian)
    Remove soundex/metaphone and merge into fuzzystrmatch.
    Joe Conway
2001-08-07  Per this discussion, here's a patch to implement both levenshtein() and metaphone() in a contrib.  (Bruce Momjian)
    There seem to be a fair number of different approaches to both of these algorithms. I used the simplest case for levenshtein which has a cost of 1 for any character insertion, deletion, or substitution. For metaphone, I adapted the same code from CPAN that the PHP folks did.
    A couple of questions:
    1. Does it make sense to fold the soundex contrib together with this one?
    2. I was debating trying to add multibyte support to levenshtein (it would make no sense at all for metaphone), but a quick search through the contrib directory found no hits on the word MULTIBYTE. Should worry about adding multibyte support to levenshtein()?
    Joe Conway