This adds mention of my latest tweak to the tsearch2/pg_trgm

author Tom Lane <tgl@sss.pgh.pa.us>

Sat, 27 Nov 2004 00:01:02 +0000 (00:01 +0000)

committer Tom Lane <tgl@sss.pgh.pa.us>

Sat, 27 Nov 2004 00:01:02 +0000 (00:01 +0000)
diff --git a/contrib/pg_trgm/README.pg_trgm b/contrib/pg_trgm/README.pg_trgm

index ac2eb012de5136ee56fe4cd5d62df3c535fad94b..608c30c455c529adaed5e1f428d794092f43b1a4 100644
--- a/contrib/pg_trgm/README.pg_trgm
+++ b/contrib/pg_trgm/README.pg_trgm
@@ -100,11 +100,15 @@ Tsearch2 Integration
         The first step is to generate an auxiliary table containing all
         the unique words in the Tsearch2 index:
  
-       CREATE TABLE words AS 
-               SELECT word FROM stat('SELECT vector FROM documents');
-
-       Where 'documents' is the table that contains the Tsearch2 index
-       column 'vector', of type 'tsvector'.
+       CREATE TABLE words AS SELECT word FROM
+               stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+
+       Where 'documents' is a table that has a text field 'bodytext'
+       that TSearch2 is used to search.  The use of the 'simple' dictionary
+       with the to_tsvector function, instead of just using the already
+       existing vector is to avoid creating a list of already stemmed
+       words.  This way, only the original, unstemmed words are added
+       to the word list.
  
         Next, create a trigram index on the word column: