| author | Teodor Sigaev | 2016-03-04 17:08:10 +0000 |
|---|---|---|
| committer | Teodor Sigaev | 2016-03-04 17:08:47 +0000 |
| commit | d78a7d9c7fa3e9cd494b906f065fe7b7fe9fb9a5 (patch) | |
| tree | 23389711b4ccf0f5c8dd7684ae102b9eac5df66c /doc/src | |
| parent | 9445db925e78c2c4fb12067ad5618e2aecabe109 (diff) | |
Improve support of Hunspell in ispell dictionary.
Now it's possible to load recent version of Hunspell for several languages.
To handle these dictionaries Hunspell patch adds support for:
* FLAG long - sets the double extended ASCII character flag type
* FLAG num - sets the decimal number flag type (from 1 to 65535)
* AF parameter - alias for flag's set
Also it moves test dictionaries into separate directory.
Author: Artur Zakirov with editorization by me
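
The three flag-related features in the commit message can be illustrated with a small sketch. This is not the PostgreSQL C code from the patch; it is a Python approximation of the Hunspell conventions as I understand them: `FLAG long` reads flags as two-character pairs, `FLAG num` reads comma-separated decimal numbers, and an `AF` table lets a dictionary entry reference a whole flag set by its 1-based index. The function name `split_flags` and its parameters are hypothetical.

```python
def split_flags(flag_field, flag_type="char", aliases=None):
    """Split the flag field of a dictionary entry into individual flags.

    flag_type: "char" (one character per flag, the default),
               "long" (two characters per flag, as with FLAG long),
               "num"  (comma-separated decimal numbers, as with FLAG num).
    aliases:   optional list of flag sets declared with AF lines; when
               given, flag_field is treated as a 1-based index into it.
    """
    if aliases is not None:
        # Assumption: with AF in use, the entry stores an alias number.
        flag_field = aliases[int(flag_field) - 1]
    if flag_type == "long":
        return [flag_field[i:i + 2] for i in range(0, len(flag_field), 2)]
    if flag_type == "num":
        return [int(n) for n in flag_field.split(",")]
    return list(flag_field)


print(split_flags("ADGRS"))                              # single-character flags
print(split_flags("AaBb", "long"))                       # two-character flags
print(split_flags("101,202", "num"))                     # numeric flags
print(split_flags("2", "long", aliases=["AaBb", "Cc"]))  # AF alias lookup
```

The alias table is passed in as a plain list here; in a real `.affix` file it would be built from the `AF` lines in the order they appear.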
Diffstat (limited to 'doc/src')
| -rw-r--r-- | doc/src/sgml/textsearch.sgml | 148 |
1 files changed, 140 insertions, 8 deletions
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index d66b4d5d5f..ff99976068 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -2615,18 +2615,41 @@ SELECT plainto_tsquery('supernova star');
 </para>

 <para>
- To create an <application>Ispell</> dictionary, use the built-in
- <literal>ispell</literal> template and specify several parameters:
+ To create an <application>Ispell</> dictionary perform these steps:
 </para>
-
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ download dictionary configuration files. <productname>OpenOffice</>
+ extension files have the <filename>.oxt</> extension. It is necessary
+ to extract <filename>.aff</> and <filename>.dic</> files, change
+ extensions to <filename>.affix</> and <filename>.dict</>. For some
+ dictionary files it is also needed to convert characters to the UTF-8
+ encoding with commands (for example, for norwegian language dictionary):
 <programlisting>
-CREATE TEXT SEARCH DICTIONARY english_ispell (
+iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
+iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
+</programlisting>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ copy files to the <filename>$SHAREDIR/tsearch_data</> directory
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ load files into PostgreSQL with the following command:
+<programlisting>
+CREATE TEXT SEARCH DICTIONARY english_hunspell (
 TEMPLATE = ispell,
- DictFile = english,
- AffFile = english,
- StopWords = english
-);
+ DictFile = en_us,
+ AffFile = en_us,
+ Stopwords = english);
 </programlisting>
+ </para>
+ </listitem>
+ </itemizedlist>

 <para>
 Here, <literal>DictFile</>, <literal>AffFile</>, and <literal>StopWords</>
@@ -2643,6 +2666,56 @@ CREATE TEXT SEARCH DICTIONARY english_ispell (
 </para>

 <para>
+ The <filename>.affix</> file of <application>Ispell</> has the following
+ structure:
+<programlisting>
+prefixes
+flag *A:
+ . > RE # As in enter > reenter
+suffixes
+flag T:
+ E > ST # As in late > latest
+ [^AEIOU]Y > -Y,IEST # As in dirty > dirtiest
+ [AEIOU]Y > EST # As in gray > grayest
+ [^EY] > EST # As in small > smallest
+</programlisting>
+ </para>
+ <para>
+ And the <filename>.dict</> file has the following structure:
+<programlisting>
+lapse/ADGRS
+lard/DGRS
+large/PRTY
+lark/MRS
+</programlisting>
+ </para>
+
+ <para>
+ Format of the <filename>.dict</> file is:
+<programlisting>
+basic_form/affix_class_name
+</programlisting>
+ </para>
+
+ <para>
+ In the <filename>.affix</> file every affix flag is described in the
+ following format:
+<programlisting>
+condition > [-stripping_letters,] adding_affix
+</programlisting>
+ </para>
+
+ <para>
+ Here, condition has a format similar to the format of regular expressions.
+ It can use groupings <literal>[...]</> and <literal>[^...]</>.
+ For example, <literal>[AEIOU]Y</> means that the last letter of the word
+ is <literal>"y"</> and the penultimate letter is <literal>"a"</>,
+ <literal>"e"</>, <literal>"i"</>, <literal>"o"</> or <literal>"u"</>.
+ <literal>[^EY]</> means that the last letter is neither <literal>"e"</>
+ nor <literal>"y"</>.
+ </para>
+
+ <para>
 Ispell dictionaries support splitting compound words;
 a useful feature.
 Notice that the affix file should specify a special flag using the
@@ -2663,6 +2736,65 @@ SELECT ts_lexize('norwegian_ispell', 'sjokoladefabrikk');
 </programlisting>
 </para>

+ <para>
+ <application>MySpell</> format is a subset of <application>Hunspell</>.
+ The <filename>.affix</> file of <application>Hunspell</> has the following
+ structure:
+<programlisting>
+PFX A Y 1
+PFX A 0 re .
+SFX T N 4
+SFX T 0 st e
+SFX T y iest [^aeiou]y
+SFX T 0 est [aeiou]y
+SFX T 0 est [^ey]
+</programlisting>
+ </para>
+
+ <para>
+ The first line of an affix class is the header. Fields of an affix rules are
+ listed after the header:
+ </para>
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ parameter name (PFX or SFX)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ flag (name of the affix class)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ stripping characters from beginning (at prefix) or end (at suffix) of the
+ word
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ adding affix
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ condition that has a format similar to the format of regular expressions.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ The <filename>.dict</> file looks like the <filename>.dict</> file of
+ <application>Ispell</>:
+<programlisting>
+larder/M
+lardy/RT
+large/RSPMYT
+largehearted
+</programlisting>
+ </para>
+
 <note>
 <para>
 <application>MySpell</> does not support compound words.
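
The Hunspell rule fields documented in the diff (stripping characters, adding affix, condition) can be exercised with a short sketch. This is an illustrative Python approximation, not the dictionary code changed by this commit. It generates inflected forms from a base word, whereas a text search dictionary works in the other direction and reduces an inflected word to its base form. The names `SUFFIX_RULES_T`, `PREFIX_RULES_A`, `apply_suffix` and `apply_prefix` are hypothetical; the rule values are copied from the `PFX A` and `SFX T` example in the documentation text.

```python
import re

# (strip, add, condition) triples from the "SFX T" example; "0" means
# that nothing is stripped before the affix is added.
SUFFIX_RULES_T = [
    ("0", "st", "e"),
    ("y", "iest", "[^aeiou]y"),
    ("0", "est", "[aeiou]y"),
    ("0", "est", "[^ey]"),
]

# The single rule from the "PFX A" example; "." matches any first letter.
PREFIX_RULES_A = [("0", "re", ".")]


def apply_suffix(word, rules):
    """Return every form produced by suffix rules whose condition matches."""
    forms = []
    for strip, add, condition in rules:
        # A suffix condition is tested against the end of the word.
        if not re.search(f"(?:{condition})$", word):
            continue
        stem = word if strip == "0" else word[: len(word) - len(strip)]
        forms.append(stem + add)
    return forms


def apply_prefix(word, rules):
    """Return every form produced by prefix rules whose condition matches."""
    forms = []
    for strip, add, condition in rules:
        # A prefix condition is tested against the start of the word.
        if not re.match(condition, word):
            continue
        stem = word if strip == "0" else word[len(strip):]
        forms.append(add + stem)
    return forms


for base in ("late", "dirty", "gray", "small"):
    print(base, apply_suffix(base, SUFFIX_RULES_T))
print("enter", apply_prefix("enter", PREFIX_RULES_A))
```

Each of the four sample words matches exactly one suffix rule, which mirrors the "As in" comments of the equivalent Ispell-format example (late > latest, dirty > dirtiest, gray > grayest, small > smallest).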
