summaryrefslogtreecommitdiff
path: root/contrib/unaccent
AgeCommit message (Collapse)Author
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-03-12Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch
This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
2017-01-21Move some things from builtins.h to new header filesPeter Eisentraut
This avoids that builtins.h has to include additional header files.
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-06-14Update unaccent extension for parallel query.Robert Haas
All functions provided by this extension are PARALLEL SAFE. Andreas Karlsson
2016-03-16fix typo in commentTeodor Sigaev
2016-03-16Improve script generating unaccent rulesTeodor Sigaev
Script now use the standard Unicode transliterator Latin-ASCII. Author: Leonard Benedetti
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-09-04Make unaccent handle all diacritics known to Unicode, and expand ligatures ↵Teodor Sigaev
correctly Add Python script for buiding unaccent.rules from Unicode data. Don't backpatch because unaccent changes may require tsvector/index rebuild. Thomas Munro <thomas.munro@enterprisedb.com>
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-08-25Fix typos in some error messages thrown by extension scripts when fed to psql.Andres Freund
Some of the many error messages introduced in 458857cc missed 'FROM unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension version numbers. Backpatch to 9.1, just like 458857cc which introduced the messages. Do so because the error messages thrown when the wrong command is copy & pasted aren't easy to understand.
2014-07-14Add file version information to most installed Windows binaries.Noah Misch
Prominent binaries already had this metadata. A handful of minor binaries, such as pg_regress.exe, still lack it; efforts to eliminate such exceptions are welcome. Michael Paquier, reviewed by MauMau.
2014-07-01Fix inadequately-sized output buffer in contrib/unaccent.Tom Lane
The output buffer size in unaccent_lexize() was calculated as input string length times pg_database_encoding_max_length(), which effectively assumes that replacement strings aren't more than one character. While that was all that we previously documented it to support, the code actually has always allowed replacement strings of arbitrary length; so if you tried to make use of longer strings, you were at risk of buffer overrun. To fix, use an expansible StringInfo buffer instead of trying to determine the maximum space needed a-priori. This would be a security issue if unaccent rules files could be installed by unprivileged users; but fortunately they can't, so in the back branches the problem can be labeled as improper configuration by a superuser. Nonetheless, a memory stomp isn't a nice way of reacting to improper configuration, so let's back-patch the fix.
2014-07-01Issue a WARNING about invalid rule file format in contrib/unaccent.Tom Lane
We were already issuing a WARNING, albeit only elog not ereport, for duplicate source strings; so warning rather than just being stoically silent seems like the best thing to do here. Arguably both of these complaints should be upgraded to ERRORs, but that might be more behavioral change than people want. Note: the faulty line is already printed via an errcontext hook, so there's no need for more information than these messages provide.
2014-07-01Allow multi-character source strings in contrib/unaccent.Tom Lane
This could be useful in languages where diacritic signs are represented as separate characters; more generally it supports using unaccent dictionaries for substring substitutions beyond narrowly conceived "diacritic removal". In any case, since the rule-file parser doesn't complain about multi-character source strings, it behooves us to do something unsurprising with them.
2014-07-01Allow empty replacement strings in contrib/unaccent.Tom Lane
This is useful in languages where diacritic signs are represented as separate characters; it's also one step towards letting unaccent be used for arbitrary substring substitutions. In passing, improve the user documentation for unaccent, which was sadly vague about some important details. Mohammad Alhashash, reviewed by Abhijit Menon-Sen
2014-04-18Create function prototype as part of PG_FUNCTION_INFO_V1 macroPeter Eisentraut
Because of gcc -Wmissing-prototypes, all functions in dynamically loadable modules must have a separate prototype declaration. This is meant to detect global functions that are not declared in header files, but in cases where the function is called via dfmgr, this is redundant. Besides filling up space with boilerplate, this is a frequent source of compiler warnings in extension modules. We can fix that by creating the function prototype as part of the PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That makes the code of modules cleaner, because there is one less place where the entry points have to be listed, and creates an additional check that functions have the right prototype. Remove now redundant prototypes from contrib and other modules.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-11-18unaccent: Revert patch 9299f6179838cef8aa1123f6fb76f0d3d6f2deccBruce Momjian
The reverted patch to change functions from strict to immutable was incorrect and needs additional research.
2013-10-08unaccent: mark unaccent() functions as immutableBruce Momjian
Suggestion from Pavel Stehule
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-05-08The data structure used in unaccent is a trie, not suffix tree.Heikki Linnakangas
Fix the term used in variable and struct names, and comments. Alexander Korotkov
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-04-22Fix some typosPeter Eisentraut
Josh Kupershmidt
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-11-07Fix assorted bugs in contrib/unaccent's configuration file parsing.Tom Lane
Make it use t_isspace() to identify whitespace, rather than relying on sscanf which is known to get it wrong on some platform/locale combinations. Get rid of fixed-size buffers. Make it actually continue to parse the file after ignoring a line with untranslatable characters, as was obviously intended. The first of these issues is per gripe from J Smith, though not exactly either of his proposed patches.
2011-10-12Throw a useful error message if an extension script file is fed to psql.Tom Lane
We have seen one too many reports of people trying to use 9.1 extension files in the old-fashioned way of sourcing them in psql. Not only does that usually not work (due to failure to substitute for MODULE_PATHNAME and/or @extschema@), but if it did work they'd get a collection of loose objects not an extension. To prevent this, insert an \echo ... \quit line that prints a suitable error message into each extension script file, and teach commands/extension.c to ignore lines starting with \echo. That should not only prevent any adverse consequences of loading a script file the wrong way, but make it crystal clear to users that they need to do it differently now. Tom Lane, following an idea of Andrew Dunstan's. Back-patch into 9.1 ... there is not going to be much value in this if we wait till 9.2.
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-04-25Support "make check" in contribPeter Eisentraut
Added a new option --extra-install to pg_regress to arrange installing the respective contrib directory into the temporary installation. This is currently not yet supported for Windows MSVC builds. Updated the .gitignore files for contrib modules to ignore the leftovers of a temp-install check run. Changed the exit status of "make check" in a pgxs build (which still does nothing) to 0 from 1. Added "make check" in contrib to top-level "make check-world".
2011-04-19Refix the unaccent regression test on MSVC properlyPeter Eisentraut
... for some value of "properly". Instead of overriding REGRESS_OPTS, set the variables ENCODING and NO_LOCALE, which is more expressive and allows overriding by the user. Fix vcregress.pl to handle that.
2011-04-18Attempt to remedy buildfarm breakage caused by commit f536d4194.Andrew Dunstan
2011-04-15Rename pg_regress option --multibyte to --encodingPeter Eisentraut
Also refactor things a little bit so that the same methods for setting test locale and encoding can be used everywhere.
2011-02-17Fix upgrade of contrib/intarray and contrib/unaccent from 9.0.Tom Lane
Take care of a couple of discrepancies between what you get from a fresh install and what the first-draft update-from-unpackaged scripts produced.
2011-02-14Avoid use of CREATE OR REPLACE FUNCTION in extension installation files.Tom Lane
It was never terribly consistent to use OR REPLACE (because of the lack of comparable functionality for data types, operators, etc), and experimentation shows that it's now positively pernicious in the extension world. We really want a failure to occur if there are any conflicts, else it's unclear what the extension-ownership state of the conflicted object ought to be. Most of the time, CREATE EXTENSION will fail anyway because of conflicts on other object types, but an extension defining only functions can succeed, with bad results.
2011-02-14Convert contrib modules to use the extension facility.Tom Lane
This isn't fully tested as yet, in particular I'm not sure that the "foo--unpackaged--1.0.sql" scripts are OK. But it's time to get some buildfarm cycles on it. sepgsql is not converted to an extension, mainly because it seems to require a very nonstandard installation process. Dimitri Fontaine and Tom Lane
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-12-27Mark unaccent functions as STABLE, rather than defaulting to VOLATILE.Bruce Momjian
2010-11-23Remove useless whitespace at end of linesPeter Eisentraut
2010-09-22Some more gitignore cleanups: cover contrib and PL regression test outputs.Tom Lane
Also do some further work in the back branches, where quite a bit wasn't covered by Magnus' original back-patch.
2010-09-22Convert cvsignore to gitignore, and add .gitignore for build targets.Magnus Hagander
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-08-05Standardize get_whatever_oid functions for other object types.Robert Haas
- Rename TSParserGetPrsid to get_ts_parser_oid. - Rename TSDictionaryGetDictid to get_ts_dict_oid. - Rename TSTemplateGetTmplid to get_ts_template_oid. - Rename TSConfigGetCfgid to get_ts_config_oid. - Rename FindConversionByName to get_conversion_oid. - Rename GetConstraintName to get_constraint_oid. - Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid, get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid. The name of each function matches the corresponding catalog. Thanks to KaiGai Kohei for the review.
2010-02-26pgindent run for 9.0Bruce Momjian
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-11-14Make unaccent's install/uninstall scripts look more like all the others.Tom Lane
Set search_path explicitly, don't use IF EXISTS, etc.
2009-08-18Print the actual DB encoding in the unaccent regression test.Tom Lane
This is to help make it more obvious what the problem is, if the encoding isn't what the test expects.
2009-08-18Fix some *other* compiler warnings from a different gcc version.Tom Lane
2009-08-18Fix copy-and-pasteo that might explain some of the buildfarm'sTom Lane
indigestion about this module.
2009-08-18Suppress compiler warnings about uninitialized variables.Tom Lane
2009-08-18Unaccent dictionary.Teodor Sigaev