diff options
author | Tom Lane | 2017-06-14 03:46:39 +0000 |
---|---|---|
committer | Tom Lane | 2017-06-14 03:46:39 +0000 |
commit | 0436f6bde8848b7135f19dd7f8548b8c2ae89a34 (patch) | |
tree | 1c5a41e86292abb39ea1d33ea1d301a72e4f850e /doc/src | |
parent | 39da0f709db4d9f16f46be56ae401df72aab93c0 (diff) |
Disallow set-returning functions inside CASE or COALESCE.
When we reimplemented SRFs in commit 69f4b9c85, our initial choice was
to allow the behavior to vary from historical practice in cases where a
SRF call appeared within a conditional-execution construct (currently,
only CASE or COALESCE). But that was controversial to begin with, and
subsequent discussion has resulted in a consensus that it's better to
throw an error instead of executing the query differently from before,
so long as we can provide a reasonably clear error message and a way to
rewrite the query.
Hence, add a parser mechanism to allow detection of such cases during
parse analysis. The mechanism just requires storing, in the ParseState,
a pointer to the set-returning FuncExpr or OpExpr most recently emitted
by parse analysis. Then the parsing functions for CASE and COALESCE can
detect the presence of a SRF in their arguments by noting whether this
pointer changes while analyzing their arguments. Furthermore, if it does,
it provides a suitable error cursor location for the complaint. (This
means that if there's more than one SRF in the arguments, the error will
point at the last one to be analyzed not the first. While connoisseurs of
parsing behavior might find that odd, it's unlikely the average user would
ever notice.)
While at it, we can also provide more specific error messages than before
about some pre-existing restrictions, such as no-SRFs-within-aggregates.
Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM
construct would need to return a set. We've never supported that, but the
restriction is depended on in more subtle ways now, so it seems wise to
detect it at the start.
Also, provide some documentation about how to rewrite a SRF-within-CASE
query using a custom wrapper SRF.
It turns out that the information_schema.user_mapping_options view
contained an instance of exactly the behavior we're now forbidding; but
rewriting it makes it more clear and safer too.
initdb forced because of user_mapping_options change.
Patch by me, with error message suggestions from Alvaro Herrera and
Andres Freund, pursuant to a complaint from Regina Obe.
Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/xfunc.sgml | 83 |
1 files changed, 58 insertions, 25 deletions
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 05f4312bf3..1a6c3b9bc2 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -998,6 +998,29 @@ SELECT name, listchildren(name) FROM nodes; </para> <para> + <productname>PostgreSQL</>'s behavior for a set-returning function in a + query's select list is almost exactly the same as if the set-returning + function had been written in a <literal>LATERAL FROM</>-clause item + instead. For example, +<programlisting> +SELECT x, generate_series(1,5) AS g FROM tab; +</programlisting> + is almost equivalent to +<programlisting> +SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g; +</programlisting> + It would be exactly the same, except that in this specific example, + the planner could choose to put <structname>g</> on the outside of the + nestloop join, since <structname>g</> has no actual lateral dependency + on <structname>tab</>. That would result in a different output row + order. Set-returning functions in the select list are always evaluated + as though they are on the inside of a nestloop join with the rest of + the <literal>FROM</> clause, so that the function(s) are run to + completion before the next row from the <literal>FROM</> clause is + considered. + </para> + + <para> If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions into a single <literal>LATERAL ROWS FROM( ... )</> <literal>FROM</>-clause @@ -1028,32 +1051,19 @@ SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab; </para> <para> - This behavior also means that set-returning functions will be evaluated - even when it might appear that they should be skipped because of a - conditional-evaluation construct, such as <literal>CASE</> - or <literal>COALESCE</>. For example, consider + Set-returning functions cannot be used within conditional-evaluation + constructs, such as <literal>CASE</> or <literal>COALESCE</>. For + example, consider <programlisting> SELECT x, CASE WHEN x > 0 THEN generate_series(1, 5) ELSE 0 END FROM tab; </programlisting> - It might seem that this should produce five repetitions of input - rows that have <literal>x > 0</>, and a single repetition of those - that do not; but actually it will produce five repetitions of every - input row. This is because <function>generate_series()</> is run first, - and then the <literal>CASE</> expression is applied to its result rows. - The behavior is thus comparable to -<programlisting> -SELECT x, CASE WHEN x > 0 THEN g ELSE 0 END - FROM tab, LATERAL generate_series(1,5) AS g; -</programlisting> - It would be exactly the same, except that in this specific example, - the planner could choose to put <structname>g</> on the outside of the - nestloop join, since <structname>g</> has no actual lateral dependency - on <structname>tab</>. That would result in a different output row - order. Set-returning functions in the select list are always evaluated - as though they are on the inside of a nestloop join with the rest of - the <literal>FROM</> clause, so that the function(s) are run to - completion before the next row from the <literal>FROM</> clause is - considered. + It might seem that this should produce five repetitions of input rows + that have <literal>x > 0</>, and a single repetition of those that do + not; but actually, because <function>generate_series(1, 5)</> would be + run in an implicit <literal>LATERAL FROM</> item before + the <literal>CASE</> expression is ever evaluated, it would produce five + repetitions of every input row. To reduce confusion, such cases produce + a parse-time error instead. </para> <note> @@ -1078,11 +1088,34 @@ SELECT x, CASE WHEN x > 0 THEN g ELSE 0 END functions. Also, nested set-returning functions did not work as described above; instead, a set-returning function could have at most one set-returning argument, and each nest of set-returning functions - was run independently. The behavior for conditional execution - (set-returning functions inside <literal>CASE</> etc) was different too. + was run independently. Also, conditional execution (set-returning + functions inside <literal>CASE</> etc) was previously allowed, + complicating things even more. Use of the <literal>LATERAL</> syntax is recommended when writing queries that need to work in older <productname>PostgreSQL</> versions, because that will give consistent results across different versions. + If you have a query that is relying on conditional execution of a + set-returning function, you may be able to fix it by moving the + conditional test into a custom set-returning function. For example, +<programlisting> +SELECT x, CASE WHEN y > 0 THEN generate_series(1, z) ELSE 5 END FROM tab; +</programlisting> + could become +<programlisting> +CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int) + RETURNS SETOF int AS $$ +BEGIN + IF cond THEN + RETURN QUERY SELECT generate_series(start, fin); + ELSE + RETURN QUERY SELECT els; + END IF; +END$$ LANGUAGE plpgsql; + +SELECT x, case_generate_series(y > 0, 1, z, 5) FROM tab; +</programlisting> + This formulation will work the same in all versions + of <productname>PostgreSQL</>. </para> </note> </sect2> |