Disallow set-returning functions inside CASE or COALESCE.

When we reimplemented SRFs in commit 69f4b9c85, our initial choice was to allow the behavior to vary from historical practice in cases where a SRF call appeared within a conditional-execution construct (currently, only CASE or COALESCE). But that was controversial to begin with, and subsequent discussion has resulted in a consensus that it's better to throw an error instead of executing the query differently from before, so long as we can provide a reasonably clear error message and a way to rewrite the query. Hence, add a parser mechanism to allow detection of such cases during parse analysis. The mechanism just requires storing, in the ParseState, a pointer to the set-returning FuncExpr or OpExpr most recently emitted by parse analysis. Then the parsing functions for CASE and COALESCE can detect the presence of a SRF in their arguments by noting whether this pointer changes while analyzing their arguments. Furthermore, if it does, it provides a suitable error cursor location for the complaint. (This means that if there's more than one SRF in the arguments, the error will point at the last one to be analyzed not the first. While connoisseurs of parsing behavior might find that odd, it's unlikely the average user would ever notice.) While at it, we can also provide more specific error messages than before about some pre-existing restrictions, such as no-SRFs-within-aggregates. Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM construct would need to return a set. We've never supported that, but the restriction is depended on in more subtle ways now, so it seems wise to detect it at the start. Also, provide some documentation about how to rewrite a SRF-within-CASE query using a custom wrapper SRF. It turns out that the information_schema.user_mapping_options view contained an instance of exactly the behavior we're now forbidding; but rewriting it makes it more clear and safer too. initdb forced because of user_mapping_options change. Patch by me, with error message suggestions from Alvaro Herrera and Andres Freund, pursuant to a complaint from Regina Obe. Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us
author: Tom Lane 2017-06-14 03:46:39 +0000
committer: Tom Lane 2017-06-14 03:46:39 +0000
commit: 0436f6bde8848b7135f19dd7f8548b8c2ae89a34 (patch)
tree: 1c5a41e86292abb39ea1d33ea1d301a72e4f850e /doc/src
parent: 39da0f709db4d9f16f46be56ae401df72aab93c0 (diff)
1 files changed, 58 insertions, 25 deletions
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 05f4312bf3..1a6c3b9bc2 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -998,6 +998,29 @@ SELECT name, listchildren(name) FROM nodes;
     </para>
 
     <para>
+     <productname>PostgreSQL</>'s behavior for a set-returning function in a
+     query's select list is almost exactly the same as if the set-returning
+     function had been written in a <literal>LATERAL FROM</>-clause item
+     instead.  For example,
+<programlisting>
+SELECT x, generate_series(1,5) AS g FROM tab;
+</programlisting>
+     is almost equivalent to
+<programlisting>
+SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g;
+</programlisting>
+     It would be exactly the same, except that in this specific example,
+     the planner could choose to put <structname>g</> on the outside of the
+     nestloop join, since <structname>g</> has no actual lateral dependency
+     on <structname>tab</>.  That would result in a different output row
+     order.  Set-returning functions in the select list are always evaluated
+     as though they are on the inside of a nestloop join with the rest of
+     the <literal>FROM</> clause, so that the function(s) are run to
+     completion before the next row from the <literal>FROM</> clause is
+     considered.
+    </para>
+
+    <para>
      If there is more than one set-returning function in the query's select
      list, the behavior is similar to what you get from putting the functions
      into a single <literal>LATERAL ROWS FROM( ... )</> <literal>FROM</>-clause
@@ -1028,32 +1051,19 @@ SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab;
     </para>
 
     <para>
-     This behavior also means that set-returning functions will be evaluated
-     even when it might appear that they should be skipped because of a
-     conditional-evaluation construct, such as <literal>CASE</>
-     or <literal>COALESCE</>.  For example, consider
+     Set-returning functions cannot be used within conditional-evaluation
+     constructs, such as <literal>CASE</> or <literal>COALESCE</>.  For
+     example, consider
 <programlisting>
 SELECT x, CASE WHEN x &gt; 0 THEN generate_series(1, 5) ELSE 0 END FROM tab;
 </programlisting>
-     It might seem that this should produce five repetitions of input
-     rows that have <literal>x &gt; 0</>, and a single repetition of those
-     that do not; but actually it will produce five repetitions of every
-     input row.  This is because <function>generate_series()</> is run first,
-     and then the <literal>CASE</> expression is applied to its result rows.
-     The behavior is thus comparable to
-<programlisting>
-SELECT x, CASE WHEN x &gt; 0 THEN g ELSE 0 END
-  FROM tab, LATERAL generate_series(1,5) AS g;
-</programlisting>
-     It would be exactly the same, except that in this specific example,
-     the planner could choose to put <structname>g</> on the outside of the
-     nestloop join, since <structname>g</> has no actual lateral dependency
-     on <structname>tab</>.  That would result in a different output row
-     order.  Set-returning functions in the select list are always evaluated
-     as though they are on the inside of a nestloop join with the rest of
-     the <literal>FROM</> clause, so that the function(s) are run to
-     completion before the next row from the <literal>FROM</> clause is
-     considered.
+     It might seem that this should produce five repetitions of input rows
+     that have <literal>x &gt; 0</>, and a single repetition of those that do
+     not; but actually, because <function>generate_series(1, 5)</> would be
+     run in an implicit <literal>LATERAL FROM</> item before
+     the <literal>CASE</> expression is ever evaluated, it would produce five
+     repetitions of every input row.  To reduce confusion, such cases produce
+     a parse-time error instead.
     </para>
 
     <note>
@@ -1078,11 +1088,34 @@ SELECT x, CASE WHEN x &gt; 0 THEN g ELSE 0 END
       functions.  Also, nested set-returning functions did not work as
       described above; instead, a set-returning function could have at most
       one set-returning argument, and each nest of set-returning functions
-      was run independently.  The behavior for conditional execution
-      (set-returning functions inside <literal>CASE</> etc) was different too.
+      was run independently.  Also, conditional execution (set-returning
+      functions inside <literal>CASE</> etc) was previously allowed,
+      complicating things even more.
       Use of the <literal>LATERAL</> syntax is recommended when writing
       queries that need to work in older <productname>PostgreSQL</> versions,
       because that will give consistent results across different versions.
+      If you have a query that is relying on conditional execution of a
+      set-returning function, you may be able to fix it by moving the
+      conditional test into a custom set-returning function.  For example,
+<programlisting>
+SELECT x, CASE WHEN y &gt; 0 THEN generate_series(1, z) ELSE 5 END FROM tab;
+</programlisting>
+      could become
+<programlisting>
+CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int)
+  RETURNS SETOF int AS $$
+BEGIN
+  IF cond THEN
+    RETURN QUERY SELECT generate_series(start, fin);
+  ELSE
+    RETURN QUERY SELECT els;
+  END IF;
+END$$ LANGUAGE plpgsql;
+
+SELECT x, case_generate_series(y &gt; 0, 1, z, 5) FROM tab;
+</programlisting>
+      This formulation will work the same in all versions
+      of <productname>PostgreSQL</>.
      </para>
     </note>
    </sect2>
author	Tom Lane	2017-06-14 03:46:39 +0000
committer	Tom Lane	2017-06-14 03:46:39 +0000
commit	0436f6bde8848b7135f19dd7f8548b8c2ae89a34 (patch)
tree	1c5a41e86292abb39ea1d33ea1d301a72e4f850e /doc/src
parent	39da0f709db4d9f16f46be56ae401df72aab93c0 (diff)