Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Lists: pgsql-hackers
From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-03 19:52:48
Message-ID: 0184212d-1248-4f1f-a42d-f5cb1c1976d2@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello, hackers! I with my friends propose the patch to replace IN VALUES
to ANY in WHERE clauses.

# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an array
of values when `VALUES` contains 1 column. In the end it will be
replaced with ANY by the existing function makeA_Expr
(src/backend/nodes/makefuncs.c)

This improves performance, especially if the values are small.

# Patch

v1-in_values_to_array.patch

# How realized

`VALUES` statement corresponds to `values_clause` nonterminal symbol in
gram.y, where it's parsed to `SelectStmt` node.

`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1
column, parser extracts data from `SelectStmt` and passes it

to function call `makeSimpleA_Expr` where simple `A_Expr` is created.

Later during optimizations of parser tree this `A_Expr` will be
transformed to `ArrayExpr` (already realized in Postgres)

# Authors.
Author: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
Author: Vadim Yacenko <vadim(dot)yacenko(at)tantorlabs(dot)com>
Author: Alexander Simonov <alexander(dot)simonov(at)tantorlabs(dot)com>

# Tests
Implementation contains many regression tests of varying complexity,
which check supported features.

# Platform
This patch was checkouted from tag REL_17_STABLE. Code is developed in
Linux, doesn't contain platfrom-specific code, only Postgres internal
data structures and functions.

# Documentation
Regression tests contain many examples

# Performance
It increases performance

# Example
Let's compare result. With path the execution time is significantly lower.

We have a table table1 with 10000 rows.

postgres=# \d table1;
                         Table "public.table1"
 Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
 fld1   | timestamp without time zone |           | not null |
 fld2   | bytea                       |           | not null |
Indexes:
    "table1index" btree (fld2)

Let's execute several commands
see commands.sql

Plan no patch
see plan_no_patch.txt

Plan with patch
see plan_with_patch.txt

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Attachment Content-Type Size
commands.sql application/sql 2.0 KB
plan_no_patch.txt text/plain 4.1 KB
plan_with_patch.txt text/plain 4.0 KB
v1-in_values_to_any.patch text/x-patch 28.7 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-03 20:10:25
Message-ID: 843f8d10-0a84-478b-b14e-9e8e29024634@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On 03.10.2024 22:52, Ivan Kush wrote:
>
> Hello, hackers! I with my friends propose the patch to replace IN
> VALUES to ANY in WHERE clauses.
>
> # Intro
>
> The `VALUES` in the `IN VALUES` construct is replaced with with an
> array of values when `VALUES` contains 1 column. In the end it will be
> replaced with ANY by the existing function makeA_Expr
> (src/backend/nodes/makefuncs.c)
>
> This improves performance, especially if the values are small.
>
> # Patch
>
> v1-in_values_to_array.patch
>
> # How realized
>
> `VALUES` statement corresponds to `values_clause` nonterminal symbol
> in gram.y, where it's parsed to `SelectStmt` node.
>
> `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1
> column, parser extracts data from `SelectStmt` and passes it
>
> to function call `makeSimpleA_Expr` where simple `A_Expr` is created.
>
> Later during optimizations of parser tree this `A_Expr` will be
> transformed to `ArrayExpr` (already realized in Postgres)
>
>
> # Authors.
> Author: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
> Author: Vadim Yacenko <vadim(dot)yacenko(at)tantorlabs(dot)com>
> Author: Alexander Simonov <alexander(dot)simonov(at)tantorlabs(dot)com>
>
> # Tests
> Implementation contains many regression tests of varying complexity,
> which check supported features.
>
> # Platform
> This patch was checkouted from tag REL_17_STABLE. Code is developed in
> Linux, doesn't contain platfrom-specific code, only Postgres internal
> data structures and functions.
>
> # Documentation
> Regression tests contain many examples
>
> # Performance
> It increases performance
>
> # Example
> Let's compare result. With path the execution time is significantly
> lower.
>
> We have a table table1 with 10000 rows.
>
> postgres=# \d table1;
>                          Table "public.table1"
>  Column |            Type             | Collation | Nullable | Default
> --------+-----------------------------+-----------+----------+---------
>  fld1   | timestamp without time zone |           | not null |
>  fld2   | bytea                       |           | not null |
> Indexes:
>     "table1index" btree (fld2)
>
> Let's execute several commands
> see commands.sql
>
> Plan no patch
> see plan_no_patch.txt
>
>
> Plan with patch
> see plan_with_patch.txt

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
    where x in (VALUES(1200), (1));
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on t (actual rows=1 loops=1)
   Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

Anlrey Lepikhov and I recently described this in an article [0] here and
the implementation already exists, but for now it was posted a binary
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

--
Regards,
Alena Rybakina
Postgres Professional


From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-03 20:19:56
Message-ID: e20ee75bb2713f6141e5ac1333a390d603b157e6.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
> On 03.10.2024 22:52, Ivan Kush wrote:
> >
> > Hello, hackers! I with my friends propose the patch to replace IN
> > VALUES to ANY in WHERE clauses.
> >
> > # Intro
> >
> > The `VALUES` in the `IN VALUES` construct is replaced with with an
> > array of values when `VALUES` contains 1 column. In the end it will be
> > replaced with ANY by the existing function makeA_Expr
> > (src/backend/nodes/makefuncs.c)
> >
> > This improves performance, especially if the values are small.
>
> Anlrey Lepikhov and I recently described this in an article [0] here and
> the implementation already exists, but for now it was posted a binary
> application for testing. The acceleration is significant I agree.
>
> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

I believe that the speed improvement is significant, but who writes a
query like

... WHERE col IN (VALUES (1), (2), (3))

when they could write the much shorter

... WHERE col IN (1, 2, 3)

which is already converted to "= ANY"?

I wonder if it is worth the extra planning time to detect and improve
such queries.

Yours,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-03 21:08:28
Message-ID: 461998.1727989708@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
> I wonder if it is worth the extra planning time to detect and improve
> such queries.

I'm skeptical too. I'm *very* skeptical of implementing it in the
grammar as shown here --- I'd go so far as to say that that approach
cannot be accepted. That's far too early, and it risks all sorts
of problems. An example is that the code as given seems to assume
that all the sublists are the same length ... but we haven't checked
that yet. I also suspect that this does not behave the same as the
original construct for purposes like resolving dissimilar types in
the VALUES list. (In an ideal world, perhaps it'd behave the same,
but that ship sailed a couple decades ago.)

regards, tom lane


From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-04 08:43:28
Message-ID: 715d78b4-37f8-4de5-b8ac-f179f812edf3@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Do you mean, that I should try to execute such command?

In this patch it gives ANY

postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from table1
   where fld2 in
(VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea),
('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
a));
                                                                                                                             QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------

Bitmap Heap Scan on table1 (actual rows=0 loops=1)
  Recheck Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

  ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
        Index Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

(4 rows)

Do you plan to send your implementation to the hackers?

On 10/3/24 23:10, Alena Rybakina wrote:
> I think you should think about putting these constants in ANY Array
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from t
>     where x in (VALUES(1200), (1));
>                     QUERY PLAN
> ---------------------------------------------------
>  Seq Scan on t (actual rows=1 loops=1)
>    Filter: (x = ANY ('{1200,1}'::integer[]))
> (3 rows)

--
Best wishes,
Ivan Kush
Tantor Labs LLC


From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-04 08:48:04
Message-ID: 42f7fba2-c2e5-4425-a836-6700a4e58ee5@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Some ORMs or proprietary software may write it mistakenly. In these
cases this idea may be helpful.

This patch contains GUC to enable/disable this optimization

On 10/3/24 23:19, Laurenz Albe wrote:
> On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
>> On 03.10.2024 22:52, Ivan Kush wrote:
>>> Hello, hackers! I with my friends propose the patch to replace IN
>>> VALUES to ANY in WHERE clauses.
>>>
>>> # Intro
>>>
>>> The `VALUES` in the `IN VALUES` construct is replaced with with an
>>> array of values when `VALUES` contains 1 column. In the end it will be
>>> replaced with ANY by the existing function makeA_Expr
>>> (src/backend/nodes/makefuncs.c)
>>>
>>> This improves performance, especially if the values are small.
>> Anlrey Lepikhov and I recently described this in an article [0] here and
>> the implementation already exists, but for now it was posted a binary
>> application for testing. The acceleration is significant I agree.
>>
>> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665
> I believe that the speed improvement is significant, but who writes a
> query like
>
> ... WHERE col IN (VALUES (1), (2), (3))
>
> when they could write the much shorter
>
> ... WHERE col IN (1, 2, 3)
>
> which is already converted to "= ANY"?
>
> I wonder if it is worth the extra planning time to detect and improve
> such queries.
>
> Yours,
> Laurenz Albe

--
Best wishes,
Ivan Kush
Tantor Labs LLC


From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-04 09:05:01
Message-ID: 21d5fca5-0c02-4afd-8c98-d0930b298a8d@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/4/24 04:08, Tom Lane wrote:
> Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
>> I wonder if it is worth the extra planning time to detect and improve
>> such queries.
>
> I'm skeptical too. I'm *very* skeptical of implementing it in the
> grammar as shown here --- I'd go so far as to say that that approach
> cannot be accepted. That's far too early, and it risks all sorts
> of problems. An example is that the code as given seems to assume
> that all the sublists are the same length ... but we haven't checked
> that yet. I also suspect that this does not behave the same as the
> original construct for purposes like resolving dissimilar types in
> the VALUES list. (In an ideal world, perhaps it'd behave the same,
> but that ship sailed a couple decades ago.)
We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these users
employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that it
only needs a few cycles if no one 'x IN VALUES' expression is presented
in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
0001-Introduce-VALUES-ARRAY-transformation.patch text/x-patch 30.3 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-04 09:15:56
Message-ID: 02863b21-b88e-46fc-8937-fd9b0401b0c8@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.10.2024 11:43, Ivan Kush wrote:
> Do you mean, that I should try to execute such command?
>
> In this patch it gives ANY
>
> postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from table1
>    where fld2 in
> (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea),
> ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
> a));
>                                                                                                                              QUERY
> PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> --------------------------------------------------------------------------------------------
>
> Bitmap Heap Scan on table1 (actual rows=0 loops=1)
>   Recheck Cond: (fld2 = ANY
> ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
> 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))
>
>   ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
>         Index Cond: (fld2 = ANY
> ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
> 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))
>
> (4 rows)
Yes I meant it.
>
> Do you plan to send your implementation to the hackers?
>
It was sent here [0].

[0]
https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com

--
Regards,
Alena Rybakina
Postgres Professional


From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, lepihov(at)gmail(dot)com
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-11-11 17:29:14
Message-ID: 73b47d4b-11e7-4627-8c93-ffb0376dc0d1@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I agree, your realization is better: reliability is better and debugging
is simplier.
I've looked at the code, looks good to me. Only style notes like
VTA/VtA, SELECT/select, etc. may be corrected

On 10/4/24 12:15, Alena Rybakina wrote:
>
> It was sent here [0].
>
> [0]
> https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com
>
--
Best wishes,
Ivan Kush
Tantor Labs LLC


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-01-09 13:10:51
Message-ID: f2faa7a5-979c-4367-bcf7-bdbe2a05a397@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On 04.10.2024 12:05, Andrei Lepikhov wrote:
> We also have an implementation of VALUES -> ARRAY transformation.
> Because enterprises must deal with users' problems, many of these
> users employ automatically generated queries.
> Being informed very well of the consensus about that stuff, we've
> designed it as a library. But, looking into the code now, I see that
> it only needs a few cycles if no one 'x IN VALUES' expression is
> presented in the query. Who knows? It may be OK for the core.
> So, I've rewritten the code into the patch - see it in the attachment.
>
> The idea is quite simple - at the same place as
> convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> RTE and perform the transformation if it's convertible.

I updated the patch due to the problem with the coercion types for both
sides of the expression.

We must find a common type for both leftop of the expression and rightop
including constants for correct transformation, and at the same time
check that the resulting types are compatible.

To do this we find an operator for the two input types if it is
possible, and also remember the target types for the left and right
sides, and after that make a coercion.

This processing is only needed in cases where we are not working with
parameters since the final type is not specified for the parameters.

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
v2-0001-Add-an-implementation-of-the-x-IN-VALUES-to-x-ANY.patch text/x-patch 36.6 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-09 11:58:56
Message-ID: CAPpHfdsp9qJ82csxfa-jmACTKC7mZorFOb4MGZcBVPw0623JmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:
> On 04.10.2024 12:05, Andrei Lepikhov wrote:
> > We also have an implementation of VALUES -> ARRAY transformation.
> > Because enterprises must deal with users' problems, many of these
> > users employ automatically generated queries.
> > Being informed very well of the consensus about that stuff, we've
> > designed it as a library. But, looking into the code now, I see that
> > it only needs a few cycles if no one 'x IN VALUES' expression is
> > presented in the query. Who knows? It may be OK for the core.
> > So, I've rewritten the code into the patch - see it in the attachment.
> >
> > The idea is quite simple - at the same place as
> > convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> > RTE and perform the transformation if it's convertible.
>
> I updated the patch due to the problem with the coercion types for both
> sides of the expression.
>
> We must find a common type for both leftop of the expression and rightop
> including constants for correct transformation, and at the same time
> check that the resulting types are compatible.
>
> To do this we find an operator for the two input types if it is
> possible, and also remember the target types for the left and right
> sides, and after that make a coercion.
>
> This processing is only needed in cases where we are not working with
> parameters since the final type is not specified for the parameters.

I took a look at this patch.

+ /* TODO: remember parameters */

What was intended to do here?

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be
transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values
(t1.i), (1)));

------
Regards,
Alexander Korotkov
Supabase


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-09 15:38:03
Message-ID: CAPpHfdu6paqAt+6h90a5vS_AN3jNbY3h=iHxoOi=gyE5d1kvLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 9, 2025 at 1:58 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> > On 04.10.2024 12:05, Andrei Lepikhov wrote:
> > > We also have an implementation of VALUES -> ARRAY transformation.
> > > Because enterprises must deal with users' problems, many of these
> > > users employ automatically generated queries.
> > > Being informed very well of the consensus about that stuff, we've
> > > designed it as a library. But, looking into the code now, I see that
> > > it only needs a few cycles if no one 'x IN VALUES' expression is
> > > presented in the query. Who knows? It may be OK for the core.
> > > So, I've rewritten the code into the patch - see it in the attachment.
> > >
> > > The idea is quite simple - at the same place as
> > > convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> > > RTE and perform the transformation if it's convertible.
> >
> > I updated the patch due to the problem with the coercion types for both
> > sides of the expression.
> >
> > We must find a common type for both leftop of the expression and rightop
> > including constants for correct transformation, and at the same time
> > check that the resulting types are compatible.
> >
> > To do this we find an operator for the two input types if it is
> > possible, and also remember the target types for the left and right
> > sides, and after that make a coercion.
> >
> > This processing is only needed in cases where we are not working with
> > parameters since the final type is not specified for the parameters.
>
> I took a look at this patch.
>
> + /* TODO: remember parameters */
>
> What was intended to do here?
>
> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
> For instance, I believe cases like this (containing Var) could be transformed too.
>
> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions. I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.

------
Regards,
Alexander Korotkov
Supabase


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-09 19:47:26
Message-ID: 3b679503-a7be-4e00-9a75-189174e84027@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi! Thank you for your review!

On 09.02.2025 18:38, Alexander Korotkov wrote:
> On Sun, Feb 9, 2025 at 1:58 PM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
>> On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> On 04.10.2024 12:05, Andrei Lepikhov wrote:
>>>> We also have an implementation of VALUES -> ARRAY transformation.
>>>> Because enterprises must deal with users' problems, many of these
>>>> users employ automatically generated queries.
>>>> Being informed very well of the consensus about that stuff, we've
>>>> designed it as a library. But, looking into the code now, I see that
>>>> it only needs a few cycles if no one 'x IN VALUES' expression is
>>>> presented in the query. Who knows? It may be OK for the core.
>>>> So, I've rewritten the code into the patch - see it in the attachment.
>>>>
>>>> The idea is quite simple - at the same place as
>>>> convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
>>>> RTE and perform the transformation if it's convertible.
>>> I updated the patch due to the problem with the coercion types for both
>>> sides of the expression.
>>>
>>> We must find a common type for both leftop of the expression and rightop
>>> including constants for correct transformation, and at the same time
>>> check that the resulting types are compatible.
>>>
>>> To do this we find an operator for the two input types if it is
>>> possible, and also remember the target types for the left and right
>>> sides, and after that make a coercion.
>>>
>>> This processing is only needed in cases where we are not working with
>>> parameters since the final type is not specified for the parameters.
>> I took a look at this patch.
>>
>> + /* TODO: remember parameters */

This comment is not relevant anymore.This comment was added during patch
development and so it is enough to save const params in
arrayExpr->elements. I'll delete it.

Andrei did review of my last code and improved it. I'll add his code too.

>> What was intended to do here?
>>
>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>> For instance, I believe cases like this (containing Var) could be transformed too.
>>
>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I am willing to agree with you because I didn't see any limitations for
that. After analyzing diff of regression tests and your example (below),
I think I will need to add a piece of logic of preparation to pull up
the sub-select into top range table like here [0] to correct processing
vars elements based on their position in the query.

[0]
https://www.postgresql.org/message-id/975a3736-a8b5-49b3-8009-4d4e86867aa1%40postgrespro.ru

alena(at)postgres=# explain select * from t t1, lateral (select * from t t2
where t2.x in (val
ues (t1.x), (1)));
ERROR:  bogus varlevelsup: 1 offset 0

So, I'm working on it.

> Also, I think there is quite a code duplication about construction of
> SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
> functions. I would like to see a refactoring as a separate first
> patch, which extracts the common part into a function.

I completely agree with you. Ill add it.

--
Regards,
Alena Rybakina
Postgres Professional


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-20 21:09:27
Message-ID: d6809313-c0ca-4338-88e5-e8bc424bdfda@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:
>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>> For instance, I believe cases like this (containing Var) could be transformed too.
>>
>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
I'm still working on it.
> Also, I think there is quite a code duplication about construction of
> SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
> functions. I would like to see a refactoring as a separate first
> patch, which extracts the common part into a function.

Done.

I have attached a patch. In addition to the transfer, I added the
process of searching for a suitable operator and type for the left
expression for input expressions: const and left expression, since they
may differ from the declared types. Additionally, we convert the left
expr to a type suitable for the found operator.

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
v3-0002-Add-an-implementation-of-the-x-IN-VALUES-to-x-ANY-.-.patch text/x-patch 30.9 KB
v3-0001-Move-the-function-for-generating-ArrayExpr-to-anothe.patch text/x-patch 9.7 KB

From: newtglobal postgresql_contributors <postgresql_contributors(at)newtglobalcorp(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-27 11:09:02
Message-ID: 174065454278.654.1011593946047422922.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hi Ivan Kush
I tested the patch with `commands.sql` and observed noticeable improvements in planning and execution time, especially with multiple tables. Even single-table queries show small time reductions (0.02–0.04 ms). The patch optimizes `IN` clauses effectively, particularly with `VALUES`. For example, `col IN (VALUES ('a'), ('b'), ('c'))` now behaves similarly to `col IN ('a', 'b', 'c')`, leading to faster execution and reduced planning overhead.
Regards,
Postgresql Contributors - NewtGlobal


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-02-28 11:48:47
Message-ID: ce45dde6-0a12-4a21-9445-1db09637992c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On 21.02.2025 00:09, Alena Rybakina wrote:
>
> Hi!
>
> On 09.02.2025 18:38, Alexander Korotkov wrote:
>>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>>> For instance, I believe cases like this (containing Var) could be transformed too.
>>>
>>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>
I added it and attached a patch with diff file. To be honest, I didn't
find queries except for var with volatile functions where the transform
can't be applied.

I'm not sure about only cases where var can refer to something outside
available_rels list but I couldn't come up with an example where that's
possible, what do you think?

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
consider_vars.diff.no-cfbot text/plain 2.7 KB
v4-0002-Add-an-implementation-of-the-x-IN-VALUES-to-x-ANY.patch text/x-patch 31.9 KB
v4-0001-Move-the-function-for-generating-ArrayExpr-to-anothe.patch text/x-patch 9.7 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-01 11:39:11
Message-ID: 99c3d74c-56c1-4fc1-b6db-c0c87cc9cc2d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28.02.2025 14:48, Alena Rybakina wrote:
>
> Hi!
>
> On 21.02.2025 00:09, Alena Rybakina wrote:
>>
>> Hi!
>>
>> On 09.02.2025 18:38, Alexander Korotkov wrote:
>>>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>>>> For instance, I believe cases like this (containing Var) could be transformed too.
>>>>
>>>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>>
> I added it and attached a patch with diff file. To be honest, I didn't
> find queries except for var with volatile functions where the
> transform can't be applied.
>
I removed the function volatility check that I added in the previous
version, since we already check it in is_simple_values_sequence.

> I'm not sure about only cases where var can refer to something outside
> available_rels list but I couldn't come up with an example where
> that's possible, what do you think?
>
Considering it again, I think we can't face problems like that because
we don't work with join.

I attached a diff file as a difference with the 3rd version of the
patch, when we did not consider the values with var for transformation.

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
consider_vars.diff.no-cfbot text/plain 2.5 KB
v5-0002-Add-an-implementation-of-the-x-IN-VALUES-to-x-ANY.patch text/x-patch 31.7 KB
v5-0001-Move-the-function-for-generating-ArrayExpr-to-anothe.patch text/x-patch 9.7 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-06 08:23:29
Message-ID: CAPpHfduU5Dy08r11EV3myOSf2Um8auN-PrpCrX84AeQST1vRQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> On 09.02.2025 18:38, Alexander Korotkov wrote:
>>
>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>> For instance, I believe cases like this (containing Var) could be transformed too.
>>
>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>
>
> I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.
>
> I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.
>
> I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?
>
> Considering it again, I think we can't face problems like that because we don't work with join.
>
> I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.

I take detailed look at makeSAOPArrayExpr() function, which is much
more complex than corresponding fragment from
match_orclause_to_indexcol(). And I found it to be mostly wrong. We
are working in post parse-analyze stage. That means it's too late to
do type coercion or lookup operator by name. We have already all the
catalog objects nailed down. In connection with that, second argument
of OpExpr shouldn't be ignored as it might contain amrelevant type
cast. I think I've fixed the most of them problems in the attached
patchset.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v6-0001-Move-the-function-for-generating-ArrayExpr-to-ano.patch application/octet-stream 7.0 KB
v6-0002-Add-an-implementation-of-the-x-IN-VALUES-to-x-ANY.patch application/octet-stream 32.6 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-12 18:11:24
Message-ID: 56acf450-f5e1-4abf-96d5-47df14cb9362@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alexander!

On 06.03.2025 11:23, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> On 09.02.2025 18:38, Alexander Korotkov wrote:
>>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>>> For instance, I believe cases like this (containing Var) could be transformed too.
>>>
>>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>> I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.
>>
>> I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.
>>
>> I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?
>>
>> Considering it again, I think we can't face problems like that because we don't work with join.
>>
>> I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.
> I take detailed look at makeSAOPArrayExpr() function, which is much
> more complex than corresponding fragment from
> match_orclause_to_indexcol(). And I found it to be mostly wrong. We
> are working in post parse-analyze stage. That means it's too late to
> do type coercion or lookup operator by name. We have already all the
> catalog objects nailed down. In connection with that, second argument
> of OpExpr shouldn't be ignored as it might contain amrelevant type
> cast. I think I've fixed the most of them problems in the attached
> patchset.
>
>
I agree with your conclusion and changes.

--
Regards,
Alena Rybakina
Postgres Professional


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-29 11:03:07
Message-ID: CAPpHfdtO_vmpBtARWBsu6oXJKZMwy0YXjPbwqNqK=vOQ6qNG5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 12, 2025 at 8:11 PM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> On 06.03.2025 11:23, Alexander Korotkov wrote:
>
> Hi, Alena!
>
> On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> On 09.02.2025 18:38, Alexander Korotkov wrote:
>
> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
> For instance, I believe cases like this (containing Var) could be transformed too.
>
> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>
> I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.
>
> I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.
>
> I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?
>
> Considering it again, I think we can't face problems like that because we don't work with join.
>
> I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.
>
> I take detailed look at makeSAOPArrayExpr() function, which is much
> more complex than corresponding fragment from
> match_orclause_to_indexcol(). And I found it to be mostly wrong. We
> are working in post parse-analyze stage. That means it's too late to
> do type coercion or lookup operator by name. We have already all the
> catalog objects nailed down. In connection with that, second argument
> of OpExpr shouldn't be ignored as it might contain amrelevant type
> cast. I think I've fixed the most of them problems in the attached
> patchset.
>
>
> I agree with your conclusion and changes.

I've revised the patchset. Mostly comments/commit messages and minor
refactoring. One thing I have to fix: we must do
IncrementVarSublevelsUp() unconditionally for all expressions as Vars
could be deeper inside. Also, I've removed our subquery check
completely. Not sure if we need it at all. I'll further analyze
that.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v7-0001-Extract-make_SAOP_expr-function-from-match_orclau.patch application/octet-stream 7.3 KB
v7-0002-Convert-x-IN-VALUES-.-to-x-ANY-.-then-appropriate.patch application/octet-stream 32.0 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-29 19:03:00
Message-ID: dc727ff3-10d8-4794-a60b-168c428d3b15@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On 29.03.2025 14:03, Alexander Korotkov wrote:
> On Wed, Mar 12, 2025 at 8:11 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> On 06.03.2025 11:23, Alexander Korotkov wrote:
>>
>> Hi, Alena!
>>
>> On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>
>> On 09.02.2025 18:38, Alexander Korotkov wrote:
>>
>> Also, aren't we too restrictive while requiring is_simple_values_sequence()?
>> For instance, I believe cases like this (containing Var) could be transformed too.
>>
>> select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
>>
>> I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.
>>
>> I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.
>>
>> I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?
>>
>> Considering it again, I think we can't face problems like that because we don't work with join.
>>
>> I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.
>>
>> I take detailed look at makeSAOPArrayExpr() function, which is much
>> more complex than corresponding fragment from
>> match_orclause_to_indexcol(). And I found it to be mostly wrong. We
>> are working in post parse-analyze stage. That means it's too late to
>> do type coercion or lookup operator by name. We have already all the
>> catalog objects nailed down. In connection with that, second argument
>> of OpExpr shouldn't be ignored as it might contain amrelevant type
>> cast. I think I've fixed the most of them problems in the attached
>> patchset.
>>
>>
>> I agree with your conclusion and changes.
> I've revised the patchset. Mostly comments/commit messages and minor
> refactoring. Also, I've removed our subquery check
> completely. Not sure if we need it at all. I'll further analyze
> that.

I agree with your changes, the code really started to look better and
more understandable. Thank you!

As for function - it checked that values didn't contain any subquery
elements (if they consists RangeTblEntry type variables) and when you
removed it you caused the problem with sublevel parameter.

> One thing I have to fix: we must do
> IncrementVarSublevelsUp() unconditionally for all expressions as Vars
> could be deeper inside.

Yes, I'm looking at it too, I've just understood that it was needed for
subqueries - they can contain var elements which needs decrease the
sublevel parameter.

for example for the query:

EXPLAIN (COSTS OFF)
SELECT ten FROM onek t
WHERE unique1 IN (VALUES (0), ((2 IN (SELECT unique2 FROM onek c
  WHERE c.unique2 = t.unique1))::integer));

We are interested in this element: ((2 IN (SELECT unique2 FROM onek c 
WHERE c.unique2 = *t.unique1*))

It is funcexpr object with RabgeTblEntry variable. I highlighted

WARNING:  1{FUNCEXPR :funcid 2558 :funcresulttype 23 :funcretset false
:funcvariadic false :funcformat 1 :funccollid 0 :inputcollid 0 :args
({SUBLINK :subLinkType 2 :subLinkId 0 :testexpr {OPEXPR :opno 96
:opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0
:args ({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
:constbyval true :constisnull false :location -1 :constvalue 4 [ 2 0 0 0
0 0 0 0 ]} {PARAM :paramkind 2 :paramid 1 :paramtype 23 :paramtypmod -1
:paramcollid 0 :location -1}) :location -1} :operName ("=") :subselect
{QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt <>
:resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs
false :hasSubLinks false :hasDistinctOn false :hasRecursive false
:hasModifyingCTE false :hasForUpdate false :hasRowSecurity false
:hasGroupRTE false :isReturn false :cteList <> :rtable ({RANGETBLENTRY
:alias {ALIAS :aliasname c :colnames <>} :eref {ALIAS :aliasname c
:colnames ("unique1" "unique2" "two" "four" "ten" "twenty" "hundred"
"thousand" "twothousand" "fivethous" "tenthous" "odd" "even" "stringu1"
"stringu2" "string4")} :rtekind 0 :relid 32795 :inh true :relkind r
:rellockmode 1 :perminfoindex 1 :tablesample <> :lateral false :inFromCl
true :securityQuals <>}) :rteperminfos ({RTEPERMISSIONINFO :relid 32795
:inh true :requiredPerms 2 :checkAsUser 0 :selectedCols (b 9)
:insertedCols (b) :updatedCols (b)}) :jointree {FROMEXPR :fromlist
({RANGETBLREF :rtindex 1}) :quals {OPEXPR :opno 96 :opfuncid 65
:opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR
:varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0
:varnullingrels (b) :varlevelsup 0 :varreturningtype 0 :varnosyn 1
:varattnosyn 2 :location -1} *{VAR :varno 1 :varattno 1 :vartype 23
:vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 2
:varreturningtype 0 :varnosyn 1 :varattnosyn 1 :location -1}) :location
-1}*} :mergeActionList <> :mergeTargetRelation 0 :mergeJoinCondition <>
:targetList ({TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 23
:vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 0
:varreturningtype 0 :varnosyn 1 :varattnosyn 2 :location -1} :resno 1
:resname unique2 :ressortgroupref 0 :resorigtbl 32795 :resorigcol 2
:resjunk false}) :override 0 :onConflict <> :returningOldAlias <>
:returningNewAlias <> :returningList <> :groupClause <> :groupDistinct
false :groupingSets <> :havingQual <> :windowClause <> :distinctClause
<> :sortClause <> :limitOffset <> :limitCount <> :limitOption 0
:rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <>
:stmt_location -1 :stmt_len -1} :location -1}) :location -1}

I highlighted in bold the var we need - since it is in a subquery in the
in expression will be flattened, all elements contained in it should
decrease the level number by one, since they will belong to the subtree
located above it. Because of that condition, this did not happen.

I generally agree with you that it is better to remove that condition.
The function IncrementVarSublevelsUp essentially goes through the
structures below and will decrease the level of only the vars for which
this needs to be done, and the condition with 1 will protect us from
touching those vars that should not. So the varlevelsup for this var
should be 1.

I am currently investigating whether this transformation will be fair
for all cases; I have not found any problems yet.

--
Regards,
Alena Rybakina
Postgres Professional


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-29 21:59:01
Message-ID: CAPpHfduXsnWzz7zOBcadug3iYMUN3HaQqw7uz7vo6EMdjPqAMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Sat, Mar 29, 2025 at 9:03 PM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> On 29.03.2025 14:03, Alexander Korotkov wrote:
>> One thing I have to fix: we must do
>> IncrementVarSublevelsUp() unconditionally for all expressions as Vars
>> could be deeper inside.
>
> Yes, I'm looking at it too, I've just understood that it was needed for subqueries - they can contain var elements which needs decrease the sublevel parameter.
>
> for example for the query:
>
> EXPLAIN (COSTS OFF)
> SELECT ten FROM onek t
> WHERE unique1 IN (VALUES (0), ((2 IN (SELECT unique2 FROM onek c
> WHERE c.unique2 = t.unique1))::integer));
>
> We are interested in this element: ((2 IN (SELECT unique2 FROM onek c WHERE c.unique2 = t.unique1))
>
> It is funcexpr object with RabgeTblEntry variable. I highlighted
>
> WARNING: 1{FUNCEXPR :funcid 2558 :funcresulttype 23 :funcretset false :funcvariadic false :funcformat 1 :funccollid 0 :inputcollid 0 :args ({SUBLINK :subLinkType 2 :subLinkId 0 :testexpr {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 2 0 0 0 0 0 0 0 ]} {PARAM :paramkind 2 :paramid 1 :paramtype 23 :paramtypmod -1 :paramcollid 0 :location -1}) :location -1} :operName ("=") :subselect {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt <> :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :hasGroupRTE false :isReturn false :cteList <> :rtable ({RANGETBLENTRY :alias {ALIAS :aliasname c :colnames <>} :eref {ALIAS :aliasname c :colnames ("unique1" "unique2" "two" "four" "ten" "twenty" "hundred" "thousand" "twothousand" "fivethous" "tenthous" "odd" "even" "stringu1" "stringu2" "string4")} :rtekind 0 :relid 32795 :inh true :relkind r :rellockmode 1 :perminfoindex 1 :tablesample <> :lateral false :inFromCl true :securityQuals <>}) :rteperminfos ({RTEPERMISSIONINFO :relid 32795 :inh true :requiredPerms 2 :checkAsUser 0 :selectedCols (b 9) :insertedCols (b) :updatedCols (b)}) :jointree {FROMEXPR :fromlist ({RANGETBLREF :rtindex 1}) :quals {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 0 :varreturningtype 0 :varnosyn 1 :varattnosyn 2 :location -1} {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 2 :varreturningtype 0 :varnosyn 1 :varattnosyn 1 :location -1}) :location -1}} :mergeActionList <> :mergeTargetRelation 0 :mergeJoinCondition <> :targetList ({TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 0 :varreturningtype 0 :varnosyn 1 :varattnosyn 2 :location -1} :resno 1 :resname unique2 :ressortgroupref 0 :resorigtbl 32795 :resorigcol 2 :resjunk false}) :override 0 :onConflict <> :returningOldAlias <> :returningNewAlias <> :returningList <> :groupClause <> :groupDistinct false :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :limitOption 0 :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location -1 :stmt_len -1} :location -1}) :location -1}
>
>
> I highlighted in bold the var we need - since it is in a subquery in the in expression will be flattened, all elements contained in it should decrease the level number by one, since they will belong to the subtree located above it. Because of that condition, this did not happen.
>
> I generally agree with you that it is better to remove that condition. The function IncrementVarSublevelsUp essentially goes through the structures below and will decrease the level of only the vars for which this needs to be done, and the condition with 1 will protect us from touching those vars that should not. So the varlevelsup for this var should be 1.
>
> I am currently investigating whether this transformation will be fair for all cases; I have not found any problems yet.

Thank you for your feedback. I appreciate you're also looking for the
potential problems. On thing to highlight: doing
IncrementVarSublevelsUp() unconditionally is required not just for
subqueries. Consider the following example.

SELECT * FROM t WHERE val1 IN (VALUES (val2), (val2 +1));

The second value contain Var, which needs IncrementVarSublevelsUp(),
but the top node is OpExpr.

------
Regards,
Alexander Korotkov
Supabase


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-03-31 23:10:58
Message-ID: 3ee48e90-7335-4b22-b7ce-bb0fea2c5779@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alexander!

On 30.03.2025 00:59, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Sat, Mar 29, 2025 at 9:03 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> On 29.03.2025 14:03, Alexander Korotkov wrote:
>>> One thing I have to fix: we must do
>>> IncrementVarSublevelsUp() unconditionally for all expressions as Vars
>>> could be deeper inside.
>> Yes, I'm looking at it too, I've just understood that it was needed for subqueries - they can contain var elements which needs decrease the sublevel parameter.
>>
>> for example for the query:
>>
>> EXPLAIN (COSTS OFF)
>> SELECT ten FROM onek t
>> WHERE unique1 IN (VALUES (0), ((2 IN (SELECT unique2 FROM onek c
>> WHERE c.unique2 = t.unique1))::integer));
>>
>> We are interested in this element: ((2 IN (SELECT unique2 FROM onek c WHERE c.unique2 = t.unique1))
>>
>> It is funcexpr object with RabgeTblEntry variable. I highlighted
>>
>> WARNING: 1{FUNCEXPR :funcid 2558 :funcresulttype 23 :funcretset false :funcvariadic false :funcformat 1 :funccollid 0 :inputcollid 0 :args ({SUBLINK :subLinkType 2 :subLinkId 0 :testexpr {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 2 0 0 0 0 0 0 0 ]} {PARAM :paramkind 2 :paramid 1 :paramtype 23 :paramtypmod -1 :paramcollid 0 :location -1}) :location -1} :operName ("=") :subselect {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt <> :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :hasGroupRTE false :isReturn false :cteList <> :rtable ({RANGETBLENTRY :alias {ALIAS :aliasname c :colnames <>} :eref {ALIAS :aliasname c :colnames ("unique1" "unique2" "two" "four" "ten" "twenty" "hundred" "thousand" "twothousand" "fivethous" "tenthous" "odd" "even" "stringu1" "stringu2" "string4")} :rtekind 0 :relid 32795 :inh true :relkind r :rellockmode 1 :perminfoindex 1 :tablesample <> :lateral false :inFromCl true :securityQuals <>}) :rteperminfos ({RTEPERMISSIONINFO :relid 32795 :inh true :requiredPerms 2 :checkAsUser 0 :selectedCols (b 9) :insertedCols (b) :updatedCols (b)}) :jointree {FROMEXPR :fromlist ({RANGETBLREF :rtindex 1}) :quals {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 0 :varreturningtype 0 :varnosyn 1 :varattnosyn 2 :location -1} {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 2 :varreturningtype 0 :varnosyn 1 :varattnosyn 1 :location -1}) :location -1}} :mergeActionList <> :mergeTargetRelation 0 :mergeJoinCondition <> :targetList ({TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varnullingrels (b) :varlevelsup 0 :varreturningtype 0 :varnosyn 1 :varattnosyn 2 :location -1} :resno 1 :resname unique2 :ressortgroupref 0 :resorigtbl 32795 :resorigcol 2 :resjunk false}) :override 0 :onConflict <> :returningOldAlias <> :returningNewAlias <> :returningList <> :groupClause <> :groupDistinct false :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :limitOption 0 :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location -1 :stmt_len -1} :location -1}) :location -1}
>>
>>
>> I highlighted in bold the var we need - since it is in a subquery in the in expression will be flattened, all elements contained in it should decrease the level number by one, since they will belong to the subtree located above it. Because of that condition, this did not happen.
>>
>> I generally agree with you that it is better to remove that condition. The function IncrementVarSublevelsUp essentially goes through the structures below and will decrease the level of only the vars for which this needs to be done, and the condition with 1 will protect us from touching those vars that should not. So the varlevelsup for this var should be 1.
>>
>> I am currently investigating whether this transformation will be fair for all cases; I have not found any problems yet.
> Thank you for your feedback. I appreciate you're also looking for the
> potential problems. On thing to highlight: doing
> IncrementVarSublevelsUp() unconditionally is required not just for
> subqueries. Consider the following example.
>
> SELECT * FROM t WHERE val1 IN (VALUES (val2), (val2 +1));
>
> The second value contain Var, which needs IncrementVarSublevelsUp(),
> but the top node is OpExpr.
Yes, I agree with that - this is precisely why we need to call
IncrementVarSublevelsUp() unconditionally for all types.

As you mentioned earlier, Var nodes can be nested more deeply, and
skipping this step could lead to incorrect behavior in those cases. So,
now it works fine)

Thank you for an example.

I analyzed this transformation with various types of values that might
be used in conditions.

First, I verified whether the change would affect semantics, especially
in the presence of NULL elements. The only notable behavior I observed was
the coercion of NULL to an integer type. However, this behavior remains
the same even without our transformation, so everything is fine.

To test this, I created a onek table containing NULL values:

CREATE TABLE onek ( unique1 INT, unique2 INT, ten TEXT );

INSERT INTO onek VALUES (0, 10, 'zero'), (1, NULL, 'one'), (2, 2,
'two'), (3, NULL, 'three'), (4, NULL, 'only null match'), (5, NULL, 'two
+ null match'), (6, NULL, 'no match');

1.1) explain analyze SELECT ten FROM onek t WHERE unique1 IN (VALUES
(0), (NULL));

QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..25.00 rows=6 width=32) (actual
time=0.274..0.282 rows=1.00 loops=1) Filter: (unique1 = ANY
('{0,NULL}'::integer[])) Rows Removed by Filter: 6 Buffers: shared
read=1 Planning: Buffers: shared hit=52 read=23 Planning Time: 2.124 ms
Execution Time: 0.374 ms (8 rows)

The query plan without our patch:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.05..25.33 rows=12 width=32) (actual
time=0.064..0.081 rows=1.00 loops=1) Hash Cond: (t.unique1 =
"*VALUES*".column1) Buffers: shared hit=1 -> Seq Scan on onek t
(cost=0.00..22.00 rows=1200 width=36) (actual time=0.028..0.034
rows=7.00 loops=1) Buffers: shared hit=1 -> Hash (cost=0.03..0.03 rows=2
width=4) (actual time=0.018..0.020 rows=1.00 loops=1) Buckets: 1024
Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*"
(cost=0.00..0.03 rows=2 width=4) (actual time=0.004..0.008 rows=2.00
loops=1) Planning: Buffers: shared hit=8 Planning Time: 0.513 ms
Execution Time: 0.182 ms (12 rows)

I added another tuple with a NULL value in the unique1 column to verify
that the semantics remain correct when comparing NULL with NULL.

I didn't observe any issues, as the behavior was identical to how it
worked before applying the patch.

insert into onek values (NULL, 1, 'match');

1.2) explain analyze SELECT ten FROM onek t WHERE unique1 IN (VALUES
(0), (NULL));

QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..25.00 rows=6 width=32) (actual
time=0.053..0.063 rows=1.00 loops=1) Filter: (unique1 = ANY
('{0,NULL}'::integer[])) Rows Removed by Filter: 7 Buffers: shared hit=1
Planning Time: 0.178 ms Execution Time: 0.109 ms (6 rows)

The query plan without our patch:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.05..25.33 rows=12 width=32) (actual
time=0.076..0.090 rows=1.00 loops=1) Hash Cond: (t.unique1 =
"*VALUES*".column1) Buffers: shared hit=1 -> Seq Scan on onek t
(cost=0.00..22.00 rows=1200 width=36) (actual time=0.043..0.048
rows=8.00 loops=1) Buffers: shared hit=1 -> Hash (cost=0.03..0.03 rows=2
width=4) (actual time=0.017..0.018 rows=1.00 loops=1) Buckets: 1024
Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*"
(cost=0.00..0.03 rows=2 width=4) (actual time=0.005..0.008 rows=2.00
loops=1) Planning Time: 0.312 ms Execution Time: 0.174 ms (10 rows)

Since the subquery became correlated with our transformation when it
included var objects, I also checked whether our transformation had any
negative impact on query performance.

To do this, I added unique values to the table, forcing the subquery to
be re-executed for each outer tuple. I observed a little performance
degradation (see the number of shared hit in a query 2.1) and
the worst scenario involving nested VALUES clauses, where the
performance impact is substantial (a query 2.2).

As for the 2.2 query, without our transformation, the query requires
scanning significantly fewer blocks - roughly half as many (see the
number of shared hit).

insert into onek select id, id, 'match' from generate_series(1,10000) id;

2.1) explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0), ((2 IN ( SELECT unique2 FROM onek c WHERE c.unique2 = t.unique1
))::integer) );

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..901463.05 rows=51 width=6) (actual
time=5.703..15131.084 rows=1.00 loops=1) Filter: (unique1 = ANY
(ARRAY[0, ((ANY (2 = (SubPlan 1).col1)))::integer])) Rows Removed by
Filter: 10007 *Buffers: shared hit=550389* SubPlan 1 **-> Seq Scan on
onek c (cost=0.00..180.10 rows=1 width=4) (actual time=0.818..1.511
rows=1.00 loops=10008) Filter: (unique2 = t.unique1) Rows Removed by
Filter: 10005 *Buffers: shared hit=550334* Planning Time: 0.279 ms
Execution Time: 15131.148 ms (11 rows)

The query plan without our patch:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (cost=0.00..385.00 rows=66 width=32) (actual
time=0.099..19935.638 rows=1.00 loops=1) Buffers: shared hit=550334 ->
Seq Scan on onek t (cost=0.00..121.00 rows=6600 width=36) (actual
time=0.082..1.723 rows=10008.00 loops=1) Buffers: shared hit=55 ->
Values Scan on "*VALUES*" (cost=0.00..0.03 rows=1 width=4) (actual
time=1.991..1.991 rows=0.00 loops=10008) Filter: (t.unique1 = column1)
Rows Removed by Filter: 2 *Buffers: shared hit=550279* SubPlan 1 -> Seq
Scan on onek c (cost=0.00..137.50 rows=33 width=4) (actual
time=1.106..1.989 rows=1.00 loops=10007) Filter: (unique2 = t.unique1)
Rows Removed by Filter: 10005 *Buffers: shared hit=550279* Planning
Time: 0.393 ms Execution Time: 19935.710 ms (15 rows)

2.2) explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0), ((2 IN ( SELECT unique2 FROM onek c WHERE c.unique2 in
((values(0),((2 in (select unique2 from onek c1 where c1.unique2 =
t.unique1))::integer))) ))::integer) );

--------------------------------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..2954341.54 rows=51 width=6) (actual
time=8.095..93017.686 rows=1.00 loops=1) Filter: (unique1 = ANY
(ARRAY[0, ((ANY (2 = (SubPlan 2).col1)))::integer])) Rows Removed by
Filter: 10007 Buffers: shared hit=1100935 SubPlan 2 -> Seq Scan on onek
c (cost=180.10..410.24 rows=2 width=4) (actual time=9.290..9.292
rows=0.00 loops=10008) Filter: (unique2 = ANY (ARRAY[0, ((ANY (2 =
(hashed SubPlan 1).col1)))::integer])) Rows Removed by Filter: 10008
*Buffers: shared hit=1100880* SubPlan 1 -> Seq Scan on onek c1
(cost=0.00..180.10 rows=1 width=4) (actual time=1.183..2.291 rows=1.00
loops=10008) Filter: (unique2 = t.unique1) Rows Removed by Filter: 10007
Buffers: shared hit=550440 Planning: Buffers: shared hit=19 Planning
Time: 0.733 ms *Execution Time: 93017.795 ms* (18 rows)

The query plan without our patch:

--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (cost=0.00..555.40 rows=2 width=6) (actual
time=0.046..50110.402 rows=1.00 loops=1) Buffers: shared hit=1100825 ->
Seq Scan on onek t (cost=0.00..155.08 rows=10008 width=10) (actual
time=0.028..1.374 rows=10008.00 loops=1) Buffers: shared hit=55 ->
Values Scan on "*VALUES*" (cost=0.00..0.03 rows=1 width=4) (actual
time=5.006..5.006 rows=0.00 loops=10008) Filter: (t.unique1 = column1)
Rows Removed by Filter: 2 Buffers: shared hit=1100770 SubPlan 2 -> Hash
Semi Join (cost=0.05..181.42 rows=2 width=4) (actual time=5.003..5.003
rows=0.00 loops=10007) Hash Cond: (c.unique2 = "*VALUES*_1".column1)
Buffers: shared hit=1100770 -> Seq Scan on onek c (cost=0.00..155.08
rows=10008 width=4) (actual time=0.004..1.165 rows=10008.00 loops=10007)
*Buffers: shared hit=550385* -> Hash (cost=0.03..0.03 rows=2 width=4)
(actual time=1.921..1.921 rows=2.00 loops=10007) Buckets: 1024 Batches:
1 Memory Usage: 9kB Buffers: shared hit=550385 -> Values Scan on
"*VALUES*_1" (cost=0.00..0.03 rows=2 width=4) (actual time=0.000..1.920
rows=2.00 loops=10007) Buffers: shared hit=550385 SubPlan 1 -> Seq Scan
on onek c1 (cost=0.00..180.10 rows=1 width=4) (actual time=1.010..1.917
rows=1.00 loops=10007) Filter: (unique2 = t.unique1) Rows Removed by
Filter: 10007 Buffers: shared hit=550385 Planning: Buffers: shared hit=6
Planning Time: 0.874 ms *Execution Time: 50110.531 ms* (28 rows)

If we build an index, the number of scanned blocks remains the same or
even decreases, so I don't observe any performance degradation in that case.

Does this mean that we should consider applying this transformation
later, perhaps where the OR->ANY transformation is performed, at least
for cases where VALUES clauses contain subqueries or Var nodes?

create index on onek (unique2);

3.1)explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0), ((2 IN ( SELECT unique2 FROM onek c WHERE c.unique2 = t.unique1
))::integer) );

---------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..23198.50 rows=51 width=6) (actual
time=0.142..60.369 rows=1.00 loops=1) Filter: (unique1 = ANY (ARRAY[0,
((ANY (2 = (SubPlan 1).col1)))::integer])) Rows Removed by Filter: 10007
*Buffers: shared hit=20070* SubPlan 1 -> Index Only Scan using
onek_unique2_idx on onek c (cost=0.29..4.30 rows=1 width=4) (actual
time=0.004..0.005 rows=1.00 loops=10008) Index Cond: (unique2 =
t.unique1) Heap Fetches: 0 *Index Searches: 10007* *Buffers: shared
hit=20015* Planning: Buffers: shared hit=121 Planning Time: 2.426 ms
Execution Time: 60.512 ms (14 rows)

The query plan without our patch:

---------------------- Nested Loop Semi Join (cost=0.00..555.40 rows=2
width=6) (actual time=0.118..59.554 rows=1.00 loops=1) *Buffers: shared
hit=19983 read=85* -> Seq Scan on onek t (cost=0.00..155.08 rows=10008
width=10) (actual time=0.090..1.834 rows=10008.00 loops=1) *Buffers:
shared read=55* -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=1
width=4) (actual time=0.005..0.005 rows=0.00 loops=10008) Filter:
(t.unique1 = column1) Rows Removed by Filter: 2 *Buffers: shared
hit=19983 read=30* SubPlan 1 -> Index Only Scan using onek_unique2_idx
on onek c (cost=0.29..4.30 rows=1 width=4) (actual time=0.004..0.004
rows=1.00 loops=10007) Index Cond: (unique2 = t.unique1) Heap Fetches: 0
*Index Searches: 10006* *Buffers: shared hit=19983 read=30* Planning:
Buffers: shared hit=120 read=24 Planning Time: 3.731 ms Execution Time:
59.644 ms (18 rows)

3.2) explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0), ((2 IN ( SELECT unique2 FROM onek c WHERE c.unique2 in
((values(0),((2 in (select unique2 from onek c1 where c1.unique2 =
t.unique1))::integer))) ))::integer) );

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (cost=0.00..555.40 rows=2 width=6) (actual
time=0.055..131.421 rows=1.00 loops=1) *Buffers: shared hit=40090* ->
Seq Scan on onek t (cost=0.00..155.08 rows=10008 width=10) (actual
time=0.038..1.775 rows=10008.00 loops=1) *Buffers: shared hit=55* ->
Values Scan on "*VALUES*" (cost=0.00..0.03 rows=1 width=4) (actual
time=0.013..0.013 rows=0.00 loops=10008) Filter: (t.unique1 = column1)
Rows Removed by Filter: 2 *Buffers: shared hit=40035* SubPlan 2 ->
Nested Loop (cost=0.32..8.67 rows=2 width=4) (actual time=0.012..0.012
rows=0.00 loops=10007) *Buffers: shared hit=40035* -> Unique
(cost=0.04..0.04 rows=2 width=4) (actual time=0.008..0.008 rows=1.00
loops=10007) *Buffers: shared hit=20016* -> Sort (cost=0.04..0.04 rows=2
width=4) (actual time=0.007..0.007 rows=2.00 loops=10007) Sort Key:
"*VALUES*_1".column1 Sort Method: quicksort Memory: 25kB *Buffers:
shared hit=20016* -> Values Scan on "*VALUES*_1" (cost=0.00..0.03 rows=2
width=4) (actual time=0.000..0.005 rows=2.00 loops=10007) *Buffers:
shared hit=20013* SubPlan 1 -> Index Only Scan using onek_unique2_idx on
onek c1 (cost=0.29..4.30 rows=1 width=4) (actual time=0.004..0.004
rows=1.00 loops=10007) Index Cond: (unique2 = t.unique1) Heap Fetches: 0
*Index Searches: 10006* *Buffers: shared hit=20013* -> Index Only Scan
using onek_unique2_idx on onek c (cost=0.29..4.30 rows=1 width=4)
(actual time=0.003..0.003 rows=0.00 loops=10009) Index Cond: (unique2 =
"*VALUES*_1".column1) Heap Fetches: 0 *Index Searches: 10009* *Buffers:
shared hit=20019* Planning: Buffers: shared hit=10 Planning Time: 1.183
ms Execution Time: 131.616 ms (34 rows)

The query plan without our patch:

------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..87875.20 rows=51 width=6) (actual
time=0.217..174.053 rows=1.00 loops=1) Filter: (unique1 = ANY (ARRAY[0,
((ANY (2 = (SubPlan 2).col1)))::integer])) Rows Removed by Filter: 10007
*Buffers: shared hit=40002 read=85* SubPlan 2 -> Index Only Scan using
onek_unique2_idx on onek c (cost=4.60..12.92 rows=2 width=4) (actual
time=0.007..0.007 rows=0.00 loops=10008) Index Cond: (unique2 = ANY
(ARRAY[0, ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Heap
Fetches: 0 *Index Searches: 10008* *Buffers: shared hit=20017* SubPlan 1
-> Index Only Scan using onek_unique2_idx on onek c1 (cost=0.29..4.30
rows=1 width=4) (actual time=0.006..0.006 rows=1.00 loops=10008) Index
Cond: (unique2 = t.unique1) Heap Fetches: 0 *Index Searches: 10007*
*Buffers: shared hit=19985 read=30* Planning: Buffers: shared hit=91
read=25 Planning Time: 2.858 ms Execution Time: 174.233 ms (20 rows)

I don't observe any performance degradation if VALUES contains constants.

4.1) explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0), ((2 IN ( SELECT unique2 FROM onek c WHERE c.unique2 in
((values(0),(2))))::integer)) );

QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY (ARRAY[0,
((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows Removed by
Filter: 10005 *Buffers: shared hit=110* SubPlan 1 -> Seq Scan on onek c
(cost=0.00..180.10 rows=3 width=4) (actual time=0.022..4.951 rows=2.00
loops=1) Filter: (unique2 = ANY ('{0,2}'::integer[])) Rows Removed by
Filter: 10006 *Buffers: shared hit=55* Planning: Buffers: shared hit=6
dirtied=1 Planning Time: 0.502 ms Execution Time: 13.348 ms (13 rows)

The query plan without our patch:

--------------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual
time=5.072..9.076 rows=3.00 loops=1) Hash Cond: (t.unique1 =
"*VALUES*".column1) *Buffers: shared hit=55 read=55* -> Seq Scan on onek
t (cost=0.00..155.08 rows=10008 width=10) (actual time=0.145..1.802
rows=10008.00 loops=1) *Buffers: shared hit=52 read=3* -> Hash
(cost=0.03..0.03 rows=2 width=4) (actual time=4.908..4.912 rows=2.00
loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB *Buffers: shared
hit=3 read=52* -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=4) (actual time=0.003..4.901 rows=2.00 loops=1) *Buffers: shared
hit=3 read=52* SubPlan 1 -> Hash Semi Join (cost=0.05..181.42 rows=2
width=4) (actual time=0.036..4.861 rows=2.00 loops=1) Hash Cond:
(c.unique2 = "*VALUES*_1".column1) *Buffers: shared hit=3 read=52* ->
Seq Scan on onek c (cost=0.00..155.08 rows=10008 width=4) (actual
time=0.009..2.120 rows=10008.00 loops=1) *Buffers: shared hit=3 read=52*
-> Hash (cost=0.03..0.03 rows=2 width=4) (actual time=0.006..0.008
rows=2.00 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Values
Scan on "*VALUES*_1" (cost=0.00..0.03 rows=2 width=4) (actual
time=0.001..0.002 rows=2.00 loops=1) Planning: Buffers: shared hit=102
read=22 Planning Time: 1.853 ms Execution Time: 9.281 ms (23 rows)

4.2) explain analyze SELECT ten FROM onek t WHERE unique1 IN ( VALUES
(0),(2) );

QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on onek t (cost=0.00..180.10 rows=3 width=6) (actual
time=0.200..3.777 rows=3.00 loops=1) Filter: (unique1 = ANY
('{0,2}'::integer[])) Rows Removed by Filter: 10005 *Buffers: shared
read=55* Planning: Buffers: shared hit=65 read=26 Planning Time: 1.345
ms Execution Time: 3.826 ms (8 rows)

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual
time=0.094..4.935 rows=3.00 loops=1) Hash Cond: (t.unique1 =
"*VALUES*".column1) *Buffers: shared hit=55* -> Seq Scan on onek t
(cost=0.00..155.08 rows=10008 width=10) (actual time=0.056..2.052
rows=10008.00 loops=1) *Buffers: shared hit=55* -> Hash (cost=0.03..0.03
rows=2 width=4) (actual time=0.019..0.019 rows=2.00 loops=1) Buckets:
1024 Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*"
(cost=0.00..0.03 rows=2 width=4) (actual time=0.005..0.009 rows=2.00
loops=1) Planning Time: 0.332 ms Execution Time: 4.998 ms (10 rows)

--
Regards,
Alena Rybakina
Postgres Professional


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-01 04:14:02
Message-ID: CAPpHfduryUMSGzBXBYhiGpWvn9wMvTwpbAtxgAnv1z+cYCZdUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:
> Yes, I agree with that - this is precisely why we need to call
IncrementVarSublevelsUp() unconditionally for all types.
>
> As you mentioned earlier, Var nodes can be nested more deeply, and
skipping this step could lead to incorrect behavior in those cases. So, now
it works fine)
>
> Thank you for an example.
>
> I analyzed this transformation with various types of values that might be
used in conditions.
>
> First, I verified whether the change would affect semantics, especially
in the presence of NULL elements. The only notable behavior I observed was
> the coercion of NULL to an integer type. However, this behavior remains
the same even without our transformation, so everything is fine.

Thank you for your experiments! I've also rechecked we don't sacrifice
lazy evaluation. But it appears we don't have one anyway.

CREATE FUNCTION my_func() RETURNS text AS $$
BEGIN
RAISE NOTICE 'notice';
RETURN 'b';
END;
$$ LANGUAGE 'plpgsql';

# create table test (val text);
# insert into test values ('a');
# explain analyze select * from test where val in (VALUES ('a'),
(my_func()));
NOTICE: notice
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.05..21.26 rows=9 width=64) (actual
time=0.178..0.183 rows=1.00 loops=1)
Hash Cond: (test.val = ("*VALUES*".column1)::text)
Buffers: shared hit=1
-> Seq Scan on test (cost=0.00..18.80 rows=880 width=64) (actual
time=0.045..0.048 rows=1.00 loops=1)
Buffers: shared hit=1
-> Hash (cost=0.03..0.03 rows=2 width=32) (actual time=0.111..0.112
rows=2.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32)
(actual time=0.004..0.065 rows=2.00 loops=1)
Planning Time: 0.250 ms
Execution Time: 0.267 ms
(10 rows)

------
Regards,
Alexander Korotkov
Supabase


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-01 12:07:59
Message-ID: CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:

> 4.1) explain analyze SELECT ten
>
> FROM onek t WHERE unique1 IN ( VALUES (0), ((2 IN ( SELECT unique2 FROM
> onek c WHERE c.unique2 in ((values(0),(2))))::integer)) );
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
> Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
> time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY (ARRAY[0,
> ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows Removed by Filter:
> 10005 *Buffers: shared hit=110* SubPlan 1 -> Seq Scan on onek c
> (cost=0.00..180.10 rows=3 width=4) (actual time=0.022..4.951 rows=2.00
> loops=1) Filter: (unique2 = ANY ('{0,2}'::integer[])) Rows Removed by
> Filter: 10006 *Buffers: shared hit=55* Planning: Buffers: shared hit=6
> dirtied=1 Planning Time: 0.502 ms Execution Time: 13.348 ms (13 rows)
>
> The query plan without our patch:
>
> --------------------------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual time=5.072..9.076
> rows=3.00 loops=1) Hash Cond: (t.unique1 = "*VALUES*".column1) *Buffers:
> shared hit=55 read=55* -> Seq Scan on onek t (cost=0.00..155.08
> rows=10008 width=10) (actual time=0.145..1.802 rows=10008.00 loops=1) *
> Buffers: shared hit=52 read=3* -> Hash (cost=0.03..0.03 rows=2 width=4)
> (actual time=4.908..4.912 rows=2.00 loops=1) Buckets: 1024 Batches: 1
> Memory Usage: 9kB *Buffers: shared hit=3 read=52* -> Values Scan on
> "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual time=0.003..4.901
> rows=2.00 loops=1) *Buffers: shared hit=3 read=52* SubPlan 1 -> Hash Semi
> Join (cost=0.05..181.42 rows=2 width=4) (actual time=0.036..4.861 rows=2.00
> loops=1) Hash Cond: (c.unique2 = "*VALUES*_1".column1) *Buffers: shared
> hit=3 read=52* -> Seq Scan on onek c (cost=0.00..155.08 rows=10008
> width=4) (actual time=0.009..2.120 rows=10008.00 loops=1) *Buffers:
> shared hit=3 read=52* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
> time=0.006..0.008 rows=2.00 loops=1) Buckets: 1024 Batches: 1 Memory Usage:
> 9kB -> Values Scan on "*VALUES*_1" (cost=0.00..0.03 rows=2 width=4) (actual
> time=0.001..0.002 rows=2.00 loops=1) Planning: Buffers: shared hit=102
> read=22 Planning Time: 1.853 ms Execution Time: 9.281 ms (23 rows)
>

I think I managed to understand what is going on.

When we run a query with SOAP over a constant array
then convert_saop_to_hashed_saop_walker() provides acceleration with
hashing.

# explain analyze select * from test where val IN (5000, 4000, 9000, 2000,
1000, 140050);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..21925.00 rows=6 width=4) (actual
time=2.015..223.984 rows=6.00 loops=1)
Filter: (val = ANY ('{5000,4000,9000,2000,1000,140050}'::integer[]))
Rows Removed by Filter: 999994
Buffers: shared hit=2228 read=2197
Planning Time: 0.246 ms
Execution Time: 224.036 ms
(6 rows)

But when there is expression or subselect, then hashing doesn't work and
query becomes slower.

# explain analyze select * from test where val IN (5000, 4000, 9000, 2000,
1000, (select 140050));
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..21925.01 rows=6 width=4) (actual
time=0.904..396.495 rows=6.00 loops=1)
Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, (InitPlan
1).col1]))
Rows Removed by Filter: 999994
Buffers: shared hit=2292 read=2133
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
Planning Time: 0.160 ms
Execution Time: 396.538 ms
(8 rows)

In contrast, hashing is always available with VALUES.

# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), ((select 140050)));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.16..17050.23 rows=6 width=4) (actual
time=1.589..225.061 rows=6.00 loops=1)
Hash Cond: (test.val = "*VALUES*".column1)
Buffers: shared hit=2356 read=2069
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.003
rows=1.00 loops=1)
-> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4) (actual
time=0.460..91.912 rows=1000000.00 loops=1)
Buffers: shared hit=2356 read=2069
-> Hash (cost=0.08..0.08 rows=6 width=4) (actual time=0.049..0.050
rows=6.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=4)
(actual time=0.009..0.032 rows=6.00 loops=1)
Planning Time: 0.627 ms
Execution Time: 225.155 ms
(12 rows)

I think we should allow our transformation only when the array is constant
(attached patchset). In future we may implement dynamic SAOP hashing, and
then allow our transformation in more cases.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v8-0001-Extract-make_SAOP_expr-function-from-match_orclau.patch application/x-patch 7.3 KB
v8-0002-Convert-x-IN-VALUES-.-to-x-ANY-.-then-appropriate.patch application/x-patch 32.0 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-01 14:23:02
Message-ID: 11eac8f5-18e1-42ba-a989-23e8e2346860@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alexander!

On 01.04.2025 15:07, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> 4.1) explain analyze SELECT ten
>
> FROM onek t WHERE unique1 IN ( VALUES (0), ((2 IN ( SELECT unique2
> FROM onek c WHERE c.unique2 in ((values(0),(2))))::integer)) );
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
> Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
> time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY
> (ARRAY[0, ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows
> Removed by Filter: 10005 *Buffers: shared hit=110* SubPlan 1 ->
> Seq Scan on onek c (cost=0.00..180.10 rows=3 width=4) (actual
> time=0.022..4.951 rows=2.00 loops=1) Filter: (unique2 = ANY
> ('{0,2}'::integer[])) Rows Removed by Filter: 10006 *Buffers:
> shared hit=55* Planning: Buffers: shared hit=6 dirtied=1 Planning
> Time: 0.502 ms Execution Time: 13.348 ms (13 rows)
>
> The query plan without our patch:
>
> --------------------------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual
> time=5.072..9.076 rows=3.00 loops=1) Hash Cond: (t.unique1 =
> "*VALUES*".column1) *Buffers: shared hit=55 read=55* -> Seq Scan
> on onek t (cost=0.00..155.08 rows=10008 width=10) (actual
> time=0.145..1.802 rows=10008.00 loops=1) *Buffers: shared hit=52
> read=3* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
> time=4.908..4.912 rows=2.00 loops=1) Buckets: 1024 Batches: 1
> Memory Usage: 9kB *Buffers: shared hit=3 read=52* -> Values Scan
> on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual
> time=0.003..4.901 rows=2.00 loops=1) *Buffers: shared hit=3
> read=52* SubPlan 1 -> Hash Semi Join (cost=0.05..181.42 rows=2
> width=4) (actual time=0.036..4.861 rows=2.00 loops=1) Hash Cond:
> (c.unique2 = "*VALUES*_1".column1) *Buffers: shared hit=3 read=52*
> -> Seq Scan on onek c (cost=0.00..155.08 rows=10008 width=4)
> (actual time=0.009..2.120 rows=10008.00 loops=1) *Buffers: shared
> hit=3 read=52* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
> time=0.006..0.008 rows=2.00 loops=1) Buckets: 1024 Batches: 1
> Memory Usage: 9kB -> Values Scan on "*VALUES*_1" (cost=0.00..0.03
> rows=2 width=4) (actual time=0.001..0.002 rows=2.00 loops=1)
> Planning: Buffers: shared hit=102 read=22 Planning Time: 1.853 ms
> Execution Time: 9.281 ms (23 rows)
>
>
> I think I managed to understand what is going on.
>
> When we run a query with SOAP over a constant array
> then convert_saop_to_hashed_saop_walker() provides acceleration with
> hashing.
>
> # explain analyze select * from test where val IN (5000, 4000, 9000,
> 2000, 1000, 140050);
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..21925.00 rows=6 width=4) (actual
> time=2.015..223.984 rows=6.00 loops=1)
>    Filter: (val = ANY ('{5000,4000,9000,2000,1000,140050}'::integer[]))
>    Rows Removed by Filter: 999994
>    Buffers: shared hit=2228 read=2197
>  Planning Time: 0.246 ms
>  Execution Time: 224.036 ms
> (6 rows)
>
> But when there is expression or subselect, then hashing doesn't work
> and query becomes slower.
>
> # explain analyze select * from test where val IN (5000, 4000, 9000,
> 2000, 1000, (select 140050));
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.01..21925.01 rows=6 width=4) (actual
> time=0.904..396.495 rows=6.00 loops=1)
>    Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, (InitPlan
> 1).col1]))
>    Rows Removed by Filter: 999994
>    Buffers: shared hit=2292 read=2133
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
>  Planning Time: 0.160 ms
>  Execution Time: 396.538 ms
> (8 rows)
>
> In contrast, hashing is always available with VALUES.
>
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), ((select 140050)));
>  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=0.16..17050.23 rows=6 width=4) (actual
> time=1.589..225.061 rows=6.00 loops=1)
>    Hash Cond: (test.val = "*VALUES*".column1)
>    Buffers: shared hit=2356 read=2069
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.003..0.003 rows=1.00 loops=1)
>    ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=4)
> (actual time=0.460..91.912 rows=1000000.00 loops=1)
>          Buffers: shared hit=2356 read=2069
>    ->  Hash  (cost=0.08..0.08 rows=6 width=4) (actual
> time=0.049..0.050 rows=6.00 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6
> width=4) (actual time=0.009..0.032 rows=6.00 loops=1)
>  Planning Time: 0.627 ms
>  Execution Time: 225.155 ms
> (12 rows)
>
> I think we should allow our transformation only when the array is
> constant (attached patchset).

Yes, I agree with your conclusions; however, I noticed that you didn’t
take Param-type variables into account.
These still get executed during the VALUES -> ANY transformation (see
regression tests).

+PREPARE test2 (int,numeric, text) AS
+  SELECT ten FROM onek
+  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
+-- VTA forbidden because of unresolved casting of numeric parameter to
common type
+EXPLAIN (COSTS OFF) EXECUTE test2(2, 2, '2');
+                                                         QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on onek
+   Filter: (((sin((two)::double precision) * (four)::double precision)
/ '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
+(2 rows)
+
+PREPARE test3 (int,int, text) AS
+  SELECT ten FROM onek
+  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
+EXPLAIN (COSTS OFF) EXECUTE test3(2, 2, '2');
+                                                         QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on onek
+   Filter: (((sin((two)::double precision) * (four)::double precision)
/ '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
+(2 rows)

In my opinion, we can apply the VALUES ->ANY transformation to them as
well. What do you think? I ran some queries and didn’t notice any
significant performance degradation.

create table test (x int);
insert into test select id from generate_series(1,1000) id;
PREPARE test4 (int,int, int) AS select * from test where x IN ($1, $2, $3);
PREPARE test3 (int,int, int) AS select * from test where x IN ($1, $2,
 (select $3));
EXPLAIN ANALYZE EXECUTE test4(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..18.75 rows=3 width=4) (actual
time=0.016..0.353 rows=1.00 loops=1)
   Filter: (x = ANY (ARRAY[$1, $2, $3]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.266 ms
 Execution Time: 0.367 ms
(8 rows)

alena(at)postgres=# EXPLAIN ANALYZE EXECUTE test3(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.01..18.76 rows=3 width=4) (actual
time=0.072..1.379 rows=1.00 loops=1)
   Filter: (x = ANY (ARRAY[2, 2, (InitPlan 1).col1]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
   InitPlan 1
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
time=0.003..0.003 rows=1.00 loops=1)
 Planning Time: 0.350 ms
 Execution Time: 1.431 ms
(8 rows)

alena(at)postgres=# PREPARE test6 (int,int, int) AS select * from test
where x IN (values($1), ($2), ($3));
PREPARE
alena(at)postgres=# EXPLAIN ANALYZE EXECUTE test6(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..18.75 rows=3 width=4) (actual
time=0.055..0.683 rows=1.00 loops=1)
   Filter: (x = ANY ('{2,2,2}'::integer[]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
 Planning Time: 0.230 ms
 Execution Time: 0.724 ms
(6 rows)

We can’t use hashing for them, but without this transformation, we still
have to perform a join.

----------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=0.08..17.73 rows=3 width=4) (actual
time=0.124..0.943 rows=1.00 loops=1)
   Hash Cond: (test.x = "*VALUES*".column1)
   Buffers: shared hit=5
   ->  Seq Scan on test  (cost=0.00..15.00 rows=1000 width=4) (actual
time=0.051..0.389 rows=1000.00 loops=1)
         Buffers: shared hit=5
   ->  Hash  (cost=0.04..0.04 rows=3 width=4) (actual time=0.028..0.030
rows=3.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3
width=4) (actual time=0.004..0.010 rows=3.00 loops=1)
 Planning:
   Buffers: shared hit=105 read=1
 Planning Time: 2.176 ms
 Execution Time: 1.077 ms
(12 rows)

So, I think we can bring it back and construct the Array node based on
the have_param flag.

foreach (lc, rte->values_lists)
+    {
+        List *elem = lfirst(lc);
+        Node *value = linitial(elem);
+
+        value = eval_const_expressions(NULL, value);
+
+        if (!IsA(value, Const))
+            have_param = true;
+
+        consts = lappend(consts, value);
+
+    }

Regarding the check for the presence of Var elements before the
transformation, I think we should, for now, restore the walker function
(values_simplicity_check_walker) that
traverses the query to identify Var nodes. This function was included in
the initial version of the patch:

+/*
+ * The function traverses the tree looking for elements of type var.
+ * If it finds it, it returns true.
+ */
+static bool
+values_simplicity_check_walker(Node *node, void *ctx)
+{
+    if (node == NULL)
+    {
+        return false;
+    }
+    else if(IsA(node, Var))
+        return true;
+    else if(IsA(node, Query))
+        return query_tree_walker((Query *) node,
+                                 values_simplicity_check_walker,
+                                 (void*) ctx,
+                                 QTW_EXAMINE_RTES_BEFORE);
+
+    return expression_tree_walker(node, values_simplicity_check_walker,
+                                  (void *) ctx);
+}

> In future we may implement dynamic SAOP hashing, and then allow our
> transformation in more cases.
I agree with your suggestion) Thank you for your interest to this
subject and contribution!

--
Regards,
Alena Rybakina
Postgres Professional


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-02 14:33:43
Message-ID: d621919d-904d-40eb-ab72-3f118d6fde52@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.04.2025 17:23, Alena Rybakina wrote:
>
> Hi, Alexander!
>
> On 01.04.2025 15:07, Alexander Korotkov wrote:
>> Hi, Alena!
>>
>> On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina
>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>
>> 4.1) explain analyze SELECT ten
>>
>> FROM onek t WHERE unique1 IN ( VALUES (0), ((2 IN ( SELECT
>> unique2 FROM onek c WHERE c.unique2 in
>> ((values(0),(2))))::integer)) );
>>
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------
>> Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
>> time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY
>> (ARRAY[0, ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows
>> Removed by Filter: 10005 *Buffers: shared hit=110* SubPlan 1 ->
>> Seq Scan on onek c (cost=0.00..180.10 rows=3 width=4) (actual
>> time=0.022..4.951 rows=2.00 loops=1) Filter: (unique2 = ANY
>> ('{0,2}'::integer[])) Rows Removed by Filter: 10006 *Buffers:
>> shared hit=55* Planning: Buffers: shared hit=6 dirtied=1 Planning
>> Time: 0.502 ms Execution Time: 13.348 ms (13 rows)
>>
>> The query plan without our patch:
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------
>> Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual
>> time=5.072..9.076 rows=3.00 loops=1) Hash Cond: (t.unique1 =
>> "*VALUES*".column1) *Buffers: shared hit=55 read=55* -> Seq Scan
>> on onek t (cost=0.00..155.08 rows=10008 width=10) (actual
>> time=0.145..1.802 rows=10008.00 loops=1) *Buffers: shared hit=52
>> read=3* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
>> time=4.908..4.912 rows=2.00 loops=1) Buckets: 1024 Batches: 1
>> Memory Usage: 9kB *Buffers: shared hit=3 read=52* -> Values Scan
>> on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual
>> time=0.003..4.901 rows=2.00 loops=1) *Buffers: shared hit=3
>> read=52* SubPlan 1 -> Hash Semi Join (cost=0.05..181.42 rows=2
>> width=4) (actual time=0.036..4.861 rows=2.00 loops=1) Hash Cond:
>> (c.unique2 = "*VALUES*_1".column1) *Buffers: shared hit=3
>> read=52* -> Seq Scan on onek c (cost=0.00..155.08 rows=10008
>> width=4) (actual time=0.009..2.120 rows=10008.00 loops=1)
>> *Buffers: shared hit=3 read=52* -> Hash (cost=0.03..0.03 rows=2
>> width=4) (actual time=0.006..0.008 rows=2.00 loops=1) Buckets:
>> 1024 Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*_1"
>> (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002
>> rows=2.00 loops=1) Planning: Buffers: shared hit=102 read=22
>> Planning Time: 1.853 ms Execution Time: 9.281 ms (23 rows)
>>
>>
>> I think I managed to understand what is going on.
>>
>> When we run a query with SOAP over a constant array
>> then convert_saop_to_hashed_saop_walker() provides acceleration with
>> hashing.
>>
>> # explain analyze select * from test where val IN (5000, 4000, 9000,
>> 2000, 1000, 140050);
>>                                               QUERY PLAN
>> -------------------------------------------------------------------------------------------------------
>>  Seq Scan on test  (cost=0.00..21925.00 rows=6 width=4) (actual
>> time=2.015..223.984 rows=6.00 loops=1)
>>    Filter: (val = ANY ('{5000,4000,9000,2000,1000,140050}'::integer[]))
>>    Rows Removed by Filter: 999994
>>    Buffers: shared hit=2228 read=2197
>>  Planning Time: 0.246 ms
>>  Execution Time: 224.036 ms
>> (6 rows)
>>
>> But when there is expression or subselect, then hashing doesn't work
>> and query becomes slower.
>>
>> # explain analyze select * from test where val IN (5000, 4000, 9000,
>> 2000, 1000, (select 140050));
>>                                               QUERY PLAN
>> -------------------------------------------------------------------------------------------------------
>>  Seq Scan on test  (cost=0.01..21925.01 rows=6 width=4) (actual
>> time=0.904..396.495 rows=6.00 loops=1)
>>    Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, (InitPlan
>> 1).col1]))
>>    Rows Removed by Filter: 999994
>>    Buffers: shared hit=2292 read=2133
>>    InitPlan 1
>>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
>> time=0.002..0.002 rows=1.00 loops=1)
>>  Planning Time: 0.160 ms
>>  Execution Time: 396.538 ms
>> (8 rows)
>>
>> In contrast, hashing is always available with VALUES.
>>
>> # explain analyze select * from test where val in (VALUES (5000),
>> (4000), (9000), (2000), (1000), ((select 140050)));
>>  QUERY PLAN
>> ------------------------------------------------------------------------------------------------------------------------
>>  Hash Semi Join  (cost=0.16..17050.23 rows=6 width=4) (actual
>> time=1.589..225.061 rows=6.00 loops=1)
>>    Hash Cond: (test.val = "*VALUES*".column1)
>>    Buffers: shared hit=2356 read=2069
>>    InitPlan 1
>>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
>> time=0.003..0.003 rows=1.00 loops=1)
>>    ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=4)
>> (actual time=0.460..91.912 rows=1000000.00 loops=1)
>>          Buffers: shared hit=2356 read=2069
>>    ->  Hash  (cost=0.08..0.08 rows=6 width=4) (actual
>> time=0.049..0.050 rows=6.00 loops=1)
>>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6
>> width=4) (actual time=0.009..0.032 rows=6.00 loops=1)
>>  Planning Time: 0.627 ms
>>  Execution Time: 225.155 ms
>> (12 rows)
>>
>> I think we should allow our transformation only when the array is
>> constant (attached patchset).
>
> Yes, I agree with your conclusions; however, I noticed that you didn’t
> take Param-type variables into account.
> These still get executed during the VALUES -> ANY transformation (see
> regression tests).
>
> +PREPARE test2 (int,numeric, text) AS
> +  SELECT ten FROM onek
> +  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
> +-- VTA forbidden because of unresolved casting of numeric parameter
> to common type
> +EXPLAIN (COSTS OFF) EXECUTE test2(2, 2, '2');
> +                                                         QUERY PLAN
> +-----------------------------------------------------------------------------------------------------------------------------
> + Seq Scan on onek
> +   Filter: (((sin((two)::double precision) * (four)::double
> precision) / '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
> +(2 rows)
> +
> +PREPARE test3 (int,int, text) AS
> +  SELECT ten FROM onek
> +  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
> +EXPLAIN (COSTS OFF) EXECUTE test3(2, 2, '2');
> +                                                         QUERY PLAN
> +-----------------------------------------------------------------------------------------------------------------------------
> + Seq Scan on onek
> +   Filter: (((sin((two)::double precision) * (four)::double
> precision) / '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
> +(2 rows)
>
> In my opinion, we can apply the VALUES ->ANY transformation to them as
> well. What do you think? I ran some queries and didn’t notice any
> significant performance degradation.
>
> create table test (x int);
> insert into test select id from generate_series(1,1000) id;
> PREPARE test4 (int,int, int) AS select * from test where x IN ($1, $2,
> $3);
> PREPARE test3 (int,int, int) AS select * from test where x IN ($1, $2,
>  (select $3));
> EXPLAIN ANALYZE EXECUTE test4(2, 2, 2);
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..18.75 rows=3 width=4) (actual
> time=0.016..0.353 rows=1.00 loops=1)
>    Filter: (x = ANY (ARRAY[$1, $2, $3]))
>    Rows Removed by Filter: 999
>    Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=20
>  Planning Time: 0.266 ms
>  Execution Time: 0.367 ms
> (8 rows)
>
> alena(at)postgres=# EXPLAIN ANALYZE EXECUTE test3(2, 2, 2);
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.01..18.76 rows=3 width=4) (actual
> time=0.072..1.379 rows=1.00 loops=1)
>    Filter: (x = ANY (ARRAY[2, 2, (InitPlan 1).col1]))
>    Rows Removed by Filter: 999
>    Buffers: shared hit=5
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.003..0.003 rows=1.00 loops=1)
>  Planning Time: 0.350 ms
>  Execution Time: 1.431 ms
> (8 rows)
>
>
> alena(at)postgres=# PREPARE test6 (int,int, int) AS select * from test
> where x IN (values($1), ($2), ($3));
> PREPARE
> alena(at)postgres=# EXPLAIN ANALYZE EXECUTE test6(2, 2, 2);
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..18.75 rows=3 width=4) (actual
> time=0.055..0.683 rows=1.00 loops=1)
>    Filter: (x = ANY ('{2,2,2}'::integer[]))
>    Rows Removed by Filter: 999
>    Buffers: shared hit=5
>  Planning Time: 0.230 ms
>  Execution Time: 0.724 ms
> (6 rows)
>
> We can’t use hashing for them, but without this transformation, we
> still have to perform a join.
>
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=0.08..17.73 rows=3 width=4) (actual
> time=0.124..0.943 rows=1.00 loops=1)
>    Hash Cond: (test.x = "*VALUES*".column1)
>    Buffers: shared hit=5
>    ->  Seq Scan on test  (cost=0.00..15.00 rows=1000 width=4) (actual
> time=0.051..0.389 rows=1000.00 loops=1)
>          Buffers: shared hit=5
>    ->  Hash  (cost=0.04..0.04 rows=3 width=4) (actual
> time=0.028..0.030 rows=3.00 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3
> width=4) (actual time=0.004..0.010 rows=3.00 loops=1)
>  Planning:
>    Buffers: shared hit=105 read=1
>  Planning Time: 2.176 ms
>  Execution Time: 1.077 ms
> (12 rows)
>
> So, I think we can bring it back and construct the Array node based on
> the have_param flag.
>
> foreach (lc, rte->values_lists)
> +    {
> +        List *elem = lfirst(lc);
> +        Node *value = linitial(elem);
> +
> +        value = eval_const_expressions(NULL, value);
> +
> +        if (!IsA(value, Const))
> +            have_param = true;
> +
> +        consts = lappend(consts, value);
> +
> +    }
>
> Regarding the check for the presence of Var elements before the
> transformation, I think we should, for now, restore the walker
> function (values_simplicity_check_walker) that
> traverses the query to identify Var nodes. This function was included
> in the initial version of the patch:
>
> +/*
> + * The function traverses the tree looking for elements of type var.
> + * If it finds it, it returns true.
> + */
> +static bool
> +values_simplicity_check_walker(Node *node, void *ctx)
> +{
> +    if (node == NULL)
> +    {
> +        return false;
> +    }
> +    else if(IsA(node, Var))
> +        return true;
> +    else if(IsA(node, Query))
> +        return query_tree_walker((Query *) node,
> +  values_simplicity_check_walker,
> +                                 (void*) ctx,
> +                                 QTW_EXAMINE_RTES_BEFORE);
> +
> +    return expression_tree_walker(node, values_simplicity_check_walker,
> +                                  (void *) ctx);
> +}
>
>> In future we may implement dynamic SAOP hashing, and then allow our
>> transformation in more cases.
> I agree with your suggestion) Thank you for your interest to this
> subject and contribution!

I prepared a patch according to my suggestions, it just checks that the
transformation is not carried out if there is a var element, there are
changes only in one test, but I think it is correct.

diff -U3
/home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
/home/alena/postgrespro_or3/src/test/regress/results/subselect.out
--- /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
2025-04-02 02:50:07.018329864 +0300
+++ /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
2025-04-02 17:27:09.845104001 +0300
@@ -3027,18 +3027,15 @@
 SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
   (SELECT (3)))::integer)
 );
-                     QUERY PLAN
-----------------------------------------------------
- Nested Loop
-   ->  Unique
-         ->  Sort
-               Sort Key: "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     SubPlan 1
-                       ->  Result
-   ->  Index Scan using onek_unique1 on onek t
-         Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on onek t
+   Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan
1).col1)))::integer]))
+   ->  Bitmap Index Scan on onek_unique1
+         Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
SubPlan 1).col1)))::integer]))
+   SubPlan 1
+     ->  Result
+(6 rows)

 -- Alow to transformation and hold conversion between types of colemns and
 -- declared type of column pointed in RTE

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
values_to_any.diff.no-cfbot text/plain 3.1 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-02 19:00:49
Message-ID: CAPpHfdugRg1Nix_82AsUQOcz6Q-761_Y440t9zQLwdhy+YcC2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Alena!

On Wed, Apr 2, 2025 at 5:33 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:

> I prepared a patch according to my suggestions, it just checks that the
> transformation is not carried out if there is a var element, there are
> changes only in one test, but I think it is correct.
>
> diff -U3
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> --- /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> 2025-04-02 02:50:07.018329864 +0300
> +++ /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> 2025-04-02 17:27:09.845104001 +0300
> @@ -3027,18 +3027,15 @@
> SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
> (SELECT (3)))::integer)
> );
> - QUERY PLAN
> -----------------------------------------------------
> - Nested Loop
> - -> Unique
> - -> Sort
> - Sort Key: "*VALUES*".column1
> - -> Values Scan on "*VALUES*"
> - SubPlan 1
> - -> Result
> - -> Index Scan using onek_unique1 on onek t
> - Index Cond: (unique1 = "*VALUES*".column1)
> -(9 rows)
> + QUERY
> PLAN
>
> +------------------------------------------------------------------------------------------------
> + Bitmap Heap Scan on onek t
> + Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan
> 1).col1)))::integer]))
> + -> Bitmap Index Scan on onek_unique1
> + Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan
> 1).col1)))::integer]))
> + SubPlan 1
> + -> Result
> +(6 rows)
>
> -- Alow to transformation and hold conversion between types of colemns and
> -- declared type of column pointed in RTE
>
Your patch turns back performance degradation issue that I mention in [1].

With your patch.
# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), (140050), ((select 1)));
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..23175.01 rows=7 width=4) (actual
time=0.455..417.929 rows=7.00 loops=1)
Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, 140050,
(InitPlan 1).col1]))
Rows Removed by Filter: 999993
Buffers: shared hit=2362 read=2063
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
Planning Time: 0.211 ms
Execution Time: 417.984 ms
(8 rows)

Without your patch.
# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), (140050), ((select 1)));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.19..17050.26 rows=7 width=4) (actual
time=0.289..210.335 rows=7.00 loops=1)
Hash Cond: (test.val = "*VALUES*".column1)
Buffers: shared hit=2211 read=2214
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
-> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4) (actual
time=0.203..85.925 rows=1000000.00 loops=1)
Buffers: shared hit=2211 read=2214
-> Hash (cost=0.09..0.09 rows=7 width=4) (actual time=0.067..0.068
rows=7.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.09 rows=7 width=4)
(actual time=0.004..0.023 rows=7.00 loops=1)
Planning Time: 0.321 ms
Execution Time: 210.409 ms
(12 rows)

Only when we have all consts in the array, we can have SAOP hashing which
is competitive with hashing of VALUES node. So, I'm continuing with my
version of patch.

Links.
1.
https://www.postgresql.org/message-id/CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-03 14:18:46
Message-ID: 6865d6c1-8bb0-4528-af96-f62220b67f27@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.04.2025 22:00, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Wed, Apr 2, 2025 at 5:33 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> I prepared a patch according to my suggestions, it just checks
> that the transformation is not carried out if there is a var
> element, there are changes only in one test, but I think it is
> correct.
>
> diff -U3
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> ---
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> 2025-04-02 02:50:07.018329864 +0300
> +++
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> 2025-04-02 17:27:09.845104001 +0300
> @@ -3027,18 +3027,15 @@
>  SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
>    (SELECT (3)))::integer)
>  );
> -                     QUERY PLAN
> -----------------------------------------------------
> - Nested Loop
> -   ->  Unique
> -         ->  Sort
> -               Sort Key: "*VALUES*".column1
> -               ->  Values Scan on "*VALUES*"
> -                     SubPlan 1
> -                       ->  Result
> -   ->  Index Scan using onek_unique1 on onek t
> -         Index Cond: (unique1 = "*VALUES*".column1)
> -(9 rows)
> +                                           QUERY PLAN
> +------------------------------------------------------------------------------------------------
> + Bitmap Heap Scan on onek t
> +   Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   ->  Bitmap Index Scan on onek_unique1
> +         Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   SubPlan 1
> +     ->  Result
> +(6 rows)
>
>  -- Alow to transformation and hold conversion between types of
> colemns and
>  -- declared type of column pointed in RTE
>
> Your patch turns back performance degradation issue that I mention in [1].
>
> With your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
>             QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.01..23175.01 rows=7 width=4) (actual
> time=0.455..417.929 rows=7.00 loops=1)
>    Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, 140050,
> (InitPlan 1).col1]))
>    Rows Removed by Filter: 999993
>    Buffers: shared hit=2362 read=2063
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
>  Planning Time: 0.211 ms
>  Execution Time: 417.984 ms
> (8 rows)
>
> Without your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
>  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=0.19..17050.26 rows=7 width=4) (actual
> time=0.289..210.335 rows=7.00 loops=1)
>    Hash Cond: (test.val = "*VALUES*".column1)
>    Buffers: shared hit=2211 read=2214
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
>    ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=4)
> (actual time=0.203..85.925 rows=1000000.00 loops=1)
>          Buffers: shared hit=2211 read=2214
>    ->  Hash  (cost=0.09..0.09 rows=7 width=4) (actual
> time=0.067..0.068 rows=7.00 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.09 rows=7
> width=4) (actual time=0.004..0.023 rows=7.00 loops=1)
>  Planning Time: 0.321 ms
>  Execution Time: 210.409 ms
> (12 rows)
>
> Only when we have all consts in the array, we can have SAOP hashing
> which is competitive with hashing of VALUES node.  So, I'm continuing
> with my version of patch.
>
> Links.
> 1.
> https://www.postgresql.org/message-id/CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw%40mail.gmail.com
>
>
Okay, I agree with you.

--
Regards,
Alena Rybakina
Postgres Professional


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-03 15:26:48
Message-ID: CAPpHfduiD-0PnUDRa3woSm47+LNY48J4tcPspft2Rir2Ag+w2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 3, 2025 at 5:18 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> Okay, I agree with you.

Good. I've reflected this limitation in comments and the commit
message. Also, I've adjust regression tests by removing excessive
ones and adding more important cases. I'm going to push this if no
objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v9-0001-Extract-make_SAOP_expr-function-from-match_orclau.patch application/octet-stream 7.3 KB
v9-0002-Convert-x-IN-VALUES-.-to-x-ANY-.-then-appropriate.patch application/octet-stream 24.1 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-03 15:51:10
Message-ID: a7fdc1f9-84ac-4c2c-ad26-2274474fcc86@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.04.2025 18:26, Alexander Korotkov wrote:
> On Thu, Apr 3, 2025 at 5:18 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> Okay, I agree with you.
> Good. I've reflected this limitation in comments and the commit
> message.
Thank you, it looks fine)
> Also, I've adjust regression tests by removing excessive
> ones and adding more important cases. I'm going to push this if no
> objections.
I agree with your changes.

--
Regards,
Alena Rybakina
Postgres Professional


From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 14:39:53
Message-ID: srnuqlttuimzmvoulhsrbgvj4vnul6b65osswvua7sfkqsvmuy@yg7apybpxp34
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2025-04-03 18:26:48 +0300, Alexander Korotkov wrote:
> On Thu, Apr 3, 2025 at 5:18 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> > Okay, I agree with you.
>
> Good. I've reflected this limitation in comments and the commit
> message. Also, I've adjust regression tests by removing excessive
> ones and adding more important cases. I'm going to push this if no
> objections.

This just failed on both buildfarm and CI:

BF:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2025-04-04%2013%3A52%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=iguana&dt=2025-04-04%2014%3A08%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2025-04-04%2013%3A52%3A13
...

CI:

https://cirrus-ci.com/task/5466958990147584
https://api.cirrus-ci.com/v1/artifact/task/5466958990147584/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade

--- C:/cirrus/src/test/regress/expected/subselect.out 2025-04-04 13:20:49.429815400 +0000
+++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out 2025-04-04 13:23:33.602847700 +0000
@@ -2769,15 +2769,16 @@
EXPLAIN (COSTS OFF)
SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
WHERE c.oid IN (VALUES (1), (2));
- QUERY PLAN
----------------------------------------------------------------
- Hash Join
- Hash Cond: (a.oid = c.oid)
- -> Seq Scan on pg_am a
- -> Hash
- -> Index Scan using pg_class_oid_index on pg_class c
- Index Cond: (oid = ANY ('{1,2}'::oid[]))
-(6 rows)
+ QUERY PLAN
+---------------------------------------------------------
+ Merge Join
+ Merge Cond: (c.oid = a.oid)
+ -> Index Scan using pg_class_oid_index on pg_class c
+ Index Cond: (oid = ANY ('{1,2}'::oid[]))
+ -> Sort
+ Sort Key: a.oid
+ -> Seq Scan on pg_am a
+(7 rows)

-- Constant expressions are simplified
EXPLAIN (COSTS OFF)

This failure was entirely predictable, as this patch has been failing on cfbot
for *months*. Seriously. The tests passed exactly once since 2024-11-26, see:

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5283

Come-on.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 15:17:53
Message-ID: 1856886.1743779873@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> This just failed on both buildfarm and CI:

Yeah, the plan for that join seems fairly unstable. Is there
actually any need for the join to pg_am?

regards, tom lane


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 15:19:37
Message-ID: eeac3615-12b9-40e5-8465-b8c26b049881@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.04.2025 17:39, Andres Freund wrote:
> On 2025-04-03 18:26:48 +0300, Alexander Korotkov wrote:
>> On Thu, Apr 3, 2025 at 5:18 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> Okay, I agree with you.
>> Good. I've reflected this limitation in comments and the commit
>> message. Also, I've adjust regression tests by removing excessive
>> ones and adding more important cases. I'm going to push this if no
>> objections.
> This just failed on both buildfarm and CI:
>
> BF:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2025-04-04%2013%3A52%3A13
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=iguana&dt=2025-04-04%2014%3A08%3A50
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2025-04-04%2013%3A52%3A13
> ...
>
>
> CI:
>
> https://cirrus-ci.com/task/5466958990147584
> https://api.cirrus-ci.com/v1/artifact/task/5466958990147584/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade
>
> --- C:/cirrus/src/test/regress/expected/subselect.out 2025-04-04 13:20:49.429815400 +0000
> +++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out 2025-04-04 13:23:33.602847700 +0000
> @@ -2769,15 +2769,16 @@
> EXPLAIN (COSTS OFF)
> SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
> WHERE c.oid IN (VALUES (1), (2));
> - QUERY PLAN
> ----------------------------------------------------------------
> - Hash Join
> - Hash Cond: (a.oid = c.oid)
> - -> Seq Scan on pg_am a
> - -> Hash
> - -> Index Scan using pg_class_oid_index on pg_class c
> - Index Cond: (oid = ANY ('{1,2}'::oid[]))
> -(6 rows)
> + QUERY PLAN
> +---------------------------------------------------------
> + Merge Join
> + Merge Cond: (c.oid = a.oid)
> + -> Index Scan using pg_class_oid_index on pg_class c
> + Index Cond: (oid = ANY ('{1,2}'::oid[]))
> + -> Sort
> + Sort Key: a.oid
> + -> Seq Scan on pg_am a
> +(7 rows)
>
> -- Constant expressions are simplified
> EXPLAIN (COSTS OFF)
>
>
> This failure was entirely predictable, as this patch has been failing on cfbot
> for *months*. Seriously. The tests passed exactly once since 2024-11-26, see:
>
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5283
>
> Come-on.
>
>
Sorry, I didn't understand at first how to find a specific diff file to
analyze what was wrong with my test, to be honest.
I only started to understand how to do this recently, then later I
thought everything was fine after the letter several days before.

I fixed it - changed the tables and didn't use system tables.

Sorry again, I'll be more careful with this.

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
values_test.diff text/x-patch 2.4 KB

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 15:33:56
Message-ID: CAH2-WzmzBAtm4ujfNABpHZ9iLiG8=0j-GBUgtt9tNFqyw_BMAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 4, 2025 at 11:19 AM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> I fixed it - changed the tables and didn't use system tables.

Is somebody going to commit this soon? Alexander?

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 16:00:39
Message-ID: 1910361.1743782439@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> Is somebody going to commit this soon? Alexander?

Done.

I thought the point of the test case was to check VALUES folding
on both sides of the join. I'm not sure why that's actually
a useful expenditure of test cycles, but Alena's patch undid that
without adjusting the comment claiming that's what it's doing.
So I kept the shape of the test the same while substituting
hopefully-stabler tables as she did.

If it's still not stable I think the next step is to nuke both
test queries, since I remain of the opinion that they're likely
a waste of cycles.

regards, tom lane


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 16:02:51
Message-ID: CAH2-Wzn7SoJnoKidMc9D+3dk5xWNGoP90Saqhb8LWW6dwKkrRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 4, 2025 at 12:00 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> > Is somebody going to commit this soon? Alexander?
>
> Done.

Thanks!

> If it's still not stable I think the next step is to nuke both
> test queries, since I remain of the opinion that they're likely
> a waste of cycles.

+1

--
Peter Geoghegan


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 16:04:36
Message-ID: CAPpHfdstLSmHE+JiuHVsY2LatzOaCwCvBcsGy7vSFawGy0yStA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 4, 2025 at 7:00 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> > Is somebody going to commit this soon? Alexander?
>
> Done.
>
> I thought the point of the test case was to check VALUES folding
> on both sides of the join. I'm not sure why that's actually
> a useful expenditure of test cycles, but Alena's patch undid that
> without adjusting the comment claiming that's what it's doing.
> So I kept the shape of the test the same while substituting
> hopefully-stabler tables as she did.
>
> If it's still not stable I think the next step is to nuke both
> test queries, since I remain of the opinion that they're likely
> a waste of cycles.

Thank you for pushing this! I just start looking at the problem and
have just seen your commit.

------
Regards,
Alexander Korotkov
Supabase


From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-04 16:30:47
Message-ID: b972df06-468b-4e18-9f58-f70abc75415f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.04.2025 19:00, Tom Lane wrote:
> Peter Geoghegan<pg(at)bowt(dot)ie> writes:
>> Is somebody going to commit this soon? Alexander?
> Done.
>
> I thought the point of the test case was to check VALUES folding
> on both sides of the join. I'm not sure why that's actually
> a useful expenditure of test cycles, but Alena's patch undid that
> without adjusting the comment claiming that's what it's doing.
> So I kept the shape of the test the same while substituting
> hopefully-stabler tables as she did.
Yes, that's right. Thank you.
> If it's still not stable I think the next step is to nuke both
> test queries, since I remain of the opinion that they're likely
> a waste of cycles.
>
>
Agree.

--
Regards,
Alena Rybakina
Postgres Professional