Re: [POC] Allow flattening of subquery with a link to upper query

Lists: pgsql-hackers
From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-08-31 06:35:09
Message-ID: f26a8b9e-5b41-4013-9e95-7a056456e30c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

One of the most annoying things in the planner for me is unnesting of
correlated queries [1]. A number papers on this subject were published
starting 1980s, but only trivial optimizations exists in the Core. It
means a lack of performance, especially when we use foreign tables in
subquery.
In the patch I'm trying to propose a sort of sketch of solution.

Before flattening procedure we just look through the quals of subquery,
pull to the upper level OpExpr's containing variables from the upper
relation and replace their positions in the quals with true expression.
Further, the flattening machinery works as usual.

This patch is dedicated to simplest variant of correlated queries -
without aggregate functions in the target list. It passes regression
tests and contains some additional tests to demonstrate achievements.

I'd like to get critics on the approach.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans.
Database Syst. 7 (1982): 443-469.

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
0001-Transform-correlated-subquery-of-type-N-J-1-into-ord.patch text/x-patch 36.8 KB

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-09-01 12:24:32
Message-ID: CAMbWs4_3KyJDMgZLL1xkp0ev+FbuwjJLuVHfQD6LCYPb35rNXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> Before flattening procedure we just look through the quals of subquery,
> pull to the upper level OpExpr's containing variables from the upper
> relation and replace their positions in the quals with true expression.
> Further, the flattening machinery works as usual.

Hmm, I'm not sure this patch works correctly in all cases. It seems to
me this patch pulls up the subquery without checking the constraints
imposed by lateral references. If its quals contain any lateral
references to rels outside a higher outer join, we would need to
postpone quals from below an outer join to above it, which is probably
incorrect. As an example, consider

select * from a left join b on b.i in
(select c.i from c where c.j = a.j);

If we pull up the ANY SubLink into parent query and pull up its qual
into upper level, as what the patch does, then its qual 'c.j = a.j'
would have to be postponed past the B/C semi join, which is totally
wrong. Doing this would firstly trigger the assertion failure in
distribute_qual_to_rels

Assert(root->hasLateralRTEs); /* shouldn't happen otherwise */
Assert(jointype == JOIN_INNER); /* mustn't postpone past outer join */

Even if we ignore these assertion checks, in the final plan we would
have to access the RHS of the B/C semi join, i.e. C, to evaluate qual
'c.j = a.j' at the join level of A/BC join, which is wrong.

Thanks
Richard


From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-09-02 11:08:55
Message-ID: 544c2280-7e02-461a-3d0c-7ae03ba462f2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/1/22 17:24, Richard Guo wrote:
>
> On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
>
> Before flattening procedure we just look through the quals of subquery,
> pull to the upper level OpExpr's containing variables from the upper
> relation and replace their positions in the quals with true expression.
> Further, the flattening machinery works as usual.
>
> Hmm, I'm not sure this patch works correctly in all cases. It seems to
> me this patch pulls up the subquery without checking the constraints
> imposed by lateral references. If its quals contain any lateral
> references to rels outside a higher outer join, we would need to
> postpone quals from below an outer join to above it, which is probably
> incorrect.Yeah, it's not easy-to-solve problem. If I correctly understand the
code, to fix this problem we must implement the same logic, as
pull_up_subqueries (lowest_outer_join/safe_upper_varnos). It looks ugly.
But, more important, does this feature deserve such efforts/changes?

--
Regards
Andrey Lepikhov
Postgres Professional


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-09-05 07:22:36
Message-ID: CAMbWs48HcXguQRc+zqwYT=NBzfMmOk4ZKym6MsrgrTo0fOMuLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 9/1/22 17:24, Richard Guo wrote:
> > On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Before flattening procedure we just look through the quals of
> subquery,
> > pull to the upper level OpExpr's containing variables from the upper
> > relation and replace their positions in the quals with true
> expression.
> > Further, the flattening machinery works as usual.
> >
> > Hmm, I'm not sure this patch works correctly in all cases. It seems to
> > me this patch pulls up the subquery without checking the constraints
> > imposed by lateral references. If its quals contain any lateral
> > references to rels outside a higher outer join, we would need to
> > postpone quals from below an outer join to above it, which is probably
> > incorrect.

> Yeah, it's not easy-to-solve problem. If I correctly understand the
> code, to fix this problem we must implement the same logic, as
> pull_up_subqueries (lowest_outer_join/safe_upper_varnos).

Yeah, I think we'd have to consider the restrictions from lateral
references to guarantee correctness when we pull up subqueries. We need
to avoid the situation where quals need to be postponed past outer join.

However, even if we have taken care of that, there may be other issues
with flattening direct-correlated ANY SubLink. The constraints imposed
by LATERAL references may make it impossible for us to find any legal
join orders, as discussed in [1].

[1]
https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com

Thanks
Richard


From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-09-05 10:54:58
Message-ID: cfd5fe81-c196-15f3-08b4-adb611615fda@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/5/22 12:22, Richard Guo wrote:
>
> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov
> Yeah, it's not easy-to-solve problem. If I correctly understand the
> code, to fix this problem we must implement the same logic, as
> pull_up_subqueries (lowest_outer_join/safe_upper_varnos).
>
> Yeah, I think we'd have to consider the restrictions from lateral
> references to guarantee correctness when we pull up subqueries. We need
> to avoid the situation where quals need to be postponed past outer join.
>
> However, even if we have taken care of that, there may be other issues
> with flattening direct-correlated ANY SubLink. The constraints imposed
> by LATERAL references may make it impossible for us to find any legal
> join orders, as discussed in [1].
>
> [1]
> https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com <https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com>

The problem you mentioned under this link is about ineffective query
plan - as I understand it.
This is a problem, especially if we would think about more complex
pull-ups of subqueries - with aggregate functions in the target list.
I think about that problem as about next step - we already have an
example - machinery of alternative plans. This problem may be solved in
this way, or by a GUC, as usual.

--
Regards
Andrey Lepikhov
Postgres Professional


From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-09-13 11:40:37
Message-ID: 946e1be9-4733-b72b-4da3-35bb78d14630@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/9/2022 12:22, Richard Guo wrote:
>
> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Hmm, I'm not sure this patch works correctly in all cases. It
> seems to
> > me this patch pulls up the subquery without checking the constraints
> > imposed by lateral references. If its quals contain any lateral
> > references to rels outside a higher outer join, we would need to
> > postpone quals from below an outer join to above it, which is
> probably
> > incorrect.
>
> Yeah, it's not easy-to-solve problem. If I correctly understand the
> code, to fix this problem we must implement the same logic, as
> pull_up_subqueries (lowest_outer_join/safe_upper_varnos).
>
> Yeah, I think we'd have to consider the restrictions from lateral
> references to guarantee correctness when we pull up subqueries. We need
> to avoid the situation where quals need to be postponed past outer join.
>
> However, even if we have taken care of that, there may be other issues
> with flattening direct-correlated ANY SubLink. The constraints imposed
> by LATERAL references may make it impossible for us to find any legal
> join orders, as discussed in [1].
To resolve both issues, lower outer join passes through pull_sublinks_*
into flattening routine (see attachment).
I've added these cases into subselect.sql

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
0001-Pass-a-lower-outer-join-through-the-pullup_sublink-r.patch text/plain 15.8 KB

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2022-10-04 05:35:30
Message-ID: 400831f1-b038-6df6-8251-3e6879453462@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/13/22 16:40, Andrey Lepikhov wrote:
> On 5/9/2022 12:22, Richard Guo wrote:
>> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> To resolve both issues, lower outer join passes through pull_sublinks_*
> into flattening routine (see attachment).
> I've added these cases into subselect.sql
In attachment - new version of the patch, rebased onto current master.

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
v2-0001-Transform-correlated-subquery-of-type-N-J-1-into-ord.patch text/x-patch 50.7 KB

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2024-02-20 09:57:27
Message-ID: 01cc5e78-440c-4e9f-a6fc-a47e1d345ea4@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/9/2022 19:24, Richard Guo wrote:
> Even if we ignore these assertion checks, in the final plan we would
> have to access the RHS of the B/C semi join, i.e. C, to evaluate qual
> 'c.j = a.j' at the join level of A/BC join, which is wrong.
Having committed 9f13376396 recently, we did a lot of work in this area.
By applying regression tests from my last patch [1] to the master, I
compared these two implementations.
As I see, using the LATERAL trick allowed us to simplify the code
drastically. But because we know just a fact of the lateral link, not
its place, in the master we do less when in the patch proposed in that
thread. For example, having query:

explain (costs off)
SELECT relname FROM pg_class c1
WHERE relname = ANY (
SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
);

We see on master:
Nested Loop
-> Seq Scan on pg_class c1
-> Subquery Scan on "ANY_subquery"
Filter: (c1.relname = "ANY_subquery".amname)
-> Group
Group Key: a.amname
-> Sort
Sort Key: a.amname
-> Seq Scan on pg_am a
Filter: (oid = c1.oid)

And with this patch:
Hash Join
Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
-> Seq Scan on pg_class c1
-> Hash
-> HashAggregate
Group Key: a.amname
-> Seq Scan on pg_am a

Also, we attempted to fix links from a non-parent query block.
So, in my opinion, the reason for this patch still exists, and we can
continue this work further, maybe elaborating on flattening LATERAL
references - this needs some research.

[1]
https://www.postgresql.org/message-id/35c8a3e8-d080-dfa8-2be3-cf5fe702010a%40postgrespro.ru

--
regards,
Andrei Lepikhov
Postgres Professional


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2024-02-20 10:43:40
Message-ID: CAApHDvrZMEiQCUZkHgorVmxxOmoabCRQRC-BR2Snq0oLP6wyjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> explain (costs off)
> SELECT relname FROM pg_class c1
> WHERE relname = ANY (
> SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
> );
>
> We see on master:
> Nested Loop
> -> Seq Scan on pg_class c1
> -> Subquery Scan on "ANY_subquery"
> Filter: (c1.relname = "ANY_subquery".amname)
> -> Group
> Group Key: a.amname
> -> Sort
> Sort Key: a.amname
> -> Seq Scan on pg_am a
> Filter: (oid = c1.oid)
>
> And with this patch:
> Hash Join
> Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
> -> Seq Scan on pg_class c1
> -> Hash
> -> HashAggregate
> Group Key: a.amname
> -> Seq Scan on pg_am a

I've only glanced at the patch just so I could determine if you're
making a cost-based decision and doing this transformation only if the
de-correlation of the subquery is deemed the cheaper option. It looks
like since you're doing this in the same location that we do the other
semi / anti join transformations that there's no costing.

I agree that it would be nice to teach the planner how to do this, but
I think it just has to be a cost-based decision. Imagine how the
transformed query would perform of pg_am had a billion rows and
pg_class had 1 row. That's quite a costly hash table build to be
probing it just once.

I didn't follow the patch, but there was a patch to push aggregate
function evaluation down [1]. I imagine this has the same problem as
if you just blindly pushed and aggregate function evaluation as deep
as you could evaluate all the aggregate's parameters and group by vars
then you may end up aggregating far more than you need to as some join
could eliminate the majority of the groups. I think we'd need to come
up with some way to have the planner consider these types of
optimisations as alternatives to what happens today and only apply
them when we estimate that they're cheaper. Right now a Path has no
ability to describe that it's performed GROUP BY.

David

[1] https://commitfest.postgresql.org/46/4019/


From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [POC] Allow flattening of subquery with a link to upper query
Date: 2024-02-20 11:01:21
Message-ID: 168e9426-f7dc-4942-bc99-d5c57e81e4c2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 20/2/2024 17:43, David Rowley wrote:
> On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> I agree that it would be nice to teach the planner how to do this, but
> I think it just has to be a cost-based decision. Imagine how the
> transformed query would perform of pg_am had a billion rows and
> pg_class had 1 row. That's quite a costly hash table build to be
> probing it just once.
True, the origins of this work lie in foreign tables where such a query
generates an even worse situation.

> I didn't follow the patch, but there was a patch to push aggregate
> function evaluation down [1]. I imagine this has the same problem as
> if you just blindly pushed and aggregate function evaluation as deep
> as you could evaluate all the aggregate's parameters and group by vars
> then you may end up aggregating far more than you need to as some join
> could eliminate the majority of the groups. I think we'd need to come
> up with some way to have the planner consider these types of
> optimisations as alternatives to what happens today and only apply
> them when we estimate that they're cheaper. Right now a Path has no
> ability to describe that it's performed GROUP BY.
Thanks for the link. We also ended up with the idea of an alternative
subtree (inspired by the approach of AlternativeSubplan). Here, we just
explain the current state of the pull-up sublink technique.

--
regards,
Andrei Lepikhov
Postgres Professional