Re: immutable functions vs. join for lookups ?

Lists: pgsql-performance
From: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
To: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 12:50:46
Message-ID: 6EE64EF3AB31D5448D0007DD34EEB3415C2596@Herge.rcsinc.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> d) self-join with a function ;)
> EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> aaa USING (n);

That's pretty clever.
It sure seems like the server was not caching the results of the
function...maybe the server thought it was to small a table to bother?

Merlin


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 14:19:37
Message-ID: 758d5e7f05041807194a78b55a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/18/05, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com> wrote:
> > d) self-join with a function ;)
> > EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> > FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> > aaa USING (n);
>
> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?

Nah, I don't thinks so. Having around 2 097 152 rows of 1s and 0s takes
48 seconds for id2username() query.
The "self join" you've quoted above takes 32 seconds.
SELECT n FROM aaa; takes 7 seconds.

Thinking further...
SELECT CASE n WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END FROM aaa;
takes 9 seconds.

CREATE OR REPLACE FUNCTION id2un_case(oid int) RETURNS text AS $$
BEGIN RETURN CASE oid WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END; END; $$
LANGUAGE plpgsql IMMUTABLE;
SELECT id2un_case(n) FROM aaa;
...takes 36 seconds

...and to see how it depends on flags used:
SELECT count(id2un_case(n)) FROM aaa;
...id2un_case(n) IMMUTABLE takes 29900,114 ms
...id2un_case(n) IMMUTABLE STRICT takes 30187,958 ms
...id2un_case(n) STABLE takes 31457,560 ms
...id2un_case(n) takes 33545,178 ms
...id2un_case(n) VOLATILE takes 35150,920 ms
(and a count(CASE n WHEN ... END) FROM aaa takes: 2564,188 ms

I understand that these measurements are not too accurate. They
were done on idle system, and the queries were run couple of times
(to make sure they're cached :)). I believe either something is minor
performance difference between IMMUTABLE STABLE and even
VOLATILE plpgsql... :(

Oh, and doing things like "ORDER BY n" or "WHERE n = 1" didn't help
either...

I still wonder whether it's only my case or is there really something
wrong with these functions?

Regards,
Dawid


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 15:50:49
Message-ID: 22219.1113839449@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com> writes:
>> d) self-join with a function ;)
>> EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
>> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
>> aaa USING (n);

> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?

No, it probably flattened the subquery on sight (looking at the actual
EXPLAIN output would confirm or disprove that). You could prevent the
flattening by adding OFFSET 0 in the subquery. However, the SELECT
DISTINCT sub-sub-query is expensive enough, and the join itself is
expensive enough, that you would need an *enormously* expensive
id2username() function to make this a win.

It would be interesting sometime to try to teach the planner about
inlining SQL-language functions to become joins. That is, given

create function id2name(int) returns text as
'select name from mytab where id = $1' language sql stable;

select uid, id2name(uid) from othertab where something;

I think that in principle this could automatically be converted to

select uid, name from othertab left join mytab on (uid = id) where something;

which is much more amenable to join optimization. There are some
pitfalls though, particularly that you'd have to be able to prove that
the function's query couldn't return more than one row (else the join
might produce more result rows than the original query).

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 18:33:15
Message-ID: 87u0m4j5p0.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> It would be interesting sometime to try to teach the planner about
> inlining SQL-language functions to become joins. That is, given
>
> create function id2name(int) returns text as
> 'select name from mytab where id = $1' language sql stable;
>
> select uid, id2name(uid) from othertab where something;
>
> I think that in principle this could automatically be converted to
>
> select uid, name from othertab left join mytab on (uid = id) where something;

The Inlining of the function is presumably a side-issue. I have tons of
queries that use subqueries in the select list for which the same behaviour
would be appropriate.

Things like

select uid, (select name from mytab where id = uid) as name from othertab ...

> There are some pitfalls though, particularly that you'd have to be able to
> prove that the function's query couldn't return more than one row (else the
> join might produce more result rows than the original query).

Or just have a special join type that has the desired behaviour in that case.
Ie, pretend the query was really

SELECT * FROM othertab LEFT SINGLE JOIN mytab ...

Where "LEFT SINGLE JOIN" is an imaginary syntax that doesn't actually have to
exist in the parser, but exists in the planner/executor and behaves
differently in the case of duplicate matches.

Actually I could see such a syntax being useful directly too.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 19:50:26
Message-ID: 25151.1113853826@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> It would be interesting sometime to try to teach the planner about
>> inlining SQL-language functions to become joins. That is, given

> The Inlining of the function is presumably a side-issue. I have tons of
> queries that use subqueries in the select list for which the same behaviour
> would be appropriate.

Yeah, I was actually thinking about a two-step process: inline the
function to produce somethig equivalent to a handwritten scalar
sub-SELECT, and then try to convert sub-SELECTs into joins.

regards, tom lane


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-20 00:30:54
Message-ID: 20050420003054.GV58835@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

You should re-run the function test using SQL as the function language
instead of plpgsql. There might be some performance to be had there.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-20 08:35:48
Message-ID: 758d5e7f0504200135367d56d5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/20/05, Jim C. Nasby <decibel(at)decibel(dot)org> wrote:
> You should re-run the function test using SQL as the function language
> instead of plpgsql. There might be some performance to be had there.

Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.

Regards,
Dawid


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-20 08:52:48
Message-ID: 42661860.6040105@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.
>
> Regards,
> Dawid
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-20 08:53:12
Message-ID: 42661878.6010802@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.

Probably because simple SQL functions get inlined by the optimiser.

Chris


From: Enrico Weigelt <weigelt(at)metux(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-21 19:23:58
Message-ID: 20050421192357.GC16548@nibiru.borg.metux.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

<snip>
> Yeah, I was actually thinking about a two-step process: inline the
> function to produce somethig equivalent to a handwritten scalar
> sub-SELECT, and then try to convert sub-SELECTs into joins.

... back to my original question ...

What kind of query should I use ?
Is a join better than a function ?

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service

phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: weigelt(at)metux(dot)de, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-22 10:08:50
Message-ID: 758d5e7f0504220308453c0dbc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/21/05, Enrico Weigelt <weigelt(at)metux(dot)de> wrote:
> * Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> <snip>
> > Yeah, I was actually thinking about a two-step process: inline the
> > function to produce somethig equivalent to a handwritten scalar
> > sub-SELECT, and then try to convert sub-SELECTs into joins.
>
> ... back to my original question ...
>
> What kind of query should I use ?
> Is a join better than a function ?

It appears that JOINs are usually faster. So if performance is an
important issue, go with JOIN (and VIEWs probably). Functions are nicer
(in terms off look and feel).

Regards,
Dawid