Lists: | pgsql-performance |
---|
From: | "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com> |
---|---|
To: | "Dawid Kuroczko" <qnex42(at)gmail(dot)com> |
Cc: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-18 12:50:46 |
Message-ID: | 6EE64EF3AB31D5448D0007DD34EEB3415C2596@Herge.rcsinc.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
> d) self-join with a function ;)
> EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> aaa USING (n);
That's pretty clever.
It sure seems like the server was not caching the results of the
function...maybe the server thought it was to small a table to bother?
Merlin
From: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-18 14:19:37 |
Message-ID: | 758d5e7f05041807194a78b55a@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On 4/18/05, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com> wrote:
> > d) self-join with a function ;)
> > EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> > FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> > aaa USING (n);
>
> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?
Nah, I don't thinks so. Having around 2 097 152 rows of 1s and 0s takes
48 seconds for id2username() query.
The "self join" you've quoted above takes 32 seconds.
SELECT n FROM aaa; takes 7 seconds.
Thinking further...
SELECT CASE n WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END FROM aaa;
takes 9 seconds.
CREATE OR REPLACE FUNCTION id2un_case(oid int) RETURNS text AS $$
BEGIN RETURN CASE oid WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END; END; $$
LANGUAGE plpgsql IMMUTABLE;
SELECT id2un_case(n) FROM aaa;
...takes 36 seconds
...and to see how it depends on flags used:
SELECT count(id2un_case(n)) FROM aaa;
...id2un_case(n) IMMUTABLE takes 29900,114 ms
...id2un_case(n) IMMUTABLE STRICT takes 30187,958 ms
...id2un_case(n) STABLE takes 31457,560 ms
...id2un_case(n) takes 33545,178 ms
...id2un_case(n) VOLATILE takes 35150,920 ms
(and a count(CASE n WHEN ... END) FROM aaa takes: 2564,188 ms
I understand that these measurements are not too accurate. They
were done on idle system, and the queries were run couple of times
(to make sure they're cached :)). I believe either something is minor
performance difference between IMMUTABLE STABLE and even
VOLATILE plpgsql... :(
Oh, and doing things like "ORDER BY n" or "WHERE n = 1" didn't help
either...
I still wonder whether it's only my case or is there really something
wrong with these functions?
Regards,
Dawid
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com> |
Cc: | "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-18 15:50:49 |
Message-ID: | 22219.1113839449@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
"Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com> writes:
>> d) self-join with a function ;)
>> EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
>> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
>> aaa USING (n);
> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?
No, it probably flattened the subquery on sight (looking at the actual
EXPLAIN output would confirm or disprove that). You could prevent the
flattening by adding OFFSET 0 in the subquery. However, the SELECT
DISTINCT sub-sub-query is expensive enough, and the join itself is
expensive enough, that you would need an *enormously* expensive
id2username() function to make this a win.
It would be interesting sometime to try to teach the planner about
inlining SQL-language functions to become joins. That is, given
create function id2name(int) returns text as
'select name from mytab where id = $1' language sql stable;
select uid, id2name(uid) from othertab where something;
I think that in principle this could automatically be converted to
select uid, name from othertab left join mytab on (uid = id) where something;
which is much more amenable to join optimization. There are some
pitfalls though, particularly that you'd have to be able to prove that
the function's query couldn't return more than one row (else the join
might produce more result rows than the original query).
regards, tom lane
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-18 18:33:15 |
Message-ID: | 87u0m4j5p0.fsf@stark.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> It would be interesting sometime to try to teach the planner about
> inlining SQL-language functions to become joins. That is, given
>
> create function id2name(int) returns text as
> 'select name from mytab where id = $1' language sql stable;
>
> select uid, id2name(uid) from othertab where something;
>
> I think that in principle this could automatically be converted to
>
> select uid, name from othertab left join mytab on (uid = id) where something;
The Inlining of the function is presumably a side-issue. I have tons of
queries that use subqueries in the select list for which the same behaviour
would be appropriate.
Things like
select uid, (select name from mytab where id = uid) as name from othertab ...
> There are some pitfalls though, particularly that you'd have to be able to
> prove that the function's query couldn't return more than one row (else the
> join might produce more result rows than the original query).
Or just have a special join type that has the desired behaviour in that case.
Ie, pretend the query was really
SELECT * FROM othertab LEFT SINGLE JOIN mytab ...
Where "LEFT SINGLE JOIN" is an imaginary syntax that doesn't actually have to
exist in the parser, but exists in the planner/executor and behaves
differently in the case of duplicate matches.
Actually I could see such a syntax being useful directly too.
--
greg
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-18 19:50:26 |
Message-ID: | 25151.1113853826@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> It would be interesting sometime to try to teach the planner about
>> inlining SQL-language functions to become joins. That is, given
> The Inlining of the function is presumably a side-issue. I have tons of
> queries that use subqueries in the select list for which the same behaviour
> would be appropriate.
Yeah, I was actually thinking about a two-step process: inline the
function to produce somethig equivalent to a handwritten scalar
sub-SELECT, and then try to convert sub-SELECTs into joins.
regards, tom lane
From: | "Jim C(dot) Nasby" <decibel(at)decibel(dot)org> |
---|---|
To: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-20 00:30:54 |
Message-ID: | 20050420003054.GV58835@decibel.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
You should re-run the function test using SQL as the function language
instead of plpgsql. There might be some performance to be had there.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
From: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-20 08:35:48 |
Message-ID: | 758d5e7f0504200135367d56d5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On 4/20/05, Jim C. Nasby <decibel(at)decibel(dot)org> wrote:
> You should re-run the function test using SQL as the function language
> instead of plpgsql. There might be some performance to be had there.
Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.
Regards,
Dawid
From: | Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> |
---|---|
To: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-20 08:52:48 |
Message-ID: | 42661860.6040105@familyhealth.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
> Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.
>
> Regards,
> Dawid
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
From: | Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> |
---|---|
To: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-20 08:53:12 |
Message-ID: | 42661878.6010802@familyhealth.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
> Yay! You're right! I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :) I.e. its almost the same as in-place entered SQL.
Probably because simple SQL functions get inlined by the optimiser.
Chris
From: | Enrico Weigelt <weigelt(at)metux(dot)de> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-21 19:23:58 |
Message-ID: | 20050421192357.GC16548@nibiru.borg.metux.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
<snip>
> Yeah, I was actually thinking about a two-step process: inline the
> function to produce somethig equivalent to a handwritten scalar
> sub-SELECT, and then try to convert sub-SELECTs into joins.
... back to my original question ...
What kind of query should I use ?
Is a join better than a function ?
cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service
phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------
From: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
---|---|
To: | weigelt(at)metux(dot)de, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: immutable functions vs. join for lookups ? |
Date: | 2005-04-22 10:08:50 |
Message-ID: | 758d5e7f0504220308453c0dbc@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On 4/21/05, Enrico Weigelt <weigelt(at)metux(dot)de> wrote:
> * Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> <snip>
> > Yeah, I was actually thinking about a two-step process: inline the
> > function to produce somethig equivalent to a handwritten scalar
> > sub-SELECT, and then try to convert sub-SELECTs into joins.
>
> ... back to my original question ...
>
> What kind of query should I use ?
> Is a join better than a function ?
It appears that JOINs are usually faster. So if performance is an
important issue, go with JOIN (and VIEWs probably). Functions are nicer
(in terms off look and feel).
Regards,
Dawid