Re: Possible to improve query plan?

From: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Possible to improve query plan?
Date: 2011-01-17 03:21:56
Message-ID: 666FB8D75E95AE42965A0E76A5E5337E06D2C683BF@prdlsmmsg01.ad.linz.govt.nz
Lists: pgsql-performance

Hi all,

I've come to a dead end in trying to get a commonly used query to perform better. The query is against one table with 10 million rows. This table has been analysed. The table definition is:

CREATE TABLE version_crs_coordinate_revision
(
    _revision_created integer NOT NULL,
    _revision_expired integer,
    id integer NOT NULL,
    cos_id integer NOT NULL,
    nod_id integer NOT NULL,
    ort_type_1 character varying(4),
    ort_type_2 character varying(4),
    ort_type_3 character varying(4),
    status character varying(4) NOT NULL,
    sdc_status character(1) NOT NULL,
    source character varying(4),
    value1 numeric(22,12),
    value2 numeric(22,12),
    value3 numeric(22,12),
    wrk_id_created integer,
    cor_id integer,
    audit_id integer NOT NULL,
    CONSTRAINT pkey_version_crs_coordinate_revision PRIMARY KEY (_revision_created, id),
    CONSTRAINT version_crs_coordinate_revision_revision_created_fkey FOREIGN KEY (_revision_created)
        REFERENCES revision (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT version_crs_coordinate_revision_revision_expired_fkey FOREIGN KEY (_revision_expired)
        REFERENCES revision (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
    OIDS=FALSE
);
ALTER TABLE "version".version_crs_coordinate_revision ALTER COLUMN _revision_created SET STATISTICS 1000;
ALTER TABLE "version".version_crs_coordinate_revision ALTER COLUMN _revision_expired SET STATISTICS 1000;
ALTER TABLE "version".version_crs_coordinate_revision ALTER COLUMN id SET STATISTICS 1000;

CREATE INDEX idx_crs_coordinate_revision_created ON "version".version_crs_coordinate_revision USING btree (_revision_created);
CREATE INDEX idx_crs_coordinate_revision_created_expired ON "version".version_crs_coordinate_revision USING btree (_revision_created, _revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired ON "version".version_crs_coordinate_revision USING btree (_revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired_created ON "version".version_crs_coordinate_revision USING btree (_revision_expired, _revision_created);
CREATE INDEX idx_crs_coordinate_revision_expired_id ON "version".version_crs_coordinate_revision USING btree (_revision_expired, id);
CREATE INDEX idx_crs_coordinate_revision_id ON "version".version_crs_coordinate_revision USING btree (id);
CREATE INDEX idx_crs_coordinate_revision_id_created ON "version".version_crs_coordinate_revision USING btree (id, _revision_created);

The distribution of the data is that all but 120,000 rows have null values in the _revision_expired column.

The query itself that I'm trying to optimise is below:

EXPLAIN
SELECT * FROM (
    SELECT
        row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) as row_number,
        *
    FROM
        version_crs_coordinate_revision
    WHERE (
        (_revision_created <= 16 AND _revision_expired > 16 AND _revision_expired <= 40) OR
        (_revision_created > 16 AND _revision_created <= 40)
    )
) AS T
WHERE row_number = 1;

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205)
  Filter: (t.row_number = 1)
  ->  WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86)
        ->  Sort  (cost=170692.25..171075.79 rows=153416 width=86)
              Sort Key: version_crs_coordinate_revision.id, version_crs_coordinate_revision._revision_created
              ->  Bitmap Heap Scan on version_crs_coordinate_revision  (cost=3319.13..157477.69 rows=153416 width=86)
                    Recheck Cond: (((_revision_expired > 16) AND (_revision_expired <= 40)) OR ((_revision_created > 16) AND (_revision_created <= 40)))
                    Filter: (((_revision_created <= 16) AND (_revision_expired > 16) AND (_revision_expired <= 40)) OR ((_revision_created > 16) AND (_revision_created <= 40)))
                    ->  BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0)
                          ->  Bitmap Index Scan on idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0)
                                Index Cond: ((_revision_expired > 16) AND (_revision_expired <= 40))
                          ->  Bitmap Index Scan on idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0)
                                Index Cond: ((_revision_created > 16) AND (_revision_created <= 40))

One thought I have is that maybe the idx_crs_coordinate_revision_expired_created index could be used instead of idx_crs_coordinate_revision_expired.

Does anyone have any suggestions about what I could do to improve the plan, or how I could force the use of the idx_crs_coordinate_revision_expired_created index to see if that is better?

Thanks
Jeremy



From: Andy Colson <andy(at)squeakycode(dot)net>
To: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 04:22:20
Message-ID: 4D33C3FC.8090405@squeakycode.net
Lists: pgsql-performance

On 01/16/2011 09:21 PM, Jeremy Palmer wrote:
> I've come to a dead end in trying to get a commonly used query to perform better. The query is against one table with 10 million rows. This table has been analysed.
>
> [snip: table definition, indexes and EXPLAIN plan]
>
> Does anyone have any suggestions about what I could do to improve the plan, or how I could force the use of the idx_crs_coordinate_revision_expired_created index to see if that is better?
>
> Thanks
> Jeremy

First, wow, those are long names... I had a hard time keeping track.

Second: you have lots of duplicated indexes. I count _revision_created in four of them. Not sure what other SQL you are running, but have you tried one index per column? PG will be able to bitmap them together if it thinks it can use more than one. Or was that just because you were testing?

Third: any chance we can get an "explain analyze"? It gives more info. (Also, have you seen http://explain.depesz.com/ ?)

Last: if you wanted to force the index usage for a test, you could drop the other indexes. I assume this is on a test box, so it should be ok. If it's live, you could wrap it in a BEGIN ... ROLLBACK (in theory... never tried it myself).
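
Something along these lines, in theory (an untested sketch; note that DROP INDEX will hold an exclusive lock on the table until the transaction ends, so don't leave it open long on a live box):

BEGIN;
-- DDL is transactional in PostgreSQL, so ROLLBACK puts the index back.
DROP INDEX "version".idx_crs_coordinate_revision_expired;

EXPLAIN ANALYZE
SELECT * FROM (
    SELECT
        row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) as row_number,
        *
    FROM version_crs_coordinate_revision
    WHERE (_revision_created <= 16 AND _revision_expired > 16 AND _revision_expired <= 40)
       OR (_revision_created > 16 AND _revision_created <= 40)
) AS T
WHERE row_number = 1;

ROLLBACK;  -- the dropped index is restored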

-Andy


From: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Possible to improve query plan?
Date: 2011-01-17 04:43:51
Message-ID: 666FB8D75E95AE42965A0E76A5E5337E06D2C6847C@prdlsmmsg01.ad.linz.govt.nz
Lists: pgsql-performance

Hi Andy,

Yeah, sorry about the long names; they are all generated by a function as part of a table versioning system. And yes, I placed all possible indexes on the table to see which would be used by the planner. In production I will drop the unused indexes.

Yes, simple enough to drop the extra index :P I dropped it, and it made the query slower :(

Here is the explain analyse:

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205) (actual time=13762.783..14322.315 rows=106299 loops=1)'
  Filter: (t.row_number = 1)'
  ->  WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86) (actual time=13762.774..14208.522 rows=149557 loops=1)'
        ->  Sort  (cost=170692.25..171075.79 rows=153416 width=86) (actual time=13762.745..13828.584 rows=149557 loops=1)'
              Sort Key: version_crs_coordinate_revision.id, version_crs_coordinate_revision._revision_created'
              Sort Method:  quicksort  Memory: 23960kB
              ->  Bitmap Heap Scan on version_crs_coordinate_revision  (cost=3319.13..157477.69 rows=153416 width=86) (actual time=70.925..13531.720 rows=149557 loops=1)
                    Recheck Cond: (((_revision_expired > 16) AND (_revision_expired <= 40)) OR ((_revision_created > 16) AND (_revision_created <= 40)))
                    Filter: (((_revision_created <= 16) AND (_revision_expired > 16) AND (_revision_expired <= 40)) OR ((_revision_created > 16) AND (_revision_created <= 40)))
                    ->  BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0) (actual time=53.650..53.650 rows=0 loops=1)
                          ->  Bitmap Index Scan on idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0) (actual time=37.773..37.773 rows=110326 loops=1)
                                Index Cond: ((_revision_expired > 16) AND (_revision_expired <= 40))
                          ->  Bitmap Index Scan on idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0) (actual time=15.872..15.872 rows=43258 loops=1)
                                Index Cond: ((_revision_created > 16) AND (_revision_created <= 40))
Total runtime: 14359.747 ms

http://explain.depesz.com/s/qpL says that the bitmap heap scan is bad. Not sure what to do about it.

Thanks,
Jeremy



From: Andy Colson <andy(at)squeakycode(dot)net>
To: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 04:57:29
Message-ID: 4D33CC39.1020600@squeakycode.net
Lists: pgsql-performance

On 01/16/2011 10:43 PM, Jeremy Palmer wrote:
> Hi Andy,
>
> Yeah, sorry about the long names; they are all generated by a function as part of a table versioning system. And yes, I placed all possible indexes on the table to see which would be used by the planner. In production I will drop the unused indexes.
>
> [snip: EXPLAIN ANALYZE output]
>
> http://explain.depesz.com/s/qpL says that the bitmap heap scan is bad. Not sure what to do about it.
>
> Thanks,
> Jeremy

Hum.. yeah, it looks like it takes no time at all to pull data from the individual indexes and then bitmap them. I'm not sure what the bitmap heap scan is, or why it's slow. Hopefully someone smarter will come along.

Also, it's weird that explain.depesz.com didn't parse and show your entire plan. Hum.. you seem to have ending quotes on some of the lines?

One other thought: quicksort Memory: 23960kB
It needs about 24MB to sort... It could be your sort is swapping to disk.

What version of PG is this?
What are you using for work_mem? You could try to bump it up a little (it's possible to set it for the session only, no need for a server restart) and see if that'd help.
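
For example (a sketch; the 256MB value is just something to test with, not a recommendation):

SET work_mem = '256MB';   -- affects only this session, no restart needed
-- re-run the EXPLAIN ANALYZE here and compare the sort method and timings
RESET work_mem;           -- back to the configured default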

And sorry, but it's my bedtime; good luck though.

-Andy


From: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 05:13:25
Message-ID: 666FB8D75E95AE42965A0E76A5E5337E06D2C68482@prdlsmmsg01.ad.linz.govt.nz
Lists: pgsql-performance

Hi Andy,

Yes, important omissions:

Server version: 8.4.6
OS: Windows Server 2003 Standard Ed :(
work_mem: 50MB

I tried setting work_mem to 500MB, but it didn't make a huge difference in query execution time. But then again, the OS disk caching is probably taking over here.

Ok, here's the new plan with work_mem = 50MB:

http://explain.depesz.com/s/xwv

And here's another plan with work_mem = 500MB:

http://explain.depesz.com/s/VmO

Thanks,
Jeremy




From: Jayadevan M <Jayadevan(dot)Maymala(at)ibsplc(dot)com>
To: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, pgsql-performance-owner(at)postgresql(dot)org
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 08:44:05
Message-ID: OFCF071F4A.48152E2C-ON6525781B.002EEEE4-6525781B.002FFBBE@ibsplc.com
Lists: pgsql-performance

Hello,

>
> The distribution of the data is that all but 120,000 rows have null
> values in the _revision_expired column.
>

A shot in the dark - will a partial index on the above column help?
http://www.postgresql.org/docs/current/interactive/indexes-partial.html
http://en.wikipedia.org/wiki/Partial_index

One link with discussion about it...
http://www.devheads.net/database/postgresql/general/when-can-postgresql-use-partial-not-null-index-seems-depend-size-clause-even-enable-seqscan.htm
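
Something like the following, perhaps (untested; the index name is just an example):

-- Hypothetical partial index covering only the ~120,000 rows where
-- _revision_expired is not null.
CREATE INDEX idx_crs_coordinate_revision_expired_notnull
    ON "version".version_crs_coordinate_revision (_revision_expired)
    WHERE _revision_expired IS NOT NULL;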

Regards,
Jayadevan



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 20:24:26
Message-ID: 20065.1295295866@sss.pgh.pa.us
Lists: pgsql-performance

Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz> writes:
> I've come to a dead end in trying to get a commonly used query to
> perform better.

> EXPLAIN
> SELECT * FROM (
> SELECT
> row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) as row_number,
> *
> FROM
> version_crs_coordinate_revision
> WHERE (
> (_revision_created <= 16 AND _revision_expired > 16 AND _revision_expired <= 40) OR
> (_revision_created > 16 AND _revision_created <= 40)
> )
> ) AS T
> WHERE row_number = 1;

If I'm not mistaken, that's a DB2-ish locution for a query with DISTINCT
ON, ie, you're looking for the row with highest _revision_created for
each value of id. It might perform well on DB2, but it's going to
mostly suck on Postgres --- we don't optimize window-function queries
very much at all at the moment. Try writing it with DISTINCT ON instead
of a window function, like so:

SELECT DISTINCT ON (id)
    *
FROM
    version_crs_coordinate_revision
WHERE (
    (_revision_created <= 16 AND _revision_expired > 16 AND _revision_expired <= 40) OR
    (_revision_created > 16 AND _revision_created <= 40)
)
ORDER BY id, _revision_created DESC;

You could also experiment with various forms of GROUP BY if you're loath
to use any Postgres-specific syntax.
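
For instance, a max-per-id formulation along these lines (just an untested sketch):

-- Compute the newest _revision_created per id over the filtered rows,
-- then join back on the primary key (_revision_created, id) to fetch
-- the full rows; the PK guarantees exactly one match per id.
SELECT v.*
FROM version_crs_coordinate_revision v
JOIN (
    SELECT id, max(_revision_created) AS _revision_created
    FROM version_crs_coordinate_revision
    WHERE (_revision_created <= 16 AND _revision_expired > 16 AND _revision_expired <= 40)
       OR (_revision_created > 16 AND _revision_created <= 40)
    GROUP BY id
) latest USING (id, _revision_created);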

regards, tom lane


From: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 21:01:17
Message-ID: 666FB8D75E95AE42965A0E76A5E5337E06D2C686B5@prdlsmmsg01.ad.linz.govt.nz
Lists: pgsql-performance

Thanks, that seems to make the query 10-15% faster :)

Cheers
jeremy



From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Palmer <JPalmer(at)linz(dot)govt(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible to improve query plan?
Date: 2011-01-17 22:11:09
Message-ID: 4D34BE7D.3070007@vmsinfo.com
Lists: pgsql-performance

Tom Lane wrote:
> If I'm not mistaken, that's a DB2-ish locution

It could also be a part of the Oracle vernacular. I've seen queries like
that running against Oracle RDBMS, too.

> for a query with DISTINCT
> ON, ie, you're looking for the row with highest _revision_created for
> each value of id. It might perform well on DB2, but it's going to
> mostly suck on Postgres --- we don't optimize window-function queries
> very much at all at the moment.
Hmmm, what optimizations do you have in mind? I thought that window
functions were just clever tricks with memory. Is anything to be
expected for 9.0.x?

> Try writing it with DISTINCT ON instead
> of a window function, like so:
>
Wouldn't "distinct" necessarily bring about the sort/merge?

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions