Re: how to change the index chosen in plan?

Lists: pgsql-performance
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <ruralhunter(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>,<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: how to change the index chosen in plan?
Date: 2012-06-09 14:39:06
Message-ID: 4FD319BA0200002500048250@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Rural Hunter wrote:
> 于 2012/6/9 0:39, Kevin Grittner 写道:

> name | current_setting

> full_page_writes | off

There may be exceptions on some file systems, but generally turning
this off leaves you vulnerable to possible database corruption if you
OS or hardware crashes.

> max_connections | 2500

Yikes! You may want to look in to a connection pooler which can take
2500 client connections and funnel them into a much smaller number of
database connections.

https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

> shared_buffers | 60GB

You might want to compare your performance with this setting against
a smaller setting. Many benchmarks have shown settings about a
certain point (like 8MB to 12 MB) to be counter-productive, although
a few have shown increased performance going past that. It really
seems to depend on your hardware and workload, so you have to test to
find the "sweet spot" for your environment.

> work_mem | 8MB

With so many connections, I can understand being this low. One of
the advantages of using connection pooling to funnel your user
connections into fewer database conncections is that you can boost
this, which might help considerably with some types of queries.

None of the above, however, really gets to your immediate problem.
What is most significant about your settings with regard to the
problem query is what's *not* in that list. You appear to have a
heavily cached active data set, based on the row counts and timings
in EXPLAIN ANALYZE output, and you have not adjusted your cost
factors, which assume less caching.

Try setting these on a connection and then running your queries on
that connection.

set seq_page_cost = 0.1;
set random_page_cost = 0.1;
set cpu_tuple_cost = 0.03;

> Ok, I get out a simple version of the actualy query. Here is the
> explain anaylze without order-by, which is I wanted:
> http://explain.depesz.com/s/p1p
>
> Another with the order-by which I want to avoid:
> http://explain.depesz.com/s/ujU

You neglected to mention the LIMIT clause in your earlier
presentation of the problem. A LIMIT can have a big impact on plan
choice. Is the LIMIT 10 part of the actual query you want to
optimize? Either way it would be helpful to see the EXPLAIN ANALYZE
output for the the query without the LIMIT clause.

-Kevin


From: Rural Hunter <ruralhunter(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: how to change the index chosen in plan?
Date: 2012-06-11 04:46:41
Message-ID: 4FD57831.1070602@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

于 2012/6/9 22:39, Kevin Grittner 写道:
> Rural Hunter wrote:
>> 于 2012/6/9 0:39, Kevin Grittner 写道:
>
>> name | current_setting
>
>> full_page_writes | off
>
> There may be exceptions on some file systems, but generally turning
> this off leaves you vulnerable to possible database corruption if you
> OS or hardware crashes.
Yes, I understand. My situation is, the io utiliztion of my system is
quite high so I turned this off to reduce the io utilization. We have a
replication server to serve as the hot standby if there is any issue on
the primary. So currently I think it's acceptable option to me.
>
>> max_connections | 2500
>
> Yikes! You may want to look in to a connection pooler which can take
> 2500 client connections and funnel them into a much smaller number of
> database connections.
>
> https://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
>> shared_buffers | 60GB
>
> You might want to compare your performance with this setting against
> a smaller setting. Many benchmarks have shown settings about a
> certain point (like 8MB to 12 MB) to be counter-productive, although
> a few have shown increased performance going past that. It really
> seems to depend on your hardware and workload, so you have to test to
> find the "sweet spot" for your environment.
>
>> work_mem | 8MB
>
> With so many connections, I can understand being this low. One of
> the advantages of using connection pooling to funnel your user
> connections into fewer database conncections is that you can boost
> this, which might help considerably with some types of queries.
>
> None of the above, however, really gets to your immediate problem.
> What is most significant about your settings with regard to the
> problem query is what's *not* in that list. You appear to have a
> heavily cached active data set, based on the row counts and timings
> in EXPLAIN ANALYZE output, and you have not adjusted your cost
> factors, which assume less caching.
Thanks for the advices. As of now we don't see overall performance issue
on the db. I will adjust these settings based on your advices if we
begin to see overall performance degrade.
>
> Try setting these on a connection and then running your queries on
> that connection.
>
> set seq_page_cost = 0.1;
> set random_page_cost = 0.1;
> set cpu_tuple_cost = 0.03;
I tried these settings but don't see noticeable improvement. The plan is
not changed.
>
>> Ok, I get out a simple version of the actualy query. Here is the
>> explain anaylze without order-by, which is I wanted:
>> http://explain.depesz.com/s/p1p
>>
>> Another with the order-by which I want to avoid:
>> http://explain.depesz.com/s/ujU
>
> You neglected to mention the LIMIT clause in your earlier
> presentation of the problem. A LIMIT can have a big impact on plan
> choice. Is the LIMIT 10 part of the actual query you want to
> optimize? Either way it would be helpful to see the EXPLAIN ANALYZE
> output for the the query without the LIMIT clause.
Yes, sorry for that. I do need the limit clause in the query to show
only part of the results to the user(common multi-pages view). Without
the limit clause, I got the plan as I wanted:
http://explain.depesz.com/s/Qdu

So looks either I remove the order-by or limit clause, I can get what I
wanted. But I do need the both in the query...

>
> -Kevin
>