Lists: | pgsql-performance |
---|
From: | Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | query becomes fas on 'SET enable_hashjoin TO off;' |
Date: | 2009-02-10 10:31:21 |
Message-ID: | a97c77030902100231y28e328d2kc4575f25e2a5727c@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Hi ,
I have a query in which two huge tables (A,B) are joined using an indexed
column and a search is made on tsvector on some column on B. Very limited
rows of B are expected to match the query on tsvector column.
With default planner settings the query takes too long ( > 100 secs) , but
with hashjoin off it returns almost immediately. The question is , is
it is advisable to
tweak planner settings for specific queries in application ?
The plans are as follows.
1. With default settings
explain select lead_id from general.trade_leads join
general.profile_master as pm using(profile_id) where status ='m' and
co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=4109.11..11127.78 rows=20 width=4)
-> Hash Join (cost=4109.11..90789.72 rows=247 width=4)
Hash Cond: (trade_leads.profile_id = pm.profile_id)
-> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832 width=8)
Filter: ((status)::text = 'm'::text)
-> Hash (cost=4095.68..4095.68 rows=1074 width=4)
-> Bitmap Heap Scan on profile_master pm
(cost=40.89..4095.68 rows=1074 width=4)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Bitmap Index Scan on
profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0)
Index Cond: (co_name_vec @@ '''plastic'' &
''tube'''::tsquery)
(10 rows)
2. with SET enable_hashjoin TO off;
explain analyze select lead_id from general.trade_leads join
general.profile_master as pm using(profile_id) where status ='m' and
co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.42..13080.44 rows=20 width=4) (actual
time=1530.039..1530.039 rows=0 loops=1)
-> Nested Loop (cost=3.42..161504.56 rows=247 width=4) (actual
time=1530.037..1530.037 rows=0 loops=1)
-> Index Scan using profile_master_co_name_vec on
profile_master pm (cost=0.00..4335.36 rows=1074 width=4) (actual
time=220.821..1014.501 rows=7 loops=1)
Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Bitmap Heap Scan on trade_leads (cost=3.42..145.75
rows=47 width=8) (actual time=73.640..73.640 rows=0 loops=7)
Recheck Cond: (trade_leads.profile_id = pm.profile_id)
Filter: ((status)::text = 'm'::text)
-> Bitmap Index Scan on trade_leads_profile_id
(cost=0.00..3.41 rows=47 width=0) (actual time=73.579..73.579 rows=0
loops=7)
Index Cond: (trade_leads.profile_id = pm.profile_id)
Total runtime: 1530.137 ms
regds
mallah.
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: query becomes fas on 'SET enable_hashjoin TO off;' |
Date: | 2009-02-10 13:06:30 |
Message-ID: | 603c8f070902100506w7074ee35qb22f701cf751c7f0@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On Tue, Feb 10, 2009 at 5:31 AM, Rajesh Kumar Mallah
<mallah(dot)rajesh(at)gmail(dot)com> wrote:
> I have a query in which two huge tables (A,B) are joined using an indexed
> column and a search is made on tsvector on some column on B. Very limited
> rows of B are expected to match the query on tsvector column.
>
> With default planner settings the query takes too long ( > 100 secs) , but
> with hashjoin off it returns almost immediately. The question is , is
> it is advisable to
> tweak planner settings for specific queries in application ?
The ones that start with "enable_" usually shouldn't be changed.
They're mostly for debugging and finding problems.
> The plans are as follows.
It's a little hard to figure out what's gone wrong here because you've
only included EXPLAIN ANALYZE output for one of the plans - the other
is just regular EXPLAIN. Can you send that, along with the output of
the following query:
SELECT SUM(1) FROM trade_leads WHERE status = 'm'
I'm guessing that the problem is that the selectivity estimate for
co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
not real familiar with full text search, so I'm not sure whether
there's anything sensible you can do about it.
...Robert
From: | Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: query becomes fas on 'SET enable_hashjoin TO off;' |
Date: | 2009-02-10 14:29:42 |
Message-ID: | a97c77030902100629y14bc353cq531ac8ac397dfece@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Dear Robert,
thanks for ur interest. Our server was too loaded
what i posted my last observation, now the
other explain analyze can also be run and i am posting
both the result , as you can see latter is 55ms versus
3000 ms .
explain analyze select lead_id from general.trade_leads join
general.profile_master as pm using(profile_id) where status ='m' and
co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4109.11..11127.78 rows=20 width=4) (actual
time=3076.059..3076.059 rows=0 loops=1)
-> Hash Join (cost=4109.11..90789.72 rows=247 width=4) (actual
time=3076.057..3076.057 rows=0 loops=1)
Hash Cond: (trade_leads.profile_id = pm.profile_id)
-> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832
width=8) (actual time=0.020..2972.446 rows=127371 loops=1)
Filter: ((status)::text = 'm'::text)
-> Hash (cost=4095.68..4095.68 rows=1074 width=4) (actual
time=42.368..42.368 rows=7 loops=1)
-> Bitmap Heap Scan on profile_master pm
(cost=40.89..4095.68 rows=1074 width=4) (actual time=42.287..42.360
rows=7 loops=1)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Bitmap Index Scan on
profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0)
(actual time=42.252..42.252 rows=7 loops=1)
Index Cond: (co_name_vec @@ '''plastic'' &
''tube'''::tsquery)
Total runtime: 3076.121 ms
(11 rows)
tradein_clients=> SET enable_hashjoin TO off;
SET
tradein_clients=> explain analyze select lead_id from
general.trade_leads join general.profile_master as pm
using(profile_id) where status ='m' and co_name_vec @@
to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.42..13080.44 rows=20 width=4) (actual
time=55.233..55.233 rows=0 loops=1)
-> Nested Loop (cost=3.42..161504.56 rows=247 width=4) (actual
time=55.232..55.232 rows=0 loops=1)
-> Index Scan using profile_master_co_name_vec on
profile_master pm (cost=0.00..4335.36 rows=1074 width=4) (actual
time=16.578..46.175 rows=7 loops=1)
Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Bitmap Heap Scan on trade_leads (cost=3.42..145.75
rows=47 width=8) (actual time=1.287..1.287 rows=0 loops=7)
Recheck Cond: (trade_leads.profile_id = pm.profile_id)
Filter: ((status)::text = 'm'::text)
-> Bitmap Index Scan on trade_leads_profile_id
(cost=0.00..3.41 rows=47 width=0) (actual time=1.285..1.285 rows=0
loops=7)
Index Cond: (trade_leads.profile_id = pm.profile_id)
Total runtime: 55.333 ms
(11 rows)
SELECT SUM(1) FROM general.trade_leads WHERE status = 'm';
sum
--------
127371
this constitutes 90% of the total rows.
regds
mallah.
On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Feb 10, 2009 at 5:31 AM, Rajesh Kumar Mallah
> <mallah(dot)rajesh(at)gmail(dot)com> wrote:
>> I have a query in which two huge tables (A,B) are joined using an indexed
>> column and a search is made on tsvector on some column on B. Very limited
>> rows of B are expected to match the query on tsvector column.
>>
>> With default planner settings the query takes too long ( > 100 secs) , but
>> with hashjoin off it returns almost immediately. The question is , is
>> it is advisable to
>> tweak planner settings for specific queries in application ?
>
> The ones that start with "enable_" usually shouldn't be changed.
> They're mostly for debugging and finding problems.
>
>> The plans are as follows.
>
> It's a little hard to figure out what's gone wrong here because you've
> only included EXPLAIN ANALYZE output for one of the plans - the other
> is just regular EXPLAIN. Can you send that, along with the output of
> the following query:
>
> SELECT SUM(1) FROM trade_leads WHERE status = 'm'
>
> I'm guessing that the problem is that the selectivity estimate for
> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
> not real familiar with full text search, so I'm not sure whether
> there's anything sensible you can do about it.
>
> ...Robert
>
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: query becomes fas on 'SET enable_hashjoin TO off;' |
Date: | 2009-02-10 15:39:18 |
Message-ID: | 6927.1234280358@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> writes:
> On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I'm guessing that the problem is that the selectivity estimate for
>> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
>> not real familiar with full text search, so I'm not sure whether
>> there's anything sensible you can do about it.
Yeah, the bad selectivity estimate seems to be the entire problem ---
if that were even slightly closer to reality the planner would've
preferred the nestloop.
I don't think there's a good solution to this in 8.3, because its
estimator for @@ is just a stub. There will be a non-toy estimator
in 8.4, fwiw.
A possibility that seems a bit less crude than turning off hashjoins
is to reduce random_page_cost, so as to bias things toward nestloop
indexscans in general.
regards, tom lane
From: | Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: query becomes fas on 'SET enable_hashjoin TO off;' |
Date: | 2009-02-10 15:52:39 |
Message-ID: | a97c77030902100752u47d16e8bx1a57f36af6abcc7d@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On Tue, Feb 10, 2009 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com> writes:
>> On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I'm guessing that the problem is that the selectivity estimate for
>>> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
>>> not real familiar with full text search, so I'm not sure whether
>>> there's anything sensible you can do about it.
>
> Yeah, the bad selectivity estimate seems to be the entire problem ---
> if that were even slightly closer to reality the planner would've
> preferred the nestloop.
>
> I don't think there's a good solution to this in 8.3,
this is 8.2 server at the moment.
>because its
> estimator for @@ is just a stub. There will be a non-toy estimator
> in 8.4, fwiw.
>
> A possibility that seems a bit less crude than turning off hashjoins
> is to reduce random_page_cost, so as to bias things toward nestloop
> indexscans in general.
reducing random_page_cost from 4 (default) to 3 does switch the plan
in favour of nested loop thanks for the suggestion.
SET random_page_cost TO 4;
SET
tradein_clients=> explain select lead_id from general.trade_leads
join general.profile_master as pm using(profile_id) where status ='m'
and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=4109.11..11127.78 rows=20 width=4)
-> Hash Join (cost=4109.11..90789.72 rows=247 width=4)
Hash Cond: (trade_leads.profile_id = pm.profile_id)
-> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832 width=8)
Filter: ((status)::text = 'm'::text)
-> Hash (cost=4095.68..4095.68 rows=1074 width=4)
-> Bitmap Heap Scan on profile_master pm
(cost=40.89..4095.68 rows=1074 width=4)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Bitmap Index Scan on
profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0)
Index Cond: (co_name_vec @@ '''plastic'' &
''tube'''::tsquery)
(10 rows)
tradein_clients=> SET random_page_cost TO 3;
SET
tradein_clients=> explain select lead_id from general.trade_leads
join general.profile_master as pm using(profile_id) where status ='m'
and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..9944.78 rows=20 width=4)
-> Nested Loop (cost=0.00..122818.07 rows=247 width=4)
-> Index Scan using profile_master_co_name_vec on
profile_master pm (cost=0.00..3256.28 rows=1074 width=4)
Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
-> Index Scan using trade_leads_profile_id on trade_leads
(cost=0.00..110.76 rows=45 width=8)
Index Cond: (trade_leads.profile_id = pm.profile_id)
Filter: ((status)::text = 'm'::text)
(8 rows)
>
> regards, tom lane
>