Re: cube operations slower than geo_distance() on production server - Mailing list pgsql-performance
From | Merlin Moncure |
---|---|
Subject | Re: cube operations slower than geo_distance() on production server |
Date | |
Msg-id | b42b73150702121103l297b690w84dde8ef9ef8d2f1@mail.gmail.com Whole thread Raw |
In response to | Re: cube operations slower than geo_distance() on production server (Mark Stosberg <mark@summersault.com>) |
Responses |
Re: cube operations slower than geo_distance() on production server
Re: cube operations slower than geo_distance() on production server |
List | pgsql-performance |
On 2/12/07, Mark Stosberg <mark@summersault.com> wrote: > Merlin Moncure wrote: > > On 2/10/07, Mark Stosberg <mark@summersault.com> wrote: > >> > >> With the help of some of this list, I was able to successfully set up > >> and benchmark a cube-based replacement for geo_distance() calculations. > >> > >> On a development box, the cube-based variations benchmarked consistently > >> running in about 1/3 of the time of the gel_distance() equivalents. > >> > >> After setting up the same columns and indexes on a production > >> database, it's a different story. All the cube operations show > >> themselves to be about the same as, or noticeably slower than, the same > >> operations done with geo_distance(). > >> > >> I've stared at the EXPLAIN ANALYZE output as much I can to figure what's > >> gone. Could you help? > >> > >> Here's the plan on the production server, which seems too slow. Below > >> is the plan I get in > >> on the development server, which is much faster. > >> > >> I tried "set enable_nestloop = off", which did change the plan, but > >> the performance. > >> > >> The production DB has much more data in it, but I still expected > >> comparable results relative > >> to using geo_distance() calculations. > > > > any objection to posting the query (any maybe tables, keys, indexes, etc)? > > Here the basic query I'm using: > SELECT > -- 1609.344 is a constant for "meters per mile" > cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode = > '90210') , earth_coords)/1609.344 > AS RADIUS > FROM pets > -- "shelters_active" is a view where "shelter_state = 'active'" > JOIN shelters_active as shelters USING (shelter_id) > -- The zipcode fields here are varchars > JOIN zipcodes ON ( > shelters.postal_code_for_joining = zipcodes.zipcode ) > -- search for just 'dogs' > WHERE species_id = 1 > AND pet_state='available' > AND earth_box( > (SELECT earth_coords from zipcodes WHERE zipcode = '90210') , > 10*1609.344 > ) @ earth_coords > ORDER BY RADIUS; your query looks a bit funky. here are the problems I see. * in your field list, you don't need to re-query the zipcode table. > cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode = > '90210') , earth_coords)/1609.344 AS RADIUS becomes cube_distance(pets.earth_coords, earth_coords ) / 1609.344 AS RADIUS also, dont. re-refer to the zipcodes table in the join clause. you are already joining to it: > AND earth_box( > (SELECT earth_coords from zipcodes WHERE zipcode = '90210') , > 10*1609.344) @ earth_coords becomes AND earth_box(zipcodes.earth_coords, 10*1609.344) ) @ pets.earth_coords * also, does pet_state have any other states than 'available' and ' not available'? if not, you should be using a boolean. if so, you can consider a functional index to convert it to a booelan. * if you always look up pets by species, we can explore composite index columns on species, available (especially using the above functional suggestion), etc. composite > partial (imo) thats just to start. play with it and see what comes up. merlin
pgsql-performance by date: