[postgis-users] GIST index speed

Fri Jun 6 12:30:41 PDT 2008

I just want to see if I can duplicate the result, having the original
data makes that a faster process, since I can copy your use case
exactly.

P

On Fri, Jun 6, 2008 at 10:53 AM, Steve Kondik <shade at chemlab.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Paul Ramsey wrote:
>> Steve,
>>
>> Any chance you could share your data with the developers? this is a
>> very strange result.
>>
>> P
>>
>
> Sure, I exported the table with pgsql2shp.
>
> http://n0rp.chemlab.org/geography.tbz2
>
> There's nothing spectacular about the data, it's just
> state/county/zipcode from the US census.
>
>
>> On Fri, Jun 6, 2008 at 9:37 AM, Steve Kondik <shade at chemlab.org> wrote:
>> Steve Kondik wrote:
>>>>> Mark Cave-Ayland wrote:
>>>>>> Steve Kondik wrote:
>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>> Hash: SHA1
>>>>>>>
>>>>>>> I'm seeing this same kind of slowness.  I am in the process of migrating
>>>>>>> an application from Oracle to Postgres.  One query that seems to take
>>>>>>> longer than it should is a very simple point-in-bbox query which takes 3
>>>>>>> seconds.  There are 32000 rows in the table, and the query is as simple
>>>>>>> as it can be:
>>>>>>>
>>>>>>> select count(*) from geography where type='Z' and centroid && (select
>>>>>>> geometry from geography where id=69495);
>>>>>>>
>>>>>>> The second geography is the USA, so this query gets all zipcodes inside
>>>>>>> the bbox, using precalculated centroids.  The query plan looks fine as
>>>>>>> well:
>>>>>>>
>>>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>>>> time=2407.123..2407.125 rows=1 loops=1)
>>>>>>>    InitPlan
>>>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>>>> rows=1 width=4528) (actual time=0.027..0.030 rows=1 loops=1)
>>>>>>>            Index Cond: (id = 69495)
>>>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=1.417..2355.049 rows=29687
>>>>>>> loops=1)
>>>>>>>          Index Cond: (centroid && $0)
>>>>>>>          Filter: (centroid && $0)
>>>>>>>  Total runtime: 2407.194 ms
>>>>>>> (8 rows)
>>>>>>>
>>>>>>>
>>>>>>> Any thoughts?  This query takes 200ms on Oracle.  It seems like
>>>>>>> something is wrong for it to take 3 seconds on PostGIS.  It's faster for
>>>>>>> smaller geometries (like a state), but I don't understand why since its
>>>>>>> just point-inside-bbox.
>>>>>> Hi Steve,
>>>>>> Well from the above you can see fairly easily that the time is
>>>>>> disappearing into the second index scan. The above output doesn't look
>>>>>> quite right somehow (it's missing at least one node indicator) - can you
>>>>>> try cutting and pasting the exact output from EXPLAIN into an email
>>>>>> again? Also we'll need the output of "\d geography" from within psql.
>>>>>
>>>>>> ATB,
>>>>>> Mark.
>>>>>
>>>>>
>>>>> Sure thing.  It's taking quite a bit longer now because we're doing some
>>>>> bulk loading.  When nothing else is running, the query takes about 4
>>>>> seconds.
>>>>>
>>>>>
>>>>> health_central_22=> explain analyze select count(*) from geography where
>>>>> type='Z' and centroid && (select geometry from geography where id=69495);
>>>>>
>>>>> QUERY PLAN
>>>>>
>>>>> -
>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>> time=16534.739..16534.741 rows=1 loops=1)
>>>>>    InitPlan
>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>> rows=1 width=4528) (actual time=0.076..0.080 rows=1 loops=1)
>>>>>            Index Cond: (id = 69495)
>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=25.453..16192.427
>>>>> rows=29687 loops=1)
>>>>>          Index Cond: (centroid && $0)
>>>>>          Filter: (centroid && $0)
>>>>>  Total runtime: 16534.809 ms
>>>>> (8 rows)
>>>>>
>>>>> health_central_22=> \d geography
>>>>>              Table "public.geography"
>>>>>     Column    |          Type          | Modifiers
>>>>> --------------+------------------------+-----------
>>>>>  type         | character(1)           | not null
>>>>>  id           | bigint                 | not null
>>>>>  centroid     | geometry               |
>>>>>  geometry     | geometry               |
>>>>>  name         | character varying(32)  | not null
>>>>>  population   | bigint                 |
>>>>>  abbreviation | character varying(2)   |
>>>>>  po_name      | character varying(100) |
>>>>>  id_geo_state | bigint                 |
>>>>> Indexes:
>>>>>     "geography_pkey" PRIMARY KEY, btree (id)
>>>>>     "idx_geography_2" btree (name)
>>>>>     "idx_geography_3" btree (abbreviation)
>>>>>     "idx_geography_4" btree (id_geo_state)
>>>>>     "idx_geography_centroid" gist (centroid)
>>>>>     "idx_geography_centroid_c" gist (centroid) WHERE type = 'C'::bpchar
>>>>>     "idx_geography_centroid_s" gist (centroid) WHERE type = 'S'::bpchar
>>>>>     "idx_geography_centroid_z" gist (centroid) WHERE type = 'Z'::bpchar
>>>>>     "idx_geography_geometry" gist (geometry) CLUSTER
>>>>>     "idx_geography_geometry_c" gist (geometry) WHERE type = 'C'::bpchar
>>>>>     "idx_geography_geometry_r" gist (geometry) WHERE type = 'R'::bpchar
>>>>>     "idx_geography_geometry_s" gist (geometry) WHERE type = 'S'::bpchar
>>>>>     "idx_geography_geometry_z" gist (geometry) WHERE type = 'Z'::bpchar
>>>>> Foreign-key constraints:
>>>>>     "fk_geography_1" FOREIGN KEY (id_geo_state) REFERENCES geography(id)
>>>>>
>> Adding force_2d speeds this up immensely.  Not sure I understand why
>> this is necessary, but I'm still getting used to PostGIS.
>>
>>
>> health_central_22=> explain analyze select id,name from geography where
>> type='Z' and centroid && (select force_2d(geometry) from geography where
>> id=69495);
>>                                                                QUERY
>> PLAN
>> -
>> ------------------------------------------------------------------------------------------------------------------------------------------
>>  Index Scan using idx_geography_centroid on geography  (cost=8.29..16.57
>> rows=1 width=14) (actual time=0.263..125.338 rows=29687 loops=1)
>>   Index Cond: (centroid && $0)
>>   Filter: ((centroid && $0) AND (type = 'Z'::bpchar))
>>   InitPlan
>>     ->  Index Scan using geography_pkey on geography  (cost=0.00..8.29
>> rows=1 width=4441) (actual time=0.158..0.161 rows=1 loops=1)
>>           Index Cond: (id = 69495)
>>  Total runtime: 171.229 ms
>> (7 rows)
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFISXmUMrBfzfMVwMcRAk0/AJsHZwP3p8AUVgSvWpPHFoGW/KCnTQCfY1QN
> ZpmBnxOzQ3KQX2TG+03yVxM=
> =z/t9
> -----END PGP SIGNATURE-----
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>