[postgis-devel] Re: [postgis-users] GIST index speed
Mark Cave-Ayland
mark.cave-ayland at siriusit.co.uk
Wed Jun 11 08:26:29 PDT 2008
Paul Ramsey wrote:
> Mark,
>
> Is it possible something terrible is happening in LWGEOM_gist_consistent?
>
> The place the slow differs from the fast is that the slow spends a
> great deal of time in _memcpy, which is driven by toast fetching,
> which is driven by LWGEOM_gist_consistent.
>
> There's lots of activity in the operating system, as well, in
> vm_fault, which again reflects the idea that we are doing a lot of
> memory allocation.
>
> P.
Righto. After a reasonably small thread on -hackers, we've got as far as
determining what the problem is. The issue comes with the query plan
looking like this:
postgis=# explain analyze select count(*) from geography where centroid
&& (select the_geom from geography where id=69495);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
time=2691.783..2691.784 rows=1 loops=1)
InitPlan
-> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
(actual time=60.987..61.373 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using geography_geom_centroid_idx on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722
rows=32880 loops=1)
Index Cond: (centroid && $0)
Filter: (centroid && $0)
Total runtime: 2692.288 ms
(8 rows)
Since the geometry which is returned as the result of a sub-select is so
large, it has been stored on disk in TOAST (compressed) format, and so
it must be decompressed before its content can be accessed. The problem
is that every time we read each one of the 32880 results from the
geography_geom_centroid_idx index scan, the large geometry is being
de-TOASTed, compared against the index, and then thrown away. Hence a
large proportion of the query is being wasted constantly de-TOASTing the
geometry from the sub-select.
If you look at the wrapper function I posted earlier in thread, it
should be reasonably apparent why everything suddenly becomes much quicker:
Datum LWGEOM_mcatest(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(LWGEOM_mcatest);
Datum LWGEOM_mcatest(PG_FUNCTION_ARGS)
{
PG_LWGEOM *pgl = (PG_LWGEOM *)
PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
void *mem;
/* Copy somewhere else */
mem = palloc(VARSIZE(pgl));
memcpy(mem, pgl, VARSIZE(pgl));
PG_RETURN_POINTER(mem);
}
CREATE OR REPLACE FUNCTION mcatest(geometry)
RETURNS geometry
AS '$libdir/lwpostgis','LWGEOM_mcatest'
LANGUAGE 'C';
Here we are de-TOASTING the incoming geometry once and creating a new
geometry, which will not be compressed since TOAST is only invoked
during storage. So the reason the query below is so quick is because the
mcatest() wrapper function is effectively acting as a TOAST cache.
postgis=# explain analyze select count(*) from geography where centroid
&& (select mcatest(the_geom) from geography where id=69495);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
time=283.126..283.127 rows=1 loops=1)
InitPlan
-> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
(actual time=48.712..49.040 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using geography_geom_centroid_idx on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=49.321..215.524 rows=32880
loops=1)
Index Cond: (centroid && $0)
Filter: (centroid && $0)
Total runtime: 283.221 ms
(8 rows)
The question is: what can we do about this, especially as this is a very
common case within PostGIS? According to the thread on -hackers, Tom
suggested it may be possible to get PostgreSQL to automatically de-TOAST
index scan keys. Unfortunately while this would solve the problem, it
requires some non-trivial development work (i.e. it isn't exactly clear
how to do it), and wouldn't help existing users.
Other than that, I think our options are limited :( I've tried changing
all the spatial operators to work on BOX2D objects instead (hoping that
the implicit cast from geometry to box2d would kick in), but with this
in place it is then impossible to create an index since the opclass type
of box2d no longer matches the geometry type of the column.
So I'm not really sure what we can do about this... it's so frustrating
as I know people will be seeing bad performance from PostGIS in a large
number of real cases because of this, yet there doesn't seem to be a
viable solution that I can think of at the moment. Does anyone else have
any bright ideas?
ATB,
Mark.
--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
More information about the postgis-devel
mailing list