[postgis-users] ST_Contains() performance problem

Stephan Grüter grueter at wuestundpartner.com
Sat Apr 26 10:09:20 PDT 2008


Hi Mark, Regina and Kevin

I think, I found the source of the problem.

It seems, that the planner of PG 8.2 always used internal functions
like "=" before external functions whereas the planner of PG 8.3
decides according to the estimated cost value of the external functions
and most of the postgis functions have a cost value of 1.

After I changed the estimated cost of st_geometry_overlap() (used by  
&& operator)
and contains() from 1 to 2, the planner behaved exactly like the  
former version and
the performance of the query was as good as before.

So I have a workaround for this particular case, but the question
remains, how to deal with this situation generally:
- Is it possible to disable the evaluation of function costs in
   PG 8.3 with a configuration parameter (I didn't found one yet)?
- Are the default cost values of the postgis functions reasonable
   for all or most situations or do we have to fine tune the values?

Regards,

Stephan

Am 25.04.2008 um 16:33 schrieb Mark Cave-Ayland:

> Stephan Grüter wrote:
>> Hi Kevin,
>> Am 25.04.2008 um 03:59 schrieb Kevin Neufeld:
>>> I find it curious that your old system has
>>> "retyp=8 AND (geom && ...) AND contains(...)"
>>>
>>> but you new system has
>>> "(geom && ...) AND contains(...) AND regtyp=8"
>>> I think PostgreSQL does short-circuit evaluation.  Testing for  
>>> integer equality and dropping out of a filter clause is  
>>> significantly faster than testing a bounding box intersection, a  
>>> contains operation, and then finishing with a test for integer  
>>> equality.
>> Yes, I think the execution planner of PG 8.3 makes a very bad  
>> decision in this case. But why? The query, the data,
>> the indexes and the clustering are identical on both systems.
>>>  Are you sure you use the same query on both systems?
>> Yes, I'm sure. I noticed a performance degradation of an existing
>> web application and after some profiling I found the this query
>> as the source of the problem.
>> Regards,
>> Stephan
>
>
> Hmmm. It does seem strange that the evaluation order is different,  
> however planner costs are influenced by the parameters in  
> postgresql.conf, and to a less extent, the statistics sampling  
> routine.
>
> So a good thing to do would be to compare the settings in  
> postgresql.conf between both systems, and also check the statistics  
> target for each of the columns within the join before going any  
> further. Don't forget to restart PostgreSQL if you make any changes  
> to the memory /cost settings.
>
>
> HTH,
>
> Mark.
>
> -- 
> Mark Cave-Ayland
> Sirius Corporation - The Open Source Experts
> http://www.siriusit.co.uk
> T: +44 870 608 0063
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>




More information about the postgis-users mailing list