[postgis-devel] RFC: Remove mmin and mmax from GBOX

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Mon Jul 11 08:00:51 PDT 2011


On 11/07/11 15:05, chodgson at refractions.net wrote:

>> If you are starting to ask those type of queries, then M is not an
>> appropriate place for this data - you would spin it out into a separate
>> column within your table and index it appropriately there. Then the
>> PostgreSQL planner has a lot more information to be able to choose a
>> better query plan.
>
> If your query is - In this set of traces, select all the traces within a
> specified bounding box that also pass through regions where the mosquito count
> is within a given range, then I think it is appropriate to put your data in the
> M column, and to index it as 4-D. So I think the semantics of what is a
> "dimension" and what is not are dependent on the problem space more that on the
> units of the M value.

Really? In my experience, stuffing everything into a single 
multi-dimensional index is the *worst* way to handle this type of query. 
Think of how much larger the multi-dimensional index will become, and 
also if you're using mosquito bites/temperature then the fact that these 
readings like within a very narrow range will skew the index and make it 
much less efficient.

Effectively what you're stating here is that instead of having a table 
with columns X, Y, Z and M, a table with a single column of ARRAY[X, Y, 
Z, M] would be more efficient which is very unlikely to be the case. For 
example, having a separate M column with a B-Tree index means that range 
searches on M can eliminate huge chunks of I/O if the lower selectivity 
means the comparatively larger M-D index is ignored.

(As a separate off-topic thought, I've never understood why the OGC 
model includes M as part of the main object instead of placing the value 
into a separate column. It comes across to me that while they understood 
its purpose within shapefiles, they didn't quite understand its 
equivalence within the relational model)

> However I do agree that there are many situations where it doesn't make sense to
> index the M value, and it would be nice if there was a way to disable it. And it
> also makes sense that these same situations will affect the treatment of the
> data by other functions. I'm not certain that the situation of "4-d index makes
> sense/is useful" is necessarily the same situation as "4th coordinate is a
> 'true' dimension". So perhaps there are two separate bits of metadata to capture
> there.

I agree you bring up a valid point here; how do we control how many 
dimensions should be indexed? It seems to me that we both see that there 
is something lacking in the current setup, but have different ideas on 
how to go about solving this.


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



More information about the postgis-devel mailing list