[postgis-users] Reworking GIST Indexes

Paul Ramsey pramsey at refractions.net
Wed Mar 12 11:20:11 PST 2003


Are there any implications between index rewrites and the upcoming GEOS 
integration? You mentioned to me that you were interested in taking the 
opportunity of the GEOS integration to revisit some of the basic 
implementation issues around PostGIS, like the internal 
representations.  If you and Mark could be completely in sync about 
what is to be done before patches start coming, that would probably be 
wise. :)

On Wednesday, March 12, 2003, at 11:13 AM, David Blasby wrote:

>>
>>
>> I'm not convinced that users should be expected to always add another
>> clause to a select statement to return a result - the operator should
>> return the correct result according to its definition instead of 
>> having
>> to be filtered again!
>>
> I agree with you 100% - but also remember that the operators (like 
> "&&") work on bounding boxes *not* the actual geometries.  Pretty soon 
> (as soon as GEOS gets integrated), we'll see routines like:
>
> SELECT * FROM data WHERE data.mygeom && <geom> and 
> intersects(data.geom, <geom>)
>
> Meaning, use the index for a parital (BOX) solution, then the actual 
> function to get the correct solution (GEOMETRY).
>
> I've heard that they will be query re-write rules in postgresql 7.4 - 
> have you looked into any of them?  I'm hoping that a user query like:
>
> SELECT * FROM data WHERE intersects(data.geom, <geom>)
> can get auto-magically turned into an indexible statement like:
> SELECT * FROM data WHERE data.mygeom && <geom> and 
> intersects(data.geom, <geom>)
>
>> Yes, I agree with you on this one. I have a feeling that Oleg and
>> Teodor's Rtree implementation is more textbook than practical. Each
>> strategy should call a routine that tests consistency and returns the
>> correct result. I would have no worries about replacing these with
>> routines that we can guarantee will always produce the results we want
>> for a given operator, and I wouldn't be surprised if it didn't work
>> exactly as intended.
>>
> Yes - we should test their implentation and send them feedback. 
> Unfortunately, there isnt
> all that much documentation on the GiST index, and the rtree_gist 
> example is the best...
>
>> 1. Alter the BOX3D type so that it contains a SRID field (and 
>> basically
>> make users have to specify a SRID for the BOX3D when they create it)
>>
>> 2. Alter the operator class so a BOX3D is returned instead of a box 
>> for
>> the GiST entries (I'm guessing the implicit cast in the SQL file is 
>> used
>> to cast geometries to box3ds for queries).
>>
> I dont think the GiST index should be using 3D geometries - thats just 
> adding 50% more information
> to the index that will never be used.
>
>
> How about something like:
> typedef stuct
> {
>    int32         SRID;            // -1 if unknown
>    double      xmin, ymin;
>    double      xmax, ymax;
> } BOX2D;
>
> Its dead easy to create one of these from a GEOMETRY (they have a SRID 
> and bounding volume), and
> you can convert a BOX2D to GEOMETRY by creating one as "BBOXONLYTYPE".
>
>> 3. Keep the same algorithms for the GiST indexes but convert them to
>> BOX3D (this ensures that any changes to the PostgreSQL BOX type won't
>> break things!). Abstract the PostgreSQL interface from the GiST 
>> routines
>> so then the plan is that only the interface routines would need to be
>> rewritten between PostgreSQL revisions (i.e. the algorithms themselves
>> would not have to be altered making them more robust). Perhaps set a
>> compile time choice for either quadratic pick/split or linear 
>> pick/split
>> to show how this adds more flexibility.
>>
> Good idea.
>
>> 4. Rewrite the spatial operators so they use a common routine based on
>> BOX3Ds regardless of whether the operator is indexed or not.
>>
> This is another great idea, but its more subtle that you'd think.  
> When you do something like:
>
> SELECT * FROM table WHERE geo && 'BOX3D(...)';
>
> Whats actually happening is this:
> 1. looks for GEOMETRY && BOX3D  operator (there isnt one)
> 2. converts BOX3D to a GEOEMTRY -"BBOXONLYTYPE"- and finds the 
> GEOMETRY && GEOMETRY operator
> 3. looks to see if GEOMETRY && GEOMETRY is indexable (it is) and 
> determines if it should seqscan or indexscan
>
> So, in the end you are not doing  BOX3D <op> BOX3D, you are actually 
> doing a GEOMETRY <op> GEOMETRY
>
> In the sequence scan case, it calls the GEOMETRY <op> GEOMETRY 
> function directly.
> In the index scan case, it "compresses" the GEOMETRY into a BOX then 
> searches the index using BOX <op> BOX function (or BOX2D <op> BOX2D).
>
> SELECT * FROM table WHERE geo::box3d && 'BOX3D(...)';
>
> This will use the BOX3D <op> BOX3D routines (not indexed).
>
> The postgis-for-7.1 version does ensure that SRIDs are respected (cf. 
> GEOMETRYKEY - note that this is a variable length structure, as you 
> were supposed to use under the old GiST API).
>
> Another idea would be to completely remove the BOX3D type and just 
> have GEOMETRY.  Suck all the functionality of  BOX3D into geometry, 
> and make sure "BBOXONLYTYPE" is a first class citizen of GEOMETRY 
> (currently its not).  You could then put these "BBOXONLYTYPE" 
> geometries into the index (compress would convert a GEOMETRY into a 
> "BBOXONLYTYPE" ) and use the same functions for the index and seq 
> scan.  Unfortunately, this is quite a bit of work and you're storing a 
> bunch of extra junk in the index, but it does make things more 
> consistent.
>
>> I'm willing to put my money where my mouth is and invest some time and
>> effort into this as I think it will make maintaining the codebase much
>> easier, especially in terms of integrating the GEOS library. Would you
>> guys be willing to accept a set of patches from me to implement these
>> changes?
> If you send patches, I'll apply them.
>
> dave
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list