[postgis-devel] Advancing GiST in PgSQL

Andrey Borodin x4mmm at yandex-team.ru
Mon Jul 30 22:06:01 PDT 2018


Hi! Darafei, Paul thank you for your valuable comments!

> 30 июля 2018 г., в 16:10, Darafei Komяpa Praliaskouski <me at komzpa.net> написал(а):
> 
> Hi!
> 
> сб, 28 июл. 2018 г. в 13:44, Andrey Borodin <x4mmm at yandex-team.ru <mailto:x4mmm at yandex-team.ru>>:
> Hi!
> 
> I'm working on GiST advancement and hacked out some patches, now I'm trying to push them into upstream. But it is a long and tedious process, so I've come up with idea to make fork of a GiST as extension (you can check out very crude prototype here [0])
> 
> How do you see a process of PostGIS depending on your gist-as-extension?
I do not expect PostGIS to depend on AGS. Eventually, if AGS will be useful and stable, I'll propose to include it into PostGIS.
It's AGS' job to introduce it's opclasses for PostGIS data types, if PostGIS is installed.
> 
> Is it going to be part of core? 
Hopefully. I will push all advancements to PostgreSQL, but it's not a 20 minute adventure.
> 
> Why will it live long and prosper? :)
Who knows...
> 
>  
> BTW, 2 and 3 are on current commitfest and everyone is very welcome to review them. 2 is very tiny, less than 100loc.
> 
> Can you provide a commitfest link?
https://commitfest.postgresql.org/18/1598 <https://commitfest.postgresql.org/18/1598>
I will send a bottle of Ural beer to whatever country you now reside (and ethanol is legal at your age there), if you will review this patch.
>  
> All these things are already implemented. But I'm looking for more ideas. Darafei Praliaskouski shared with me some ideas on bulk loading, so I'm going to implement them too.
> 
> There is btree_gist contrib. A good solution, IMHO, would be to make it indistinguishable from the "real" btree. 
Intra-page indexing is aimed exactly at this: fast search on page. Smooth build process is also aim of AGS.
>  
> As far as I know PostGIS is the main consumer for Generalized index Search Trees, so, after all, may be we do not need too generalized index.
> If you have some ideas what can be done better in indexing, I'll appreciate if you share them with me. Currently I'm just doing things faster and more efficient.
> 
> A big question is "Problem of Russia" - Russia spans over 180 meridian, and its box covers all of Europe and US. Any indexed query in that area keeps rechecking full contour of Russia, slowing everything down. It would be good if indexing machinery can cope with several bounding boxes living in different parts of index, each coverting much smaller area.
Nice idea. Thanks! I think opclass can provide method to redistribute heap tuple into several index tuples. It seems quite easy to implement.
>  
> And finally, most important question: will it be useful as extension? Does this idea worth efforts? Are there any problems which will make this extension unusable?
> 
> A thing I'm afraid of is maintenance of extension for newer Postgres versions, and dependency management with PostGIS. 
GiST representation of PostGIS datatypes seem to be quite stable along PostGIS versions, isn't it?

> 30 июля 2018 г., в 18:29, Paul Ramsey <pramsey at cleverelephant.ca> написал(а):
> 
> On Sat, Jul 28, 2018 at 3:36 AM, Andrey Borodin <x4mmm at yandex-team.ru <mailto:x4mmm at yandex-team.ru>> wrote:
>> Hi!
>> 
>> I'm working on GiST advancement and hacked out some patches, now I'm trying to push them into upstream. But it is a long and tedious process, so I've come up with idea to make fork of a GiST as extension (you can check out very crude prototype here [0])
> 
> How would an extension deal with gist-as-extension, assuming it also
> wanted to use gist-as-built-in in cases where the extension version
> wasn't around? (Also, some fun version issues there, as presumably a
> goal of gist-as-extension would be provided a faster feature pace than
> stock PgSQL).
> Just register different functions in the opclass, yes?
No, for AGS user have to register AGS's opclasses with usual GiST functions and, probably, some extensible functions.
> 
>> As far as I know PostGIS is the main consumer for Generalized index Search Trees, so, after all, may be we do not need too generalized index.
>> If you have some ideas what can be done better in indexing, I'll appreciate if you share them with me. Currently I'm just doing things faster and more efficient.
>> May be you know some new joins you want, some different searches, some spatio-temoral capabilities of the indexing etc.
> 
> Multi-key support as you noted, can be useful. Indexing any large
> object gets less efficient as the amount of area the keys cover goes
> upwards.
Ok, I'm already thinking about this.
> 
> A join that doesn't have to nested loop might be interesting. Not sure
> the extent to which merging two gist trees can be done (assuming they
> are built using the same opclass maybe?) but anything that speeds up
> joins a little is nice.
I can implement spatial join as in pgSphere. https://github.com/akorotkov/pgsphere/tree/crossmatch_cnode <https://github.com/akorotkov/pgsphere/tree/crossmatch_cnode> But that's harder than multi-keys.
> 
>> Index-as-extension will use generic WAL, so I expect it to be a bit less write-efficient from the start, but algorithmic improvements will cover that.
>> This index will be used with existing functions for opclasses, but require some new functions for advances functions. Currently it works like this:
>> CREATE OPERATOR CLASS ags_cube_ops
>>   DEFAULT FOR TYPE cube USING ags AS -- then all usual functions from cube opcalss
>> 
>> Another open question is how to call it. Currently I use Advanced Generalized Search - AGS as codename, but looking for better names. May be something like Evolution Generalized Search - EGS. Currently, I couldn't come up with fancy name like RUM or GIN :)  I do not think it will be limited just to GiST-like indexes, this is why I removed T (stands for tree). And, if index will be just PostGIS-specialized, G will be removed too. Darafei suggested that I should not use abbreviation, just pick an own name. Also he suggested "ox".
>> 
>> And finally, most important question: will it be useful as extension? Does this idea worth efforts? Are there any problems which will make this extension unusable?
> 
> Hard to say. It's not clear to me that indexes are our biggest pain
> point, they work acceptably well, as is. The exception is probably the
> large/small issues that a multi-key approach could address. For that
> we could also take a run at a GIN implementation, but the extreme
> simplicity of the GIN keys make that a bit tricky.


Thank you!
I'll get back as soon I'll have usable version.

Best regards, Andrey Borodin.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20180731/172a7918/attachment-0001.html>


More information about the postgis-devel mailing list