[postgis-devel] GiST Sorting

Marco Boeringa marco at boeringa.demon.nl
Mon Nov 29 23:03:30 PST 2021


Hi Paul,

Thanks for these insights. In fact, seeing some of the similar 
discussions in the past about build time versus query performance when 
Han was still working on this project, was one of the reasons I 
initially asked about this here on the list, and asked about real world 
experiences with the new versions of PostgreSQL and PostGIS.

However, your remark about "flat trees" raises another question for me:

- Is the issue you describe data size depended or not? E.g. does the 50% 
penalty on query performance possibly only occur on small datasets, but 
not - or much less so - on something like a Planet size extract of 
OpenStreetMap? Or is this a universal issue?

Having a 50% decrease in query performance of primarily small datasets 
might be acceptable, if it means much larger datasets can be indexed 
much faster and have minimal impact of the same issue. That said, I 
understand your caution if the 50% decrease in query performance is a 
universal issue with the new implementation.

Marco

Op 30-11-2021 om 00:08 schreef Paul Ramsey:
> Sorry to be the rain on the parade, but I think that the results of testing on performance for the new index sorting mean we should NOT be enabling it by default for this release.
>
> Basically we have multiple tests now that show that the index is slower for query performance, and we even have a convincing working theory for why (the improved packing into pages/nodes means that we have very flat trees which are expensive to query).
>
> I think releasing with a "new improved" version that degrades most people's systems by 50% is not a good look. We can leave all the code in there and just take the sorting function out of the opclass. That way people with "build an index and use it once" use cases can still turn the feature on, but people with normal use patterns aren't going to silently get a kick in the teeth when they upgrade to 3.2.
>
> If you would like to replicate my results, I provide them below.
>
> P.
>
>
> Data: https://www.dropbox.com/s/e75g1y9ua1da6qu/roads_rdr.sql.gz?dl=0
>
> create index roads_rdr_idx on roads_rdr using gist (geom);
>
> 13/3.2/CREATE 1546
> 13/3.1/CREATE 1683
> 14/3.2/CREATE 200
> 14/3.1/CREATE 1705
>
> select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
>
> 13/3.2/QUERY  4600
> 13/3.1/QUERY 4540
> 14/3.2/QUERY 11200
> 14/3.1/QUERY 4700
>
> select pg_relation_size('roads_rdr_idx');
>
> 13/3.2/IDXSIZE  5414912
> 13/3.1/IDXSIZE 5496832
> 14/3.2/IDXSIZE 2940928
> 14/3.1/IDXSIZE 5439488
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel


More information about the postgis-devel mailing list