[postgis-devel] GiST Sorting

Giuseppe Broccolo g.broccolo.7 at gmail.com
Sun Jan 16 17:19:24 PST 2022


Hi all,

I have done a simple test during the last three days about the inclusion of
the pre-sorting support for GiST index, considering the following 4
scenarios:

   - PostgreSQL tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST not presorted
   - PostgreSQL tag REL_14_1  ÷ POSTGIS tag 3.2.0: GiST presorted
   - PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST not
   presorted
   - PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST
   presorted

in order to compare the performance before and after the patch done for the
presorting support in PG15.

As a testbed, I considered querying *all the boroughs' borders in the UK
included in a specific 1km square (SRID 27700).* I considered as a sample
the one provided by the Ordnance Survey
<https://osdatahub.os.uk/downloads/open/BoundaryLine> (a single shape file
of 6967 multipolygons).

For each scenario I tried to obtain:

   - time need for GiST index build
   - details about the structure of the GiST index (used the useful
   functions provided by the gevel
   <http://www.sai.msu.su/~megera/wiki/Gevel> extension, in particular I
   extended Paul's fork <https://github.com/pramsey/gevel> to make it work
   for PG15)
   - details about the execution of the aforementioned query, considering
   10 executions to take into account fluctuations during the work of the DB
   engine


*Below the details of the used query and results:*

#####################################################################
#       PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST no presorted      #
#####################################################################

test=# select version();
                                               version

-----------------------------------------------------------------------------------------------------
 PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)


test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 86.342 ms


test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
                gist_stat
-----------------------------------------
 Number of levels:          2           +
 Number of pages:           48          +
 Number of leaf pages:      47          +
 Number of tuples:          7014        +
 Number of invalid tuples:  0           +
 Number of leaf tuples:     6967        +
 Total size of tuples:      196968 bytes+
 Total size of leaf tuples: 195640 bytes+
 Total size of index:       393216 bytes+

(1 row)


test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));

                                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.229..0.307
rows=1 loops=1)
   Buffers: shared hit=6
   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0)
(actual time=0.083..0.160 rows=4 loops=1)
         Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
         Buffers: shared hit=6
 Planning Time: 0.270 ms
 Execution Time: 0.449 ms
(7 rows)

**** AVG. OF 10: 0.501 ms ****


#####################################################################
#        PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST presorted        #
#####################################################################

test=# select version();
                                               version

-----------------------------------------------------------------------------------------------------
 PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)


test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 33.263 ms

test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
                gist_stat
-----------------------------------------
 Number of levels:          2           +
 Number of pages:           28          +
 Number of leaf pages:      27          +
 Number of tuples:          6994        +
 Number of invalid tuples:  0           +
 Number of leaf tuples:     6967        +
 Total size of tuples:      196168 bytes+
 Total size of leaf tuples: 195400 bytes+
 Total size of index:       229376 bytes+

(1 row)


test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));

                                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.296..0.345
rows=1 loops=1)
   Buffers: shared hit=6
   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0)
(actual time=0.129..0.228 rows=4 loops=1)
         Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
         Buffers: shared hit=6
 Planning Time: 0.233 ms
 Execution Time: 0.612 ms
(7 rows)

**** AVG. OF 10: 0.607 ms ****


#####################################################################
#      PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST no presorted      #
#####################################################################

test=# select version();
                                                version

--------------------------------------------------------------------------------------------------------
 PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)

test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 87.987 ms

test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
                gist_stat
-----------------------------------------
 Number of levels:          2           +
 Number of pages:           48          +
 Number of leaf pages:      47          +
 Number of tuples:          7014        +
 Number of invalid tuples:  0           +
 Number of leaf tuples:     6967        +
 Total size of tuples:      196968 bytes+
 Total size of leaf tuples: 195640 bytes+
 Total size of index:       393216 bytes+

(1 row)


test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));

                                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.284..0.326
rows=1 loops=1)
   Buffers: shared hit=5
   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0)
(actual time=0.046..0.228 rows=4 loops=1)
         Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
         Buffers: shared hit=5
 Planning Time: 0.129 ms
 Execution Time: 0.405 ms
(7 rows)

**** AVG. OF 10: 0.507 ms ****


#####################################################################
#         PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST presorted      #
#####################################################################

test=# alter operator family gist_geometry_ops_2d using gist add function
11 (geometry) geometry_gist_sortsupport_2d (internal);
ALTER OPERATOR FAMILY


test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 52.949 ms


test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
                gist_stat
-----------------------------------------
 Number of levels:          2           +
 Number of pages:           49          +
 Number of leaf pages:      48          +
 Number of tuples:          7015        +
 Number of invalid tuples:  0           +
 Number of leaf tuples:     6967        +
 Total size of tuples:      197008 bytes+
 Total size of leaf tuples: 195652 bytes+
 Total size of index:       401408 bytes+

(1 row)


test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));

                                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.275..0.321
rows=1 loops=1)
   Buffers: shared hit=5
   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0)
(actual time=0.078..0.174 rows=4 loops=1)
         Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
         Buffers: shared hit=5
 Planning Time: 0.203 ms
 Execution Time: 0.481 ms
(7 rows)

**** AVG. OF 10: 0.519ms ****

*Conclusions*

   - for PG14, the presorting makes the resulting index with a lower number
   of leaf pages with an overlapping number of entries between pages
   - index built with presorting is ~30% the time of the built without this
   support, but query execution is ~20% slower (compatible to what Han
   observed during his work)
   - considering PG15dev, performances of the GiST index are similar to
   PG14 excluding presorting (identical structure of the index)
   - presorting in PG15dev makes the GiST index much similar in structure
   to the case the presorting is avoided, but the time needed for the build is
   almost ~50% lower
   - given the more similar structure of the index, query performance is
   almost the same (the difference is anyway lower than the fluctuations
   during the executions)

I am not including for a matter of time a similar study I did with self
join queries (SELECT count(*) FROM district_borough_unitary_ward_region_gb
a, district_borough_unitary_ward_region_gb b WHERE a.geom && b.geom), but
the conclusions are basically the same.

Let me know your opinion,
Giuseppe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20220117/1e3e6009/attachment-0001.html>


More information about the postgis-devel mailing list