[postgis-devel] Issues about the GSoC project

Tue Jul 27 14:35:33 PDT 2021

Hi Han,

Il giorno lun 26 lug 2021 alle ore 14:21 Han Wang <hanwgeek at gmail.com> ha
scritto:

> Hi Giuseppe and Hi Regina,
>
> After checking the paper of GiST and implementation in Postgres. I think
> the query performance should be considered besides the building process. In
> the larger data test scenario, the building time of different indexes are
> similar because Postgres just hashes the tuples and sorts them and packs
> them into pages, building the tree index from bottom to up. With a bad hash
> order definition, the building process cannot detect the poor index query
> performance. So it is necessary to test the index query performance. I have
> tested the query performance with the `EXPLAIN` operator, using the sql
> scripts like other indexes in the `/regress`. But I am not familiar with
> PL/pgSQL, so I handle the log with some python scripts.
>
> In this test, I focus on the buffer hits and execution time of different
> tasks of different index types including `No Index`, `Simple GiST index`,
> `X hash function`, `morton hash function` and `hilbert hash function`.
> And there are some results:
> Shared buffer hits:
>  IndexCreate Time(ms)<<&<&&&>>>~=~@&<|<<||>>|&>
> 0 No Index 0 18 18 18 18 18 18 18 18 18 18 18 18
> 1 GiST Index 25.249 40237 46085 3009 40297 46025 3009 3009 3009 41994
> 45295 41934 45355
> 2 X PreSort Index 8.829 443620 441235 3009 444501 440354 3009 3009 3009
> 441568 442503 440687 443384
> 3 Morton PreSort Index 16.885 447779 447446 4079 448669 446556 4079 4079
> 4079 445362 449428 444472 450318
> 4 Hilbert PreSort Index 16.824 446714 444058 3558 447600 443172 3558 3558
> 3558 446072 445394 445186 446280
> Execution time:
>
> IndexCreate Time(ms)<<&<&&&>>>~=~@&<|<<||>>|&>
> 0 No Index 0 567.251 565.720 481.618 561.877 563.452 480.128 478.275
> 478.518 567.144 572.191 563.255 556.281
> 1 GiST Index 25.249 289.255 293.143 28.838 289.002 291.296 28.336 28.597
> 26.947 295.394 297.556 293.988 299.760
> 2 X PreSort Index 8.829 440.861 445.630 37.960 439.564 440.535 37.594
> 37.979 37.741 386.662 384.635 385.166 388.832
> 3 Morton PreSort Index 16.885 421.999 413.427 77.002 422.939 412.130
> 77.415 75.102 76.056 416.205 446.599 410.614 434.613
> 4 Hilbert PreSort Index 16.824 417.539 415.962 56.583 421.226 414.553
> 56.320 55.600 55.338 416.639 421.243 418.550 417.094
> The number of shared buffer hits are far bigger than the original one. But
> what confuses me is that the execution times are worse. I am trying to
> figure out why this happened.
> What's more, I am not very clear about the relationship between query
> performance and the number of shared buffer hits.
>
> If you have any questions or suggestions, please let me know.
>

I see you are considering there a self-join query for your tests: I'd
follow the good suggestions you already received considering a simpler
query and just with
one operator &&, i.e. something like

                SELECT * FROM table WHERE geom_column && BOX2D('...')

and check the amount of buffers hit/read in each plan obtained in the three
distinct contexts (no index ÷ GiST index no presort ÷ GiST index with
presort).

Also, I'd check in detail what's stored in the index obtained with or
without the presorting. In PostgreSQL 14dev they added the support for
running the
pageinspect
<https://www.postgresql.org/docs/14/pageinspect.html#id-1.11.7.32.9> functions
even on GiST indexes. In this way you can check which information is
actually stored in each page of the built indexes, and check
for any big difference in the nodes' structures.

Giuseppe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20210727/bd5b22f2/attachment-0001.html>