[postgis-devel] [SoC] GSoC 2021 - Week 10 Report - Implement pre-sorting methods before GiST index building

Giuseppe Broccolo g.broccolo.7 at gmail.com
Wed Aug 18 07:28:37 PDT 2021


Hi Han,

Thank you for the detailed information. The presorting function in
PostgreSQL also considers only the first 32 bits of the Morton hash, so
it is probably affected by the same issue. I haven't found any study like
yours that tries to understand the drop in performance when using the
index once it's built. Of course, you are focusing on the slowdown, which
is ~0.03ms vs ~0.05ms. I think that even hitting a few additional pages
could account for this gap, which looks big for a sub-ms query but may be
relatively smaller for larger source datasets. What matters is that the
index still ensures high performance compared to execution without any
index (~13ms), so a factor on the order of ~1/500 is possible with a GiST
index either with or without pre-sorting. On the other hand, with
pre-sorting the index build time is ~1/3 of the build time without it!
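
To make the precision point concrete, here is a small sketch (not the
PostGIS or PostgreSQL code) of what keeping only the first 32 bits of a
64-bit Morton code implies. It assumes the coordinates have already been
quantized to unsigned 32-bit integers, which may differ from what the real
implementation does; the names part1by1, morton64 and top32 are made up
for this example.

```python
def part1by1(v: int) -> int:
    """Spread the 32 bits of v onto the even bit positions of a 64-bit word."""
    v &= 0xFFFFFFFF
    v = (v | (v << 16)) & 0x0000FFFF0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0F
    v = (v | (v << 2)) & 0x3333333333333333
    v = (v | (v << 1)) & 0x5555555555555555
    return v


def morton64(x: int, y: int) -> int:
    """Interleave two 32-bit coordinates: x on even bits, y on odd bits."""
    return part1by1(x) | (part1by1(y) << 1)


def top32(code: int) -> int:
    """Keep only the most significant 32 bits of a 64-bit code."""
    return code >> 32


# Two distinct points that share the upper 16 bits of each coordinate
# (hypothetical values, chosen only for illustration).
a = (0x12340000, 0xABCD0000)
b = (0x1234FFFF, 0xABCDFFFF)

full_a, full_b = morton64(*a), morton64(*b)
print(full_a != full_b)                # the full 64-bit codes differ
print(top32(full_a) == top32(full_b))  # the truncated keys are identical
```

Under these assumptions the truncated key only sees the upper 16 bits of
each coordinate, so all points inside the same 2^16 x 2^16 cell of the
quantized grid get the same sort key and their relative order is left
undefined by the pre-sort.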

Anyway, if you have time it's worth trying a full 32-bit hash for the
presorting. Keep us updated! It's sad to know that next week will be your
last for this GSoC - which doesn't mean you are allowed to disappear
from the community! ;)

Thank you Han for the fantastic job you are doing!

Giuseppe.

On Tue, 17 Aug 2021 at 18:04, Han Wang <hanwgeek at gmail.com> wrote:

> Hi all,
> I am here to share my Week 10 report with you. You can also find it at [1].
> Coding Week 10 (9th August - 15th August)
> <https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding#CodingWeek109thAugust-15thAugust>
>
> *Coding Phase*:
>
>    - Do more traversal tests
>    - Fix the issue of gist_page_items
>
> *Plans for next week*:
>
>    - Submit the final evaluation
>    - Plan for a 32-bit hash function
>    - Finish the documents
>
> I have updated the document [2] with more tests. The pre-sort function works
> normally. But from these tests, I think the current implementation of the
> fast index building method with pre-sorting may cause a loss of
> performance.
> The pre-sorting function entry in Postgres does not check the leaf
> elements of a GiST index. And using only the first 32 bits of a 64-bit
> Morton/Hilbert hash code as the datum key may reduce the precision
> significantly.
> From my perspective, a 32-bit hash function may be the necessary next
> step. If you have any questions or suggestions, please let me know! You
> can reach me in the #postgis channel on Matrix.
>
> [1]
> https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding
> [2]
> https://docs.google.com/document/d/1m4oxBAsKCyjAnYmkCmQ0X_ltiid5tliFwF3rtdzlKsc/edit?usp=sharing
>
> Best regards,
> Han