[postgis-devel] [SoC] GSoC 2021 - Final Report: Implement pre-sorting methods before GiST index building

Regina Obe lr at pcorp.us
Sun Aug 22 23:28:55 PDT 2021


Great work Han. It was great having you in the meeting and I hope you like PostGIS enough to consider contributing in the future. :)  

You’d be a great asset to our team.

 

Thanks,

Regina

 

From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Han Wang
Sent: Monday, August 23, 2021 12:55 AM
To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>; soc at lists.osgeo.org
Subject: [postgis-devel] [SoC] GSoC 2021 - Final Report: Implement pre-sorting methods before GiST index building

 

Hi all,

 

Here is my final report about my GSoC project: Implement pre-sorting methods before GiST index building.

 

Abstract

GiST(Generalized Search Tree) is a generalization data structure of a variety of disk-based height-balanced search trees. Under the high-level API of GiST, structures like b-tree, r-tree can be implemented for data management. PostgreSQL defines a set of process function APIs for elements of the GiST index. Only with these function implementations can a data type be indexed and managed by a GiST structure. In large data scenarios, pre-sorting a batch of data fetched in memory may be a local approximation to the global sorting method. Recent PostgreSQL patch shows that it should speed up the build of a GiST index after some pre-sorting of the data which needs to be indexed. In one fork, the author replaces the GIST_OPTIONS_PROC with GIST_ORDER_PROC to try to define an order for data fetched in memory to sort in order to speed up the subsequent index building process. And I implemented pre-sorting methods in z-order pattern and Hilbert order pattern, Alos tested and compared pre-sorting methods on various data.

 

Links

* Code Base: https://github.com/postgis/postgis/pull/619

* Wiki: https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding

* Test Results and Performance Comparison: https://docs.google.com/document/d/1_mY_F2hPDk3vmXH5PPp2z9BuQWt-ZMORk6KxtdVQ3HY/edit?usp=sharing

 

Conclusion

* The Morton/Hilbert hash function for pre-sorting can accelerate the process of index building process and reduce the time-consuming to one-third to one-fifth of the original

* The query operation for different data in the same area demonstrates more stable performance

* The query performance of pre-sorting is about 30% slower than the original

 

Future Work

* Try to figure out the reason for the slowness of the pre-sorting index

* Implement a fast Morton/Hilbert hash function for n-dimension geometry objects

 

Participating in the GSoC is a great experience for me. And thank you very much for my mentors, GSoCstaff and the community.

 

Best regards,

Han

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20210823/5ef301a2/attachment.html>


More information about the postgis-devel mailing list