[postgis-devel] Issues about the GSoC project

Mon Jul 26 11:55:44 PDT 2021

The column names in the table are a little confusing :)

I don't have any thing I can think that would explain the execution time difference, so you'll probably need to gather more evidence. Comparing the call counts (gproff? callgrind?) might give a clue where the extra work is going in the sorted indexes. Actually, if your sort function was in fact "desorting" the inputs, that could explain both the extra shared buffer hits and the performance... like, if your whole index floats into memory then you'd expect lots of buffer hits, and the more work on the index, the more hits. So if you had a terrible index you'd get both lots of buffer hits and a lot of time spent.

Basically, it shouldn't take a lot of extra steps on a sorted index than on a normal gist index, so if there *are* a lot of extra index accesses... something is very awry in the sorting.

P.

> On Jul 25, 2021, at 10:21 PM, Han Wang <hanwgeek at gmail.com> wrote:
> 
> Hi Giuseppe and Hi Regina,
> 
> After checking the paper of GiST and implementation in Postgres. I think the query performance should be considered besides the building process. In the larger data test scenario, the building time of different indexes are similar because Postgres just hashes the tuples and sorts them and packs them into pages, building the tree index from bottom to up. With a bad hash order definition, the building process cannot detect the poor index query performance. So it is necessary to test the index query performance. I have tested the query performance with the `EXPLAIN` operator, using the sql scripts like other indexes in the `/regress`. But I am not familiar with PL/pgSQL, so I handle the log with some python scripts.
> 
> In this test, I focus on the buffer hits and execution time of different tasks of different index types including `No Index`, `Simple GiST index`, `X hash function`, `morton hash function` and `hilbert hash function`. And there are some results:
> Shared buffer hits:
>  Index	Create Time(ms)	<<	&<	&&	&>	>>	~=	~	@	&<|	<<|	|>>	|&>
> 0	No Index	0	18	18	18	18	18	18	18	18	18	18	18	18
> 1	GiST Index	25.249	40237	46085	3009	40297	46025	3009	3009	3009	41994	45295	41934	45355
> 2	X PreSort Index	8.829	443620	441235	3009	444501	440354	3009	3009	3009	441568	442503	440687	443384
> 3	Morton PreSort Index	16.885	447779	447446	4079	448669	446556	4079	4079	4079	445362	449428	444472	450318
> 4	Hilbert PreSort Index	16.824	446714	444058	3558	447600	443172	3558	3558	3558	446072	445394	445186	446280
> Execution time:
> 
> Index	Create Time(ms)	<<	&<	&&	&>	>>	~=	~	@	&<|	<<|	|>>	|&>
> 0	No Index	0	567.251	565.720	481.618	561.877	563.452	480.128	478.275	478.518	567.144	572.191	563.255	556.281
> 1	GiST Index	25.249	289.255	293.143	28.838	289.002	291.296	28.336	28.597	26.947	295.394	297.556	293.988	299.760
> 2	X PreSort Index	8.829	440.861	445.630	37.960	439.564	440.535	37.594	37.979	37.741	386.662	384.635	385.166	388.832
> 3	Morton PreSort Index	16.885	421.999	413.427	77.002	422.939	412.130	77.415	75.102	76.056	416.205	446.599	410.614	434.613
> 4	Hilbert PreSort Index	16.824	417.539	415.962	56.583	421.226	414.553	56.320	55.600	55.338	416.639	421.243	418.550	417.094
> The number of shared buffer hits are far bigger than the original one. But what confuses me is that the execution times are worse. I am trying to figure out why this happened. 
> What's more, I am not very clear about the relationship between query performance and the number of shared buffer hits.
> 
> If you have any questions or suggestions, please let me know.
> 
> Best regards,
> Han
> 
> On Thu, Jul 8, 2021 at 1:32 AM Giuseppe Broccolo <g.broccolo.7 at gmail.com> wrote:
> Hi Han,
> 
> Il giorno mer 7 lug 2021 alle ore 18:05 Han Wang <hanwgeek at gmail.com> ha scritto:
> Hi Regina and hi Giuseppe,
> 
> Thanks for your reply!
> 
> I am now checking the original paper and postgres's implement of gist. And now I think the query performance test after building gist index is necessary. Because as Darafei mentioned, the fast sorting building method may just pack the sorted tuples into pages regardless of which hash function it use. The correctness of index may be checked in the runtime query. At present, I am working on confirm the io access of hash functions.
> 
> Feel free to give suggestions or questions.
> 
> The correctness of the index should be covered by some of the regression tests in action for the GiST support in PostGIS - for instance, kNN searches etc. - for the moment I am really curious about the I/O access of the hash functions. This is something maybe it has not been checked even in PostgreSQL, and maybe it would be good to share the results with them as well.
> 
> Keep us updated, and thank you for your help!
> 
> Giuseppe.
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel