[postgis-devel] Issues about the GSoC project

Tue Jul 27 00:47:31 PDT 2021

Hi,

Tue, 27 Jul 2021 at 00:01, Justin Pryzby <pryzby at telsasoft.com>:

> On Mon, Jul 26, 2021 at 10:44:42PM +0200, Raúl Marín wrote:
> > Hi,
> >
> > As Paul is mentioning, it appears that you are assuming that having lots
> > of shared buffer hits is good, but it isn't. A shared buffer hit means
> > that you needed to read a page and it happened to be cached in memory; but
> > indexes are useful because they avoid the need to read unnecessary pages
> > (in memory or on disk). So an ideal sort would be one that puts the data
> > you are looking for in the same pages (so all the rows in the page are
> > useful), and the ideal index would be one that identifies those pages as
> > fast as possible (also reading as few index pages as possible).
>
> I'm not following along closely, but I suggest looking at whether the
> index is clustered or not.
>
> SELECT correlation FROM pg_stats WHERE attname=.. AND tablename=..
>
> If correlation is low, an index scan may touch many pages of the table,
> even if it returns only a fraction of its tuples, in addition to reading
> the heap more randomly than sequentially.


Correlation is not helpful here, as it runs into a chicken-and-egg problem:
we're trying to prove that a Hilbert sort is faster, and correlation will
only show how perfectly Hilbert-sorted the table is.
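
To illustrate the circularity, here is a minimal Python sketch (not PostGIS
code). It assumes that the statistic behaves like a correlation between
physical row position and the rank of each row's sort key, and that the sort
key is a Hilbert index. The grid size, point count, and function names are
all made up for the example.

```python
import random


def hilbert_index(order, x, y):
    """Distance of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def physical_correlation(keys):
    """Correlation between row position in the 'heap' and rank of the row's key."""
    positions = list(range(len(keys)))
    by_key = sorted(positions, key=lambda i: keys[i])
    ranks = [0] * len(keys)
    for rank, i in enumerate(by_key):
        ranks[i] = rank
    return pearson(positions, ranks)


random.seed(0)
ORDER = 10  # hypothetical 1024 x 1024 grid
points = [(random.randrange(1 << ORDER), random.randrange(1 << ORDER))
          for _ in range(2000)]

# Keys in insertion order (unsorted table) and after a Hilbert sort.
unsorted_keys = [hilbert_index(ORDER, x, y) for x, y in points]
sorted_keys = sorted(unsorted_keys)

corr_unsorted = physical_correlation(unsorted_keys)
corr_sorted = physical_correlation(sorted_keys)
print("unsorted:", round(corr_unsorted, 3))
print("hilbert-sorted:", round(corr_sorted, 3))
```

Under these assumptions the sorted table scores about 1 by construction, so
the number only confirms that the sort was applied. It says nothing about
whether queries on the sorted table read fewer pages.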