[postgis-devel] gzip support for ST_AsMVT

Paul Ramsey pramsey at cleverelephant.ca
Mon Nov 4 08:15:59 PST 2019


Text as a column can be quite unhappy if you feed it with binary data (things like the printing code gets unhappy with the zero-byte parts, so you can end up with unexpected stops in printouts). 
I have no idea why text would have any advantage over bytea, since if you check the source, they are both just typedefs over top of varlena.
It would be possible to do a gzip(text) returns bytea, but you’d still have a bytea in the end, and it wouldn’t be nice and bidirectional, since text::bytea works, but bytea::text doesn’t return the original ascii encoding.

There shouldn’t be any conversion cost in a cast, since the memory structure of text and bytea is identical, it’s just re-typing a pointer.

I can add a gzip(text) => bytea very easily though (heck, you can, just create a new gzip function that looks exactly like the bytea one, but takes text as the argument, it’ll work fine.

ATB

P

> On Nov 4, 2019, at 1:49 AM, Yuri Astrakhan <yuriastrakhan at gmail.com> wrote:
> 
> Paul, one (possibly newbie) question - I noticed you used `gzip(bytea)`, and I read in [1] that text performs about 17% better (for read queries, not sure about in-memory operations). Does it make sense to add gzip(text) variant?   I concatenate ST_AsMVT() data with STRING_AGG, thus end up with a TEXT, and wonder if there is a conversion cost.  Thanks!!
> 
> [1] http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/ <http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/>
> On Mon, Nov 4, 2019 at 4:24 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
> Paul rulez!!!! Thank you!!! :)
> 
> Experimentation ensues...
> 
> On Sun, Nov 3, 2019 at 9:29 PM Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
> Excuse me,
> 
> https://github.com/pramsey/pgsql-gzip <https://github.com/pramsey/pgsql-gzip>
> 
> P
> 
> On Sun, Nov 3, 2019 at 6:24 PM Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
> >
> > I could not find a gzip extension on the web, even though I swear one
> > already exists.
> > So, I wrote up one, it's quite a small piece of work, but hopefully it serves.
> >
> > https://github.com/pramsey/psql-gzip <https://github.com/pramsey/psql-gzip>
> >
> > ATB,
> >
> > P
> >
> > On Sun, Nov 3, 2019 at 11:57 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
> > >
> > > OSM -> PostgreSQL is done by Imposm 3, which updates database tables on the daily/hourly/minute basis. Every time it runs, it generates a list of changed tiles.  I do not know what process they use for it -- from my perspective, I simply get the list of updated tiles on zoom 14 as a file, and I could use it to regenerate cached tiles or purge them from Varnish.  Perhaps Imposm maintainers could find a good use for that functionality?
> > >
> > > [1] https://imposm.org/docs/imposm3/latest/tutorial.html#expire-tiles <https://imposm.org/docs/imposm3/latest/tutorial.html#expire-tiles>
> > >
> > > On Sun, Nov 3, 2019 at 2:41 PM Martin Davis <mtnclimb at gmail.com <mailto:mtnclimb at gmail.com>> wrote:
> > >>
> > >> Good background to know.  Quite a bit to grok there!
> > >>
> > >> One question: does any part of that toolchain regenerate tile cache subsets depending on detection of feature change?  And if so, would it be useful to have a DB function which can determine the set of tile ids that need to be refreshed (i.e. by mapping a (set of or single) geometry (or envelopes) into a set of tile ids?
> > >>
> > >> On Sun, Nov 3, 2019 at 11:00 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
> > >>>
> > >>> Martin, I am working on improving OpenMapTiles tooling [1] - the ultimate goal is to have tiles generated in real time from the up-to-date OSM data, and serve them directly to user's browser via some caching layer (i.e. Varnish). The tools already contain postserve - a simple python server that queries for MVT tiles (no compression yet, but can be easily added)
> > >>>
> > >>> The other task is tile pre-generation using tilelive-copy (nodejs) - I wrote a tilelive-pgquery plugin [2] that queries PG for the tile, compresses it, and passes it on to tilelive-copy for storage.
> > >>>
> > >>> [1] OpenMapTiles tools - https://github.com/openmaptiles/openmaptiles-tools <https://github.com/openmaptiles/openmaptiles-tools>
> > >>> [2] tilelive-pgquery - https://www.npmjs.com/package/tilelive-pgquery <https://www.npmjs.com/package/tilelive-pgquery>
> > >>>
> > >>> On Sun, Nov 3, 2019 at 12:55 PM Martin Davis <mtnclimb at gmail.com <mailto:mtnclimb at gmail.com>> wrote:
> > >>>>
> > >>>> Great to hear that ST_AsMVT is useful.
> > >>>>
> > >>>> The other PostGIS capability that is useful for web spatial applications is the (recently enhanced) ST_AsGeoJSON.  This should also be gzipped over the wire.  So this suggests a modular gzip capability would be more useful.
> > >>>>
> > >>>> If this isn't provided in Postgres in some way (now or in near term) perhaps we should just add a ST_Gzip function to PostGIS.
> > >>>>
> > >>>> Out of curiosity, what platform do you use for your external gzipping layer?
> > >>>>
> > >>>> On Sun, Nov 3, 2019 at 8:29 AM nyurik <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
> > >>>>>
> > >>>>> The amazing ST_AsMVT() has two common usage patterns:  copy resulting MVTs to
> > >>>>> a tile cache (e.g. .mbtiles file or a materialized view), or serve MVT to
> > >>>>> the users (direct SQL->browser approach).  Both patterns still require one
> > >>>>> additional data processing step -- gziping.
> > >>>>>
> > >>>>> Thus, rather than having a horizontally scalable db plus a simple IO-bound
> > >>>>> SQL->Web or a SQL->store process, one has to add a relatively CPU-intensive
> > >>>>> gzipping layer.  This is especially relevant if I try to create a PG table
> > >>>>> with the pre-generated tiles - I must use an external data compression
> > >>>>> process to retrieve a tile, gzip it, and store it back, instead of running a
> > >>>>> single query for copying all tiles.  My cursory look at the tile sizes
> > >>>>> indicate gzipping shrinks MVTs 50% to 300%.
> > >>>>>
> > >>>>> Note that a similar CPU-intensive step - creating MD5 tile hashes for a more
> > >>>>> efficient storage - can be easily done with PG's `md5()` function, whereas
> > >>>>> `gzip()` doesn't appear to exist.
> > >>>>>
> > >>>>> I would like to propose two possible solutions:
> > >>>>> * Implement ST_AsMVT(..., compress) parameter - NULL=no compression,
> > >>>>> 0-9=compression level.
> > >>>>>    PROs:  adds just the required functionality to where it is needed (YAGNI
> > >>>>> principle), does not require ungzip yet (ST_AsMVT is a one way function
> > >>>>> without the corresponding MVT->Table method)
> > >>>>>    CONs: less generic (unusable for non-MVT usage)
> > >>>>> * Implement gzip() or ST_gzip()
> > >>>>>    PROs:  a more generic approach not tied to MVTs
> > >>>>>    CONs:  logically implies the need of ungzip(), requires PG community to
> > >>>>> agree this functionality is needed
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Sent from: http://postgis.17.x6.nabble.com/PostGIS-Dev-f3570762.html <http://postgis.17.x6.nabble.com/PostGIS-Dev-f3570762.html>
> > >>>>> _______________________________________________
> > >>>>> postgis-devel mailing list
> > >>>>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> > >>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> > >>>>
> > >>>> _______________________________________________
> > >>>> postgis-devel mailing list
> > >>>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> > >>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> > >>>
> > >>> _______________________________________________
> > >>> postgis-devel mailing list
> > >>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> > >>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> > >>
> > >> _______________________________________________
> > >> postgis-devel mailing list
> > >> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> > >> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> > >
> > > _______________________________________________
> > > postgis-devel mailing list
> > > postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> > > https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>_______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20191104/b1bb80c2/attachment-0001.html>


More information about the postgis-devel mailing list