[postgis-users] A PostGIS-Raster data proposal

Stephen Marshall smarshall at wsi.com
Tue Oct 31 12:32:38 PST 2006


I've extended my testing of server-side access to include testing large 
objects (LO's) as well as TOAST.

I found that LO's have similar performance characteristics to TOAST.  On 
average, it takes tens of microseconds to access a 1000 bytes chunk of 
data from a much larger LO.  For both TOAST and LO's, the access time 
varied around close to 40 microseconds per access, compared to around 4 
microseconds to do a similiar access from a local file using fseek and 
fread. 

The LO and flatfile tests differed from the TOAST tests in that LO's and 
flatfiles must be opened before access.  Opening the LO and flatfiles 
were done once prior to all the reads, and were not included in timing 
tests.  Open a file general takes around 100 microseconds and opening a 
LO takes several hundred microseconds, so the overhead is large relative 
to the time to read a small chunk of data.  It will be important to 
avoid designs the require opening an LO or file more than once.

Determining the size the data also has performance characteristics I 
have not yet tested thoroughly, but which may be important.  My tests 
require knowing the total data size, which may not necessarily be needed 
for raster-in-DB (for example the data formats themselves may carry 
sufficient size information).  However, finding the size of the TOAST or 
LO can be relatively expensive.

I am looking into ways to find the size of the TOAST object, but my 
current method seems to require reading all the data into memory.  Since 
this subverts the whole point of seeking within the TOASTed data, I'm 
not happy with this.  I'm looking into how to just scape the length from 
the data from the bytea, which I believe should be stored in the first 4 
bytes of a bytea (variable length field).  If anyone knows different or 
better ways to find the length of a bytea variable, please let me know.

Finding the size of an LO seems to require seeking to the end of the LO 
and using lo_ftell to return the seek position.  This takes more time 
than a single 1000 byte read (~ 65 microseconds).  Anyone know a faster 
way to find the size of an LO?

Finally, these tests (still) do not include the time it takes to 
transfer the data back to the database client, thus all the accesses are 
inherently acces to the local system only.  I'll be expanding the tests 
to include remote access soon.

Steve Marshall

Marshall, Steve wrote:

>Per Frank Warmerdam's suggestion, I've done a test of access performance
>using internal  postgresql toast functions vs. normal file seeking.  
>
>The test involved seeking in a toasted bytea column containing
>approximately 20 MB of binary data.  The TOAST column was set to
>EXTERNAL storage (i.e. in separate TOAST table, but not compressed).
>The test involved seeking through the data sequentially in chunks of
>1000 bytes, and measuring the time to retrieve each chunk.  The code to
>do this was encapsulated in a postgresql server-side function and
>invoked through SQL.  I restarted the PostgreSQL server before the test
>to avoid having any cacing of data in shared memory, which could
>artificially speed up the data access.
>
>As a comparison, I also wrote a program that would do the equivalent
>data access from a file.  The file contained the same data as the bytea
>column, and the access was replaced with fseek and fread calls.
>
>The results of the test were that toast seeking was about 10 times more
>expensive than seeking in a local file.  Each local file access averaged
>in microseconds, while toast-seeks averaged 10's of microseconds.  The
>worst case file seeking was in milliseconds, while worst case
>toast-seeking was in 10's of milliseconds.  The absolute values for
>toast-seeks don't seem too bad to me, but it is a bit worrying that the
>values are an order of magnitude worse than local file I/O.
>
>I did play around with some parameters in the DB test.  Changing the
>chunk size did not make a big difference, but it got a small boost by
>setting it to the toast chunk size (1994 bytes). I did not vary the test
>to do seeking around randomly instead of sequentially.  This might give
>a boost to the DB implementation due to caching; I'm not sure what this
>would do to file I/O.
>
>I also have not explored the performance of repeated access to the same
>data segments.  Here PostgreSQL data caching might help DB access
>relative to file I/O.
>
>There are still more things to do here, but I thought I'd share some
>early results.  I'm happy to provide the code and SQL definitions for
>the test, if anyone else is interested in it.
>
>Steve Marshall
>  
>




More information about the postgis-users mailing list