[postgis-users] Large dataset help

Gregory S. Williamson gsw at globexplorer.com
Wed Mar 23 04:11:45 PST 2005


Paul,

A search of the archives of this mailing list will turn up some previous dialogue on this subject ... storing images in a database has some advantages -- the invaluable benefits of ACID for one, and uniform backups as a somewhat related one. The problem seems to be more on the operational end -- databases are very good at certain things, and streaming large amounts of image data is not necessarily something they excel at.

Let me give a crude example. I worked on applications at TRW that stored image data in Informix BLOBs; Informix uses a fixed page size (generally 2k, depending on the operating system), but with BLOB objects you can define the size of the page. We had TIFF images that were all 10-15k in size, so we sized our BLOBs at 15k, which led to very streamlined retrieval -- a single disk read (well, Informix was using raw disk storage) would get an entire image in one operation. If we had stored them all in normal pages it would have taken ~7 operations to get the same BLOB.
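The arithmetic behind that estimate can be sketched roughly as follows. The 2k page and the 10-15k image sizes come from the example above; the function name is hypothetical, and the count ignores any per-page overhead, so treat it as a ballpark only.

```python
import math

KB = 1024

def page_reads(blob_bytes: int, page_bytes: int) -> int:
    """Number of page-sized reads needed to fetch a blob of blob_bytes.

    Hypothetical helper: assumes every page is completely filled with
    image data, which a real storage engine will not quite manage.
    """
    return math.ceil(blob_bytes / page_bytes)

for size_kb in (10, 15):
    small_pages = page_reads(size_kb * KB, 2 * KB)   # default-sized pages
    large_pages = page_reads(size_kb * KB, 15 * KB)  # BLOB pages sized to fit
    print(f"{size_kb}k image: {small_pages} reads at 2k pages, "
          f"{large_pages} read at 15k pages")
```

For images between 10k and 15k this gives somewhere between 5 and 8 reads with 2k pages, against a single read when the page is sized to hold the whole image.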

Postgres uses a very different method of disk access, which changes the picture, more so if you have a database with more transient data.

Our (i.e. my current employer's and, by extension, my) current system puts the images onto servers which are superb at handling lots of simultaneous requests for large amounts of data; the database stores metadata (acquisition date, owner, etc.) and spatial data about the image, and we let the disk cache and filers do the bulk of the work in the access; the database just tells us what to retrieve.
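A minimal sketch of that split, to make the idea concrete. This is not our actual schema: SQLite stands in for PostgreSQL/PostGIS so the example runs on its own, a plain bounding box stands in for a real geometry column, and the table, column, and function names are all made up for illustration.

```python
import sqlite3

def make_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog; image bytes never go into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE image_catalog (
            id       INTEGER PRIMARY KEY,
            path     TEXT NOT NULL UNIQUE,  -- where the file lives on the filer
            acquired TEXT NOT NULL,         -- acquisition date (ISO 8601)
            owner    TEXT NOT NULL,
            min_x REAL, min_y REAL, max_x REAL, max_y REAL  -- footprint bbox
        )
        """
    )
    return conn

def add_image(conn, path, acquired, owner, bbox):
    """Register one image file and its metadata in the catalog."""
    conn.execute(
        "INSERT INTO image_catalog "
        "(path, acquired, owner, min_x, min_y, max_x, max_y) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (path, acquired, owner, *bbox),
    )

def images_intersecting(conn, bbox):
    """Return paths of images whose footprint overlaps bbox."""
    min_x, min_y, max_x, max_y = bbox
    rows = conn.execute(
        "SELECT path FROM image_catalog "
        "WHERE min_x <= ? AND max_x >= ? AND min_y <= ? AND max_y >= ? "
        "ORDER BY path",
        (max_x, min_x, max_y, min_y),
    )
    return [row[0] for row in rows]

conn = make_catalog()
add_image(conn, "tiles/a.tif", "2005-01-10", "survey", (0.0, 0.0, 10.0, 10.0))
add_image(conn, "tiles/b.tif", "2005-02-02", "survey", (20.0, 20.0, 30.0, 30.0))

# The database only answers "which files?"; the file servers deliver the bytes.
paths = images_intersecting(conn, (5.0, 5.0, 15.0, 15.0))
```

In PostGIS the four bounding-box columns would presumably be replaced by an indexed geometry column, but the division of labour stays the same.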

The precise life cycle of your data might make an enormous difference in the "best" solution. We have lots of relatively static data (some of which really never changes, some of which replaces older data); if this data were changing more frequently, the careful synchronization required between the images on disk and the database might be more problematic and I'd lean towards an "all-in-the-db" solution. System security and access rights might also play a part in your analysis.
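One way to keep an eye on that synchronization is a periodic consistency check, sketched below. It assumes the database holds one relative path per image and that all images sit under a single root directory; the function name and that layout are assumptions for illustration, not a description of any particular system.

```python
from pathlib import Path

def find_mismatches(catalog_paths, image_root):
    """Compare paths recorded in the database with files on disk.

    Returns (missing_on_disk, not_in_catalog), both sorted lists of
    paths relative to image_root, using forward slashes.
    """
    root = Path(image_root)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    in_catalog = set(catalog_paths)
    missing_on_disk = sorted(in_catalog - on_disk)  # rows with no file
    not_in_catalog = sorted(on_disk - in_catalog)   # files with no row
    return missing_on_disk, not_in_catalog
```

The more often the data changes, the more often a check like this has to run and the more work it takes to reconcile what it finds, which is part of why frequently changing data pushes towards keeping everything in the database.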

The number of simultaneous connections is also an issue to consider. Some standard disk storage systems will collapse if you hit them with 50-100 requests for different data (obviously, using a database doesn't make this problem go away, but it does change the ground rules).

Backup strategies might also come into play here -- what are the impacts on the whole system if you have to replace some 20% of your data?

So far we put vector data exclusively into PostGIS, but we don't have to deal with DEM data or other such datasets much; some solution other than a database might be worth considering if you have such large point data sets.

Sorry for such a meandering post, but this is not a clear-cut issue.

My gut-level feeling is towards putting imagery outside of the database and dealing with synchronizing it and the data about it separately.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:	Paul Scott [mailto:pscott at uwc.ac.za]
Sent:	Wed 3/23/2005 1:12 AM
To:	PostGIS list
Cc:	
Subject:	[postgis-users] Large dataset help
I have just got a "proof-of-concept" project that will eventually
encompass manipulating and displaying over 350 terabytes of data over
the next four years.

Most of the data will be remote sensing data, i.e. rasters with a few
(~25 TB) shapefiles as well as attribute data.

I was wondering if it would be a good idea, and of course if it's even
possible, to store most of that data as Binary Objects (BLOBS) in
postGIS? If not, what suggestions could we come up with?

This is by far the largest project that I have ever worked on with
postGIS, so any help would be greatly appreciated.

Basic overview of the setup:

1. Server cluster each with 10x750GB Network storage devices
2. Currently running Windows and ARCIMS - This is going to go!
3. I have started building a linux distro for this process

--Paul



