[postgis-users] geomunion revisited....

Tue Nov 13 09:02:11 PST 2007

As I mentioned in my previous post, buffer has to pull all geometries 
into memory at once and process them all together.  It creates a lot of 
internal data structures in the course of processing.  I'm not that 
suprised that it doesn't work on 30K geometries. Or, there may well be a 
memory leak - our test cases don't actually include one with 30K geoms  
8^)  (Lee, I'd be interested to see if this works in JTS - can you send 
me your dataset as a shapefile?)

I can think of a couple of things to do:
- Get the new version of JTS (from me) and use the CascadedUnion class
- Use a simple grid index on your data to partition it, union each 
partition separately, and then union the results.  This process would 
look something like: 
    - pick an appropriate grid size (say 10x10 ?)
    - compute the interior point of each geometry, determine which grid 
cell it lies in, and save this in a new column
    - union all geoms in each grid cell together
    - union the result together

Note: I haven't actually implemented this approach, but it's the only 
way I can see to reduce the amount of geometry you are working with at 
any given step.

Lee Keel wrote:
>> -----Original Message-----
>> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
>> bounces at postgis.refractions.net] On Behalf Of Kevin Neufeld
>> Sent: Tuesday, November 13, 2007 12:40 AM
>> To: PostGIS Users Discussion
>> Subject: Re: [postgis-users] geomunion revisited....
>>
>> Also, I think you may be after the SQL syntax UNION ALL, not UNION, if
>> you are simply after concatenating the resultset from table 2 to table 1.
>>
>> select simplify(...
>> UNION ALL
>> select simplify(...
>>
>> Also, I too would be curious how buffer(collect(...), 0) fares for you
>> instead of geomunion() for your final 5 hour query.
>> Since you're doing this all at once anyway, it may work.  Geomunion() is
>> an aggregate for the two geometry method.  Which means that it will
>> slowly add one geometry at a time to an every growing resulting
>> geometry.  Buffer(collect(...), 0) will decontruct the collection once,
>> and rebuild it once to a union-ed geometry.
>> -- Kevin
>>
>> Josh Livni wrote:
>>     
>>> I think running buffer(bunch_of_geoms,0) rather than
>>> geomunion(bunch_of_geoms) might be faster.
>>>
>>> That said, still seems it takes a bit long.  If it were me I'd try
>>> playing around by first creating a table with your results, eg:
>>>
>>> '''create table simple_buffers as (
>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom
>>> from table1
>>> union
>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom
>>> from table2); '''
>>>
>>> then you could create a GiST index on the_geom of your new table, run
>>> vacuum analyze on it, and then try the geomunion and buffer to test.
>>> With 32k+ rows, depending on how much they overlap, that might help a
>>> little, or at least help potentially isolate the issue if it does take
>>> as long.
>>> Also I assume you've modified your postgres config away from the
>>> defaults to take advantage of more memory on your machine, etc.
>>>
>>>
>>>
>>>  -Josh
>>>       
> [Lee Keel] 
>
> Well, I have played with this some more and here is what I have found.
>
> I tried using the buffer(collect(the_geom), 0) on the largest table (about
> 30K rows) and I get the following error:
>
> NOTICE:  St9bad_alloc
>
> ERROR: GEOS buffer() threw an error!
> SQL state: XX000
>
>
> This entire table is only 5872KB but when I monitor this buffer process the
> memory gets over 1.7GB before it finally crashes out.  Even if you were to
> triple the memory I can't see where it would ever get to 1.7GB.  NOTE: When
> monitoring the process of doing just the collect(the_geom) the memory
> footprint gets up to about 26-28MB and takes about 2.5 minutes.  This leads
> me to think there might be a memory leak in the buffer().
>
>
> I tried using a test table.  This new table is 3704KB.  But I realized that
> a gist index is not going to buy me anything because it is still going to
> pull the entire table.
>
>
> I have listed my config settings below.  I am not sure how much more I can
> tweak them since I am running on 64bit Vista.  I have had a few posts in the
> last week that have basically said that the windows version of this can't be
> tweaked much more due to constraints by windows.
>
> shared_buffers = 256MB
> temp_buffers = 32MB
> max_prepared_transactions = 100
> work_mem = 16MB
> maintenance_work_mem = 256MB
> max_stack_depth = 3MB
>
> Thanks for everyone's help and attention to this.
> Lee
>
> This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail.
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>   

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022