[postgis-users] point-in-polygon SQL sentence performance

pere roca peroc79 at gmail.com
Mon Jun 30 09:40:30 PDT 2008



   dear Regina,

   first, thanks to take some time.
   yes, it's a little complicated to explain; I will try again:
  -My webapp lets user insert point data; all users insert this data in the
same table (point)
  -I have several polygon tables in postGIS (to cross with point data)

I thought about:

Updating an unique,exclusive user_point table with fields: geom, specie,
genus... and code. All fields except the code are extracted from point table
using $user as identifier ($user comes to PHP from user identification). The
code for each point is extracted using CONTAINS and && sentence (see the PHP
script) from polygon table. 

A user_polygon table will have the geom, id... and fields like
number_of_records (see SET numtax below), number_genera that require
counting how many records fall inside EACH polygon. This fields are filled
with the COUNT sentences taking into consideration the previous point table
updating. 

I've tried the sentence below, following your code:
  UPDATE user_polygon SET numtax = (select u.cnt from (SELECT
COUNT(user_point.code) as cnt, user_point.code  FROM user_point  GROUP BY
user_point.code) as u  WHERE u.code = user_polygon.code);";

The sentence inside (select u.cnt...) goes very very fast, but "loses"  a
lot of time (also takes 1min 20sec) comparing WHERE u.code =
user_polygon.code. I think this is the problem, but there is some solution
to this? isn't the main idea good? everything goes fast except this last
part. 

After executing all the SQL sentences, data from  user_point and
user_polygon is deleted (but not the table!).

Some answers:
-"1) Are you using inherited tables and having a different table for each
user?". No inherited tables (was wrong using ONLY parameter). A user = a
point and a polygon table (info deleted each time session expires) 
-"How big is your user_polygon table". My testings are based on a 20 Mb
"Toast table size", 544 Kb Table size. It has 1991 records.

  Thanks again
  
  Pere Roca

Obe, Regina     DND\MIS wrote:
> 
> Pere,
> 
> Somehow I think your whole code is needlessly complicated, but I still
> don't have a clear idea what you are trying to do and looking at your
> code is actually more confusing than anything.  Maybe try to restate
> exactly what you are trying to do minus the code.  
> 
> I'm guessing you are trying to do this (by the way wrapping a (SELECT
> ...) like that is very non-standard)
> 
> INSERT INTO user_polygon (userid, polygon_code,the_geom) 
> SELECT 
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326);
> 
> 
> 1) Are you using inherited tables and having a different table for each
> user?  I'm assuming that is why you are doing
> 
> UPDATE ONLY 
> 
> rather than UPDATE  (UPDATE ONLY really only makes sense if you are
> using inherited tables and don't want the update to cascade to child
> tables, but then if its inherited - its still silly since you are just
> updating child tables anyway)
> 
> 
> 2) Where is that $user coming from.  Make sure you know where that $user
> is coming from and you properly sanitize it otherwise its a prime target
> for SQL Injection attack.
> 
> 3) How big is your user_polygon table?  Speed would depend on that
> 
>  - this particular statement you have is inefficient - because it is
> either doing a correlated subselect for every record or its updating
> every record to the same value. Since you are not prefixing things - its
> hard for me to tell what exactly this is doing.  This I suppose would
> make sense if you are creating a different table for each user
> 
> UPDATE ONLY user_polygon SET numreg = (SELECT COUNT (code) FROM
> user_points
> WHERE name = polygon_code);
> 
> but its generally more efficient to write a sub select
> 
> UPDATE user_polygon SET numreg = u.cnt
> FROM (SELECT COUNT(up.code) as cnt, up.name
> 		FROM user_point up
> 		GROUP BY up.name) u
> WHERE u.name = user_polygon.polygon_code;
> 
> 
> 3) I have no idea what the point of the first bounding box insert you
> have is for 
> since ST_intersects, ST_Contains already have bounding box checks in
> them.  So I suspect this step you have is is not necessary
> 
> For this part
> 
> SELECT
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326)
> 
> 
> 4) Again this statement
> UPDATE ONLY user_polygon SET userid ='$user';
> 
> Is updating every record in user_polygon to the current user.  So if you
> are looping thru a user set which I can only guess you are, this
> statement is overwritting the previous updates.
> 
> If again you have one table per user, then this is fine - but its
> cheaper to do everything in your insert rather than doing an update
> after the fact.  Updates tend to be slower than inserts.
> 
> Hope that helps a bit,
> Regina
> 
> 
> 
> 
>  
> 
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of pere
> roca
> Sent: Monday, June 30, 2008 4:25 AM
> To: postgis-users at postgis.refractions.net
> Subject: [postgis-users] point-in-polygon SQL sentence performance
> 
> 
> hi!
> 
> I wanna make a point-in-polygon operation but takes looong time (about 4
> minutes!) and the CPU consuming of the server
>  becomes huge when performing this operation for 15000 points /
> polygons:
> 
> I need to count how many user points fall in each polygon and perform
> some
> operations based on that. 
> We initially have a point_table that stores multi-user point data. We
> extract from there only this-user-data and create new tables:
> user_points
> and user_polygon. To the first one we will insert the code of the
> polygon in
> which it falls; counting how many many times the same polygon_code is
> repeated, we will calculate number-points_in_polygon...
> The important code is commented below (calling SQL from PHP):
> 
> //we use the points bbox to narrow the future spatial queries (will only
> select polygons falling inside this bbox; from 1991 initial polygons,
> only
> 85 are from now on used); it is useful when points are close each other
> (if
> there is an outlayer then bbox is not useful...)
> 
> $extent="select astext(extent(points_table.the_geom)) from points_table
> where userid='$user'";
> $extent_result=pg_query($extent); 
> 
> while ($row = pg_fetch_array($extent_result, NULL, PGSQL_ASSOC)) 
> {
> //creates a NEW polygon table that will store only polygon code and
> polygon
> geometry that fall INSIDE the point's bbox
>    $poligon_sql2="INSERT INTO user_polygon (polygon_code,the_geom)
> (SELECT
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326));
> UPDATE ONLY user_polygon SET userid ='$user';
> 
> //we previously created a new user point table, where we insert some
> info.
> The table is created because we are taking data from a multi-user table
> and
> now we want a single user-->a single point table
> 
> INSERT INTO user_points (genus, species, name, code, usuario, the_geom)
> (SELECT genus, specie, codigo, oid, userid, the_geom FROM points WHERE
> userid = '$user');
> 
> //in this user POINT table we set the code from the polygon's table
> where
> the point falls inside; 
> //I expected this one to be the "big" operation , but it's not (takes
> "only"
> 1 min 18 sec)
> 
> UPDATE ONLY user_points SET code = (SELECT polygon_code FROM
> user_polygon
> WHERE user_polygon.the_geom && user_points.the_geom AND CONTAINS
> (user_polygon.the_geom,user_points.the_geom));
> 
> //updating user polygon table, counting how many times do we have a
> polygon
> code in our POINT table --> number of points in each polygon
> // this updating operation takes 1 min 20 seconds! 
> 
> UPDATE ONLY user_polygon SET numreg = (SELECT COUNT (code) FROM
> user_points
> WHERE name = polygon_code);
> 
> // this updating operation also takes about 1 min 20 seconds! 
> 
> UPDATE ONLY user_polygon SET numtax = (SELECT COUNT (genus) FROM (SELECT
> DISTINCT (genus) FROM user_points WHERE code = polygon_code) AS foo); 
> 
>    All important data is indexed; Any ideas to get better performance?
> 
>    thanks!
>  
>    Pere Roca
>    EDIT project (http://edit.csic.es/edit_geo/prototype/edit.html)
> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/point-in-polygon-SQL-sentence-performance-tp181908
> 54p18190854.html
> Sent from the PostGIS - User mailing list archive at Nabble.com.
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> 

-- 
View this message in context: http://www.nabble.com/point-in-polygon-SQL-sentence-performance-tp18190854p18199519.html
Sent from the PostGIS - User mailing list archive at Nabble.com.




More information about the postgis-users mailing list