[postgis-users] Performace problems with 250.000.000 points

Tue Jul 24 06:26:11 PDT 2007

Hi list,

working with an huge data table I've massive performance problems.

Here's the scenario:
There is one table trees, containing about random 250.000.000 points
within the square (x = 100000 / y = 100000). SRID is -1. A second table
land_parcel contains about 900.000 polygons (1 ring with 17 points).
These polygons are uniformly distributed within the square from lower
left to upper right.

The task is to read all the trees within a specific land_parcel.
Something like
	select x(t.geom), y(t.geom) 
	from land_parcel as l, trees as t 
	where (l.lp_id = 123456) 
	and (l.geom && t.geom) and Contains(l.geom, t.geom)

The problem is that the execution of this query takes ages. Although the
result set is very small (approximately 25 trees). 
In the first attempts I measured an average computation time of 366 s
for selecting one random land_parcel. The maximum computation time I
measured was about 700 s. In the meantime I've got better results (see
below).

My assumption is that the bottle neck is the geo index. Reading the 100
first land_parcels in a loop I got the result for the 1st land_parcel
after 12 s. The times for the following land_parcels are decreasing (7
s, 6 s, ... 0.5 s). 
There is an other thing, too. If you once picked a random land_parcel
the computation for a neighbour land_parcel is much faster.

PostgreSQL is running on a windows machine (Win 2003 Server) with an
Intel Core2 CPU 6700, 2.66 GHz, 2 GB RAM and a 1.1 TB RAID 5. CPU usage
is low, I/O traffic is height, mem usage is about 80 MB. 

I've already walked through Mark's tuning guide. Here are some snippets
of the configuration file:
	shared_buffers = 64MB
	effective_cache_size = 196MB
	random_page_cost = 4
I've altered the tables:
	ALTER TABLE trees ALTER COLUMN geom SET STATISTICS 1000;
	ALTER TABLE trees ALTER COLUMN tr_id SET STATISTICS 1000;
	ALTER TABLE land_parcel ALTER COLUMN geom SET STATISTICS 1000;
	ALTER TABLE land_parcel ALTER COLUMN lp_id SET STATISTICS 1000;
I've done several vacuum analyzes.
I've done a reindex on trees and land_parcels.
I've re-clusterd the table trees
	CLUSTER trees_index ON trees;

The clustering was successful. The average time for one land_parcel is
9.7 s. But this is still to slow. Running the scenario in ArcSDE /
Oracle the average computation time is approximately 1.4 s.

Has anybody still an idea? Or are 250.000.000 points just too many
items?
Thanks

Andreas

---

--
-- Query plan
--
Nested Loop  (cost=99.47..5072.48 rows=19 width=21)
  Join Filter: contains(l.geom, t.geom)
  ->  Index Scan using land_parcel_pkey on land_parcel l
(cost=0.00..9.25 rows=1 width=317)
        Index Cond: (lp_id = 123456)
  ->  Bitmap Heap Scan on trees t  (cost=99.47..5044.38 rows=1250
width=21)
        Filter: (l.geom && t.geom)
        ->  Bitmap Index Scan on trees_index  (cost=0.00..99.16
rows=1250 width=0)
              Index Cond: (l.geom && t.geom)

--
-- Table: trees
--
CREATE TABLE trees
(
  tr_id integer NOT NULL,
  breed_id character varying(10) NOT NULL,
  height double precision,
  crown_diam double precision,
  bhd double precision,
  volume double precision,
  plant_date character varying(10),
  geom geometry,
  CONSTRAINT trees_x_pkey PRIMARY KEY (tr_id),
  CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
'POINT'::text OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
) 
WITHOUT OIDS;
ALTER TABLE trees OWNER TO postgres;

-- Index: trees_breed_ididx
CREATE INDEX trees_breed_ididx
  ON trees
  USING btree
  (breed_id);

-- Index: trees_heightidx
CREATE INDEX trees_heightidx
  ON trees
  USING btree
  (height);

-- Index: trees_index
CREATE INDEX trees_index
  ON trees
  USING gist
  (geom);

-- Index: trees_plant_dateidx
CREATE INDEX trees_plant_dateidx
  ON trees
  USING btree
  (plant_date);

--
-- Table: land_parcel
--
CREATE TABLE land_parcel
(
  lp_id integer NOT NULL,
  owner_id character varying(20) NOT NULL,
  numerator integer NOT NULL,
  denominator integer NOT NULL,
  geom geometry,
  CONSTRAINT land_parcel_pkey PRIMARY KEY (lp_id),
  CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
'POLYGON'::text OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
) 
WITHOUT OIDS;
ALTER TABLE land_parcel OWNER TO postgres;

-- Index: land_parcel_index
CREATE INDEX land_parcel_index
  ON land_parcel
  USING gist
  (geom);

-- Index: land_parcel_owner_ididx
CREATE INDEX land_parcel_owner_ididx
  ON land_parcel
  USING btree
  (owner_id);