[postgis-users] OT: Generic postgresql Q about LARGE number of tables

Puneet Kishor punk.kish at gmail.com
Sat Nov 27 13:41:34 PST 2010



Stephen Woodbridge wrote:
> Hi all,
>
> I'm helping a nephew with a thesis project in economics and finance. 
> We need to load historical stock quote data for like 8000 symbols 
> where each symbol will have something like 2,000-30,000 records. I 
> have the data in csv files so loading it with copy table will work fine.
>
> I'm wondering if people have any thoughts on organizing this. The 
> obvious options that come to my mind are:
>
> 1. 8000 tables

Might work.
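
Just to illustrate option 1, a minimal sketch; the table name, column
layout, and file path here are all illustrative, not from Steve's data:

    -- one plain table per symbol, named after the symbol
    CREATE TABLE quotes_aapl (
        quote_date  date PRIMARY KEY,
        open        numeric(12,4),
        high        numeric(12,4),
        low         numeric(12,4),
        close       numeric(12,4),
        volume      bigint
    );

    -- server-side bulk load of that symbol's csv
    COPY quotes_aapl FROM '/data/quotes/AAPL.csv' WITH CSV HEADER;

You would script that once per symbol, so ~8000 times.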
> 2. 8000 tables inherited from a master table

Does not work. Pg basically starts croaking once you have more than a 
few inherited tables. I tried with 8400 tables, and it became unusable. 
For starters, every time you do anything it wants to lock all the child 
tables and runs out of memory, so you have to tinker with config 
settings just to get anything to run. I was basically unable to make it 
work with so many tables.
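
For reference, the inheritance layout I tried looked roughly like this
(names are illustrative); a select against the master has to plan and
lock across every child, which is where it falls over:

    -- master table holding the common schema
    CREATE TABLE quotes (
        symbol      text,
        quote_date  date,
        close       numeric(12,4)
    );

    -- one child per symbol, repeated ~8000 times
    CREATE TABLE quotes_aapl (
        CHECK (symbol = 'AAPL')
    ) INHERITS (quotes);

    -- even with constraint exclusion on, the planner
    -- still has to consider every one of the children
    SELECT * FROM quotes WHERE symbol = 'AAPL';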
> 3. one huge table that will have something like 80M records in it 
> (est. as 8000 tables with on avg. 10,000 records) I would need to add 
> a column to every record based on the stock symbol to identify which 
> are which.

Contrary to what others might say, 80M rows also poses a problem for 
Pg, albeit of a different kind. I had 120M rows and found it very hard 
to experiment. For any select to be useful you have to index the 
columns, so if you want to try out different queries, optimize them, 
and experiment with indexes, every operation becomes a chore. Creating 
and dropping the indexes gets very time consuming.

If you already had an optimized query and didn't need to tinker with 
various indexes, then yes, you could possibly be fine with 80M rows.

My machine was a dual-core, dual Xeon server with 12 GB RAM. I have 
access to a more powerful machine (quad-core, dual Xeon server with 32 
GB RAM), but I gave up on this strategy.
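
If you do try the single-table route anyway, the obvious shape is
below (column names are illustrative); the composite index is the
thing you end up building and dropping over and over while
experimenting, which is what made it a chore for me:

    -- one big table, with the symbol added as a column
    CREATE TABLE quotes (
        symbol      text    NOT NULL,
        quote_date  date    NOT NULL,
        close       numeric(12,4),
        volume      bigint
    );

    -- on 80M+ rows this build is the slow part
    CREATE INDEX quotes_symbol_date_idx ON quotes (symbol, quote_date);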

>
> So 1. and 2. sound simple enough and the symbol can be the table 
> name. I think there are some advantages if I want to update that data 
> in the future.
>
> 3. sounds like I might run into performance problems, especially if I 
> need to compare two symbols because this would require a self join.
>
> Typically we will want to be comparing one symbol against another and 
> synchronizing records based on dates to do correlation analysis. This 
> makes me think that separate tables is probably the best way to go.
>
> My hesitation is in the fact that I have never created a database with 
> 8000+ tables in postgresql and I'm not sure how that will work out in 
> and of itself.

As I noted above, I am not sure that 8000+ tables is a problem in 
itself; my sense is that that should actually work fine. It is 8000+ 
*inherited* tables that are the problem. Some of the Pg gurus on the Pg 
list concurred with this.
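
One nice side effect of separate tables: comparing two symbols becomes
a plain join on date instead of a self join. A sketch, again with
illustrative names, using Pg's built-in corr() aggregate:

    -- correlation of daily closes over the dates both symbols traded
    SELECT corr(a.close, b.close) AS close_corr
    FROM quotes_aapl a
    JOIN quotes_msft b USING (quote_date);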

>
> Anyway, sorry for the off topic post, but if anyone has any experience 
> with a large number of tables I would appreciate your thoughts. Off list 
> would be fine to avoid additional OT noise here.
>
> Thanks,
>   -Steve
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users


