postgresql - Parallel Aggregation with postgres insert/copy locking and blocking -


i need big amount of calculation, insert result postgres database. used parallel aggregation calculation , shard processor count.

i noticed each cycle, first 5 mins, cpu utilization 100% cores doing calculation. next 30 mins, data copied postgres different threads, blocks each other... wondering should optimize it. sharding not option here, partition table date, don't want make complicated sharding again personid...

is there way resolve issue? possible put data single queue , let postgres process queue?

more details: in each thread, 2 things: 1) calls data service calculation, , result set 2) copy result set database table(kpis_weekly)

we start n (n = processor count) threads above process, data sharded personid spread evenly on threads. bottle neck database table. these threads have insert same table, therefore block each other , slow down whole thing.

the issue (as op rightly pointed out) funneling writes thread generate 1 single table bottleneck.

what here despite using partition (which assists in reads / select statements) writes not trigger oriented. yes, 'does' mean moving logic of generating table names threads, us, 'really' worth effort.

so e.g. following code writes directly child tables (instead of parent tables) importantly writes parallelized. writes i/o bound, faster disks, faster writes. following sql using parallel threads (even writes) , not db bottleneck (but disk-subsystem one).

sql

create unlogged table p(a integer); create unlogged table c1() inherits (p); create unlogged table c2() inherits (p); create unlogged table c3() inherits (p); create unlogged table c4() inherits (p); create unlogged table c5() inherits (p); create unlogged table c6() inherits (p); create unlogged table c7() inherits (p); create unlogged table c8() inherits (p); 

via shell script

psql -c "insert c1 select generate_series (1,10000000) e(a);" & psql -c "insert c2 select generate_series (1,10000000) e(a);" & psql -c "insert c3 select generate_series (1,10000000) e(a);" & psql -c "insert c4 select generate_series (1,10000000) e(a);" & psql -c "insert c5 select generate_series (1,10000000) e(a);" & psql -c "insert c6 select generate_series (1,10000000) e(a);" & psql -c "insert c7 select generate_series (1,10000000) e(a);" & psql -c "insert c8 select generate_series (1,10000000) e(a);" & 

after insert, may want convert them logged tables (postgresql 9.5+) this, (child+parent) tables:

alter table p set logged;


Comments

Popular posts from this blog

python - TypeError: start must be a integer -

c# - DevExpress RepositoryItemComboBox BackColor property ignored -

django - Creating multiple model instances in DRF3 -