Python itertools with multiprocessing - huge list vs inefficient CPU usage with iterator


I work on n elements (named "pairs" below), using variations with repetition as the function's argument. It works fine as long as the "r" list is not big enough to consume all the memory. The issue is that I eventually have to make more than 16 repetitions of 6 elements. I use a 40-core system in the cloud for this.
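To give a rough sense of the scale (a back-of-envelope check, with placeholder values standing in for my 6 elements), the full product already runs into the trillions of tuples:

pairs = ['a', 'b', 'c', 'd', 'e', 'f']   # placeholders for my 6 elements
print(len(pairs) ** 16)                  # 2821109907456 tuples from product(pairs, repeat=16)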

The code looks like the following:

import itertools
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(39)
    r = itertools.product(pairs, repeat=16)
    pool.map(f, r)

I believe I should use an iterator instead of creating the huge list upfront, and here is where the problem starts...

I tried to solve the issue with the following code:

if __name__ == '__main__':
    pool = Pool(39)
    for r in itertools.product(pairs, repeat=14):
        pool.map(f, r)

The memory problem goes away, but the CPU usage is about 5% per core. A single-core version of the code is faster than this.

I'd appreciate it if you could guide me a bit...

Thanks.

Your original code isn't creating a list upfront in your own code (itertools.product returns a generator), but pool.map is realizing the whole generator (because it assumes that if you can store all the outputs, you can store all the inputs too).
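A quick toy demonstration of the difference (square and the small range below are made up purely for illustration, not your real work function):

from multiprocessing import Pool

def square(x):   # stand-in for the real work function
    return x * x

if __name__ == '__main__':
    with Pool(4) as pool:
        # map blocks until every result is ready and returns one big list
        # (a generator input would be realized up front as well)
        print(type(pool.map(square, range(10))))   # <class 'list'>

        # imap returns an iterator; results can be consumed as they arrive
        it = pool.imap(square, range(10))
        print(next(it), next(it))                  # 0 1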

Don't use pool.map here. If you need ordered results, use pool.imap, or if result order is unimportant, use pool.imap_unordered. Iterate the result of either call (don't wrap it in list), process the results as they come, and memory should not be an issue:

if __name__ == '__main__':
    pool = Pool(39)
    for result in pool.imap(f, itertools.product(pairs, repeat=16)):
        print(result)
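Both imap and imap_unordered also take a chunksize argument; with an input this long, batching many tuples per dispatch cuts the per-item IPC overhead considerably (the 1000 below is only an illustrative guess; tune it for your workload):

if __name__ == '__main__':
    pool = Pool(39)
    # send the input to workers in batches of 1000 tuples instead of one at a time
    for result in pool.imap(f, itertools.product(pairs, repeat=16), chunksize=1000):
        print(result)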

If you're using pool.map for its side-effects, so it just needs to run to completion and the results and ordering don't matter, you can dramatically improve performance by using imap_unordered and using collections.deque to efficiently drain the "results" without storing them (a deque with a maxlen of 0 is the fastest, lowest-memory way to force an iterator to run to completion without storing the results):

from collections import deque

if __name__ == '__main__':
    pool = Pool(39)
    deque(pool.imap_unordered(f, itertools.product(pairs, repeat=16)), 0)
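The deque trick on its own (with a toy stand-in generator, just to show the draining behavior): a maxlen of 0 means every item is pulled and immediately discarded, so the iterator runs to completion with constant memory:

from collections import deque

def consume(iterator):
    # pull every item and store none of them
    deque(iterator, maxlen=0)

consume(print(i) for i in range(3))   # prints 0, 1, 2; nothing is retained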

Lastly, I'm a little suspicious of specifying 39 Pool workers; multiprocessing is largely beneficial for CPU-bound tasks. If you're using more workers than you have CPU cores and gaining a benefit, it's possible multiprocessing is costing you more in IPC than it gains, and using more workers is just masking the problem by buffering more data.
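If you want the worker count tied to the hardware rather than a hard-coded 39, a minimal sketch:

from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    # Pool() with no argument already defaults to cpu_count(); this just makes it explicit
    pool = Pool(cpu_count())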

If your work is largely I/O bound, you might try using a thread-based pool, which avoids the overhead of pickling and unpickling and the cost of IPC between the parent and child processes. Unlike process-based pools, Python threading is subject to GIL issues, so CPU-bound work in Python (excluding GIL-releasing calls for I/O, ctypes calls into .dll/.so files, and third-party extensions like numpy that release the GIL for heavy CPU work) is limited to a single core (and in Python 2.x, CPU-bound work wastes a decent amount of time resolving GIL contention and performing context switches; Python 3 removes most of that waste). But if your work is largely I/O bound, blocking on I/O releases the GIL and allows other threads to run, so you can have many threads as long as most of them are delayed on I/O.

The switch is easy too (as long as you haven't designed your program to rely on separate address spaces for each worker, assuming you can write to "shared" state without affecting other workers or the parent process): just change:

from multiprocessing import Pool

to:

from multiprocessing.dummy import Pool

and you get the multiprocessing.dummy version of the Pool, backed by threads instead of processes.
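Putting it together with the thread-based pool (a sketch under the assumption that your work is largely I/O bound; f and pairs below are placeholders for your own, the rest is just the earlier draining pattern with the dummy Pool swapped in):

import itertools
from collections import deque
from multiprocessing.dummy import Pool   # same API as multiprocessing.Pool, backed by threads

def f(combo):                 # placeholder for the real, largely I/O-bound work function
    return len(combo)

pairs = ['a', 'b']            # placeholder elements

if __name__ == '__main__':
    pool = Pool(39)
    # drain the results without storing them, exactly as with the process-based Pool
    deque(pool.imap_unordered(f, itertools.product(pairs, repeat=16)), 0)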

