Notes on threading yt
Obsolete - parallelization will be done either with MPI tasks or with some method other than this one. The GIL got in the way.
Parallelization plans will be kept elsewhere.
I implemented a simple threading interface for computing the overlap of grids:
    import Queue        # Python 2 module name, as used by the code below
    import threading
    import numpy as na  # assumed: "na" is the numpy alias this code relies on

    class OverlapThreading:
        def __init__(self, hierarchy, level, numthreads = 5):
            gridQ = Queue.Queue()
            self.threads = []
            self.hierarchy = hierarchy
            # Indices of the grids on this level, plus their edges
            self.gI = self.hierarchy.selectLevel(level)
            self.RE = hierarchy.gridRightEdge[self.gI]
            self.LE = hierarchy.gridLeftEdge[self.gI]
            # Every grid on the level becomes one work item
            for g in self.hierarchy.grids[self.gI]:
                gridQ.put(g)
            #print "Queue size:", gridQ.qsize()
            for i in range(min(numthreads, gridQ.qsize())):
                t = OverlapThreading.overlapper(self, gridQ)
                self.threads.append(t)
                t.start()
                #print "Starting %i" % i
            # Block until all workers have drained the queue
            for t in self.threads:
                t.join()

        class overlapper(threading.Thread):
            numthreads = 0
            def __init__(self, OT, gridQ):
                self.OT = OT
                self.gridQ = gridQ
                OverlapThreading.overlapper.numthreads += 1
                threading.Thread.__init__(self)
                #print "STARTING THREAD"

            def run(self):
                j = 0
                OT = self.OT
                h = OT.hierarchy
                grids = OT.gI
                try:
                    while 1:
                        # Non-blocking get: raises Queue.Empty once the work runs out
                        grid = self.gridQ.get(False)
                        grid.generateOverlapMasks(0, OT.LE, OT.RE)
                        grid.myOverlapGrids[0] = h.grids[grids[na.where(grid.myOverlapMasks[0] == 1)]]
                        grid.generateOverlapMasks(1, OT.LE, OT.RE)
                        grid.myOverlapGrids[1] = h.grids[grids[na.where(grid.myOverlapMasks[1] == 1)]]
                        grid.generateOverlapMasks(2, OT.LE, OT.RE)
                        grid.myOverlapGrids[2] = h.grids[grids[na.where(grid.myOverlapMasks[2] == 1)]]
                except Queue.Empty:
                    pass
However, even on John's enormous sim, this gives pretty poor results:
- 1 thread: 155.47 seconds
- 4 threads: 145.55 seconds
- 8 threads: 158.62 seconds
- 32 threads: 169.53 seconds
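To put the timings above in perspective, here is a small sketch that turns them into speedup and parallel efficiency relative to the single-thread run. The numbers are copied from the list above; the helper name speedup_table is made up for this note and is not part of yt.

```python
# Wall-clock times copied from the measurements above (seconds).
timings = {1: 155.47, 4: 145.55, 8: 158.62, 32: 169.53}


def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    baseline = timings[1]
    table = {}
    for threads, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        # Efficiency is the speedup divided by the number of threads used.
        table[threads] = (speedup, speedup / threads)
    return table


if __name__ == "__main__":
    for threads, (speedup, efficiency) in speedup_table(timings).items():
        print("%2d threads: speedup %.2fx, efficiency %.1f%%"
              % (threads, speedup, 100 * efficiency))
```

By this measure the best case (4 threads) is roughly a 7% improvement, and the 32-thread run takes about 9% longer than a single thread.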
Changing it so that each thread has its own LE and RE arrays improves the timings only marginally.
Not sure where the slowdown comes from; instantiating the threads should not have so much overhead that it dominates. My best guess is that the NumPy library doesn't release the GIL in na.where and the various other implied numpy calls.
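One way to probe the GIL guess without yt is a stand-alone harness with the same shape as the code above: workers drain a queue, and the per-item work is swapped for a pure-Python loop that is known to hold the GIL. This is only a sketch, written for Python 3 (so the module is queue rather than Queue); busy_work and run_with_threads are hypothetical names, and the workload is a stand-in, not the real overlap computation. If the timings stay flat or get worse as threads are added, that matches the behaviour expected when the work never releases the GIL.

```python
import queue
import threading
import time


def busy_work(n):
    """CPU-bound pure-Python loop (stand-in workload; it never releases the GIL)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_with_threads(items, num_threads):
    """Drain a queue of work items with num_threads workers.

    Returns (results, seconds), with results in the same order as items.
    """
    work = queue.Queue()
    for index, item in enumerate(items):
        work.put((index, item))
    results = [None] * len(items)

    def worker():
        while True:
            try:
                # Non-blocking get, as in the overlapper thread above.
                index, item = work.get(False)
            except queue.Empty:
                return
            results[index] = busy_work(item)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = [100000] * 20
    for n in (1, 4, 8):
        _, seconds = run_with_threads(items, n)
        print("%2d threads: %.3f s" % (n, seconds))
```

Replacing busy_work with the real per-grid call would show whether the overlap code behaves like the GIL-bound stand-in.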
Releasing the GIL inside the pyhdf code might be effective. Not sure this would translate well when moving to the Packed AMR format.
