
Notes on threading yt

Obsolete: parallelization will be done either with MPI tasks or with some approach other than this one. The GIL (Python's Global Interpreter Lock) got in the way.

Parallelization plans will be kept elsewhere?!

I implemented a simple threading interface for computing the overlap of grids:

import queue
import threading

import numpy as na  # yt historically imported NumPy under the alias "na"

class OverlapThreading:
    """Compute each grid's overlapping grids on one level using a pool of threads."""
    def __init__(self, hierarchy, level, numthreads=5):
        gridQ = queue.Queue()
        self.threads = []
        self.hierarchy = hierarchy
        # Indices of the grids on this level, and their bounding boxes
        self.gI = self.hierarchy.selectLevel(level)
        self.RE = hierarchy.gridRightEdge[self.gI]
        self.LE = hierarchy.gridLeftEdge[self.gI]
        # Every grid on the level becomes one work item
        for g in self.hierarchy.grids[self.gI]:
            gridQ.put(g)
        # Never start more threads than there are grids
        for _ in range(min(numthreads, gridQ.qsize())):
            t = OverlapThreading.overlapper(self, gridQ)
            self.threads.append(t)
            t.start()
        # Wait until the workers have drained the queue
        for t in self.threads:
            t.join()

    class overlapper(threading.Thread):
        numthreads = 0  # count of workers created (not otherwise used here)
        def __init__(self, OT, gridQ):
            super().__init__()
            self.OT = OT
            self.gridQ = gridQ
            OverlapThreading.overlapper.numthreads += 1
        def run(self):
            OT = self.OT
            h = OT.hierarchy
            grids = OT.gI
            while True:
                try:
                    # Non-blocking get: an empty queue means all work is done
                    grid = self.gridQ.get(False)
                except queue.Empty:
                    break
                # For each axis, build the overlap mask, then keep the grids it flags
                for axis in range(3):
                    grid.generateOverlapMasks(axis, OT.LE, OT.RE)
                    hits = na.where(grid.myOverlapMasks[axis] == 1)
                    grid.myOverlapGrids[axis] = h.grids[grids[hits]]
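
To experiment with this pattern outside yt, the same queue-draining worker pool can be run on synthetic boxes. This is a minimal sketch, not yt code: `find_overlaps` is a hypothetical helper, each grid is reduced to one row of left/right edge arrays, and a plain 3-D bounding-box intersection test stands in for `generateOverlapMasks` (whose exact per-axis semantics are not reproduced here).

```python
import queue
import threading

import numpy as np


def find_overlaps(LE, RE, nthreads=4):
    """Hypothetical helper: for each box i, return indices of boxes intersecting it.

    LE and RE are (N, 3) arrays of left and right edges. Box indices are put
    on a shared queue and pulled off by a pool of worker threads.
    """
    work = queue.Queue()
    for i in range(len(LE)):
        work.put(i)
    results = [None] * len(LE)  # each slot is written by exactly one thread

    def worker():
        while True:
            try:
                i = work.get(False)  # non-blocking; Empty means we are done
            except queue.Empty:
                return
            # Boxes intersect iff their extents overlap on every axis
            # (strict inequalities: boxes that only touch at a face do not count)
            mask = np.all((LE < RE[i]) & (RE > LE[i]), axis=1)
            mask[i] = False  # a box is not reported as overlapping itself
            results[i] = np.where(mask)[0]

    threads = [threading.Thread(target=worker)
               for _ in range(min(nthreads, work.qsize()))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, with unit boxes at (0, 0, 0), (0.5, 0.5, 0.5) and (2, 2, 2), the first two overlap each other and the third overlaps nothing, so the result is `[[1], [0], []]` after converting each array with `.tolist()`. Since each work item here is a few small NumPy calls, most of the time is likely spent in interpreter overhead that holds the GIL, so this sketch should not be expected to scale with thread count either.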

However, even on John's enormous sim, OverlapThreading gives essentially no speedup:

  • 1 thread: 155.47 s
  • 4 threads: 145.55 s
  • 8 threads: 158.62 s
  • 32 threads: 169.53 s
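
The speedup and parallel efficiency implied by these timings can be computed directly, with speedup S(n) = T(1)/T(n) and efficiency E(n) = S(n)/n. This is a plain Python sketch that uses only the numbers listed above; `speedup_table` is a hypothetical helper name.

```python
# Wall-clock times (seconds) from the runs above, keyed by thread count
timings = {1: 155.47, 4: 145.55, 8: 158.62, 32: 169.53}


def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    speedup    = T(1) / T(n)
    efficiency = speedup / n   (1.0 would be perfect linear scaling)
    """
    t1 = timings[1]
    return {n: (t1 / t, t1 / t / n) for n, t in timings.items()}


for n, (s, e) in speedup_table(timings).items():
    print(f"{n:>2} threads: speedup {s:.2f}x, efficiency {e:.1%}")
```

With these numbers the 4-thread run is only about 1.07x faster than the serial run, while the 8- and 32-thread runs are slower than a single thread (about 0.98x and 0.92x).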

Giving each overlapper thread its own copies of the LE and RE arrays improves the timings only marginally.
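
A minimal sketch of what per-thread arrays can look like, assuming each worker simply copies the shared edge arrays when it is constructed. `EdgeWorker` and its placeholder `run` body (summing box volumes) are hypothetical and not part of yt.

```python
import threading

import numpy as np


class EdgeWorker(threading.Thread):
    """Hypothetical worker that holds private copies of the edge arrays."""

    def __init__(self, LE, RE):
        super().__init__()
        # .copy() allocates new buffers, so no two workers read the same memory
        self.LE = np.asarray(LE).copy()
        self.RE = np.asarray(RE).copy()
        self.volume = None

    def run(self):
        # Placeholder work on the private copies: total volume of all boxes
        self.volume = float(np.prod(self.RE - self.LE, axis=1).sum())
```

Since the workers only read LE and RE, sharing them never required locking in the first place, so private copies have little contention to remove; that is consistent with the marginal gain seen here.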

I'm not sure where the slowdown comes from; thread instantiation should not carry enough overhead to dominate. My best guess is that NumPy does not release the GIL in na.where and the various other implied NumPy calls.
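
One way to probe the GIL hypothesis is a microbenchmark: run the same batch of small `where` calls with 1, 2, 4 and 8 threads and compare wall-clock times. This is a sketch under assumptions: the mask size and item count are arbitrary, and `run_threaded` and `small_where` are hypothetical helpers loosely mimicking the per-grid overlap step.

```python
import queue
import threading
import time

import numpy as np


def run_threaded(task, n_items, n_threads):
    """Call task() n_items times, spread over n_threads threads; return wall time in s."""
    work = queue.Queue()
    for _ in range(n_items):
        work.put(None)

    def worker():
        while True:
            try:
                work.get(False)  # non-blocking; Empty means the batch is done
            except queue.Empty:
                return
            task()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# A small 0/1 mask, similar in spirit to a per-grid overlap mask
mask = np.random.default_rng(0).integers(0, 2, size=2000)


def small_where():
    np.where(mask == 1)


if __name__ == "__main__":
    base = run_threaded(small_where, 20000, 1)
    for n in (1, 2, 4, 8):
        t = run_threaded(small_where, 20000, n)
        print(f"{n} threads: {t:.3f} s (speedup {base / t:.2f}x)")
```

If wall time stays flat or grows as threads are added, the calls are effectively serialized. For small arrays, per-call interpreter overhead (which always runs under the GIL) tends to dominate; with much larger arrays the results may differ, since NumPy releases the GIL inside some of its inner loops.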

Releasing the GIL inside the pyhdf code might help. I'm not sure this would translate well once we move to the Packed AMR format.