Add the rpdb command (from the bitbucket experimental repo). If you run a
simulation with --rpdb, an uncaught exception will spawn a (localhost-bound) pdb
instance that responds via XML-RPC. This lets you run pdb against individual
MPI tasks, provided you are logged in to the node those tasks are running on.
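As a rough illustration of the mechanism described above, here is a minimal sketch of a pdb instance served over XML-RPC from an exception hook. It is not the code in this commit: the names PdbHandler, StoppableServer, serve_debugger, make_excepthook and BASE_PORT are hypothetical, the port scheme is an assumption, and it is written against Python 3's xmlrpc.server rather than the Python 2.6 modules. It also leans on pdb.Pdb.reset() and pdb.Pdb.setup(), which are not formally documented.

```python
import io
import pdb
import traceback
from xmlrpc.server import SimpleXMLRPCServer

BASE_PORT = 8010  # assumption: task N listens on BASE_PORT + N


class PdbHandler:
    """Run pdb commands against a traceback and return the text they print."""

    def __init__(self, tb):
        self.out = io.StringIO()
        # Give pdb private streams so its output can be captured and returned.
        self.debugger = pdb.Pdb(stdin=io.StringIO(), stdout=self.out, readrc=False)
        self.debugger.reset()
        # Post-mortem style setup: start in the frame that raised.
        self.debugger.setup(tb.tb_frame, tb)

    def execute(self, line):
        start = self.out.tell()
        self.debugger.onecmd(line)
        self.out.seek(start)
        return self.out.read()  # only the output produced by this command


class StoppableServer(SimpleXMLRPCServer):
    """Serve one request at a time until a client asks for shutdown."""

    finished = False

    def request_shutdown(self):
        self.finished = True
        return True  # XML-RPC cannot marshal None by default

    def serve_until_shutdown(self):
        while not self.finished:
            self.handle_request()


def serve_debugger(tb, rank):
    handler = PdbHandler(tb)
    # Bind to localhost only; any local user can still reach this port.
    server = StoppableServer(("localhost", BASE_PORT + rank), logRequests=False)
    server.register_function(handler.execute, "execute")
    server.register_function(server.request_shutdown, "shutdown")
    try:
        server.serve_until_shutdown()
    finally:
        server.server_close()


def make_excepthook(rank):
    def hook(exc_type, exc, tb):
        traceback.print_exception(exc_type, exc, tb)
        serve_debugger(tb, rank)
        # In an MPI run, a barrier across all tasks would follow here.
    return hook
```

Under these assumptions, each task would install the hook with something like sys.excepthook = make_excepthook(rank), where rank is that task's MPI rank.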
For instance,

    mpirun -np 4 python2.6 some_script_with_a_bug.py --parallel --rpdb

will spawn four XML-RPC servers with embedded pdb on an uncaught exception,
each of which can be communicated with via the command

    yt rpdb [0123]

where the argument (one of 0, 1, 2 or 3 here) is the task you want to
communicate with. When the command 'shutdown' is issued to an rpdb server,
that server shuts down and its task enters a barrier, where it waits for all
the other tasks to reach the same barrier. So you need to 'shutdown' every
server before the tasks can all exit.
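The need to shut down every server can be sketched from the client side as follows. This is only an illustration: 'yt rpdb' is the intended way in, and the port scheme (BASE_PORT plus the task number) and the idea that 'shutdown' is exposed as an XML-RPC method of that name are assumptions made for the example. The helper names are hypothetical.

```python
import xmlrpc.client

BASE_PORT = 8010  # assumption: task N listens on BASE_PORT + N


def task_url(task, base_port=BASE_PORT):
    """URL of the localhost-only debugger for one MPI task."""
    return "http://localhost:%d" % (base_port + task)


def connect(task):
    # ServerProxy is lazy: nothing is sent until a method is called on it.
    return xmlrpc.client.ServerProxy(task_url(task))


def shutdown_all(n_tasks):
    """Ask every task's server to shut down so all tasks reach the barrier."""
    for task in range(n_tasks):
        connect(task).shutdown()
```

For the four-task run above, shutdown_all(4) would release tasks 0 through 3 in turn; stopping after only some of them would leave the rest of the job waiting at the barrier.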
I've used this to debug some parallel bugs lately; very handy. Note that it
binds only to localhost, but this is STILL a security flaw if you have other
users on the system, since any local user can connect to that port.
SO -- BEWARE.