[MPICH] batch job hangs
Wei-keng Liao
wkliao at ece.northwestern.edu
Mon Jan 30 19:33:56 CST 2006
I sent the following question to "mpich2-maint at mcs.anl.gov", and it was
suggested that I post it here to see if someone can help.
Wei-keng
----------
I compiled MPICH2-1.0.3 on Tungsten at NCSA and used mpdboot and mpdallexit
to wrap mpiexec in my batch script. I found that my job hangs after all
commands in the script have finished (including mpdallexit).
However, when I used MPICH2-1.0.2p1, the batch job ran without hanging.
I wonder whether mpdallexit in 1.0.3 fails to clean up all the processes.
I don't get any warning or error message, so I don't have any useful information
to provide. Please let me know how I can help debug this. Below is the script I used.
Wei-keng
------------------------------------------------------------
#!/bin/csh
#BSUB -n 2
#BSUB -W 0:01
#BSUB -o out
#BSUB -J test
#BSUB -q normal
#BSUB -R "span[ptile=1]" # use only one process per node
setenv MPIDIR ${HOME}/MPICH
setenv MPIRUN ${MPIDIR}/bin/mpiexec
${MPIDIR}/bin/mpdboot --totalnum=${NPROCS} --file=${LSB_NODEFILE}
${MPIRUN} -n ${NPROCS} a.out
${MPIDIR}/bin/mpdallexit
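One way to test the suspicion that mpdallexit leaves processes behind is to list, right after it returns, any mpd-related processes the job's user still owns. The lines below are a hypothetical diagnostic, not part of the original job script. They assume a Linux-style `ps` that accepts `-u` and `-o`, and they only inspect the node the batch script itself runs on, not the other nodes in the mpd ring.

```shell
#!/bin/csh
# Hypothetical diagnostic (not in the original script): run after mpdallexit.
# List this user's processes whose command line contains "mpd".
# The '[m]pd' pattern keeps the grep command itself out of the listing.
# Any output means an mpd-related process survived on this node and
# could be what keeps the batch job from finishing.
ps -u `id -un` -o pid,ppid,args | grep -i '[m]pd'
```

Empty output suggests the local mpd exited cleanly; a surviving process's PID and PPID show what is still running and who started it.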