[MPICH] batch job hangs

Wei-keng Liao wkliao at ece.northwestern.edu
Mon Jan 30 19:33:56 CST 2006


I sent the following question to "mpich2-maint at mcs.anl.gov" and was
advised to post it here to see if someone can help.

Wei-keng

----------

I compiled MPICH2-1.0.3 on Tungsten @ NCSA and used mpdboot and mpdallexit
to wrap mpiexec in my batch script. I found that my job hangs after all
commands in the script have finished (including mpdallexit).

But when I used MPICH2-1.0.2p1, the batch job ran OK without hanging.
I wonder whether the mpdallexit in 1.0.3 fails to clean up all the processes.
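
One way to check for leftover processes on the local node after mpdallexit
returns is sketched below. This is only a rough sketch, assuming that any
leftover MPD daemons show up with "mpd" in their command line. The helper name
list_mpd is made up for illustration, and the snippet is written in sh rather
than the csh of the batch script.

```shell
#!/bin/sh
# Hypothetical helper: keep only "PID ARGS" lines that mention "mpd".
# The [m] bracket keeps grep from matching its own command line.
# "|| true" makes an empty result count as success, not an error.
list_mpd() {
    grep '[m]pd' || true
}

# Print this user's processes on the local node and filter them.
# Assumption: leftover MPD daemons are visible under the name "mpd".
ps -u "$(id -un)" -o pid= -o args= | list_mpd
```

If this prints nothing on a node after mpdallexit, no process matching "mpd"
is left there for this user. It only looks at the node it runs on, so the
other nodes of the job would need to be checked separately.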

I don't get any warning or error message, so I don't have any useful info to
provide. Please let me know how I can help debug. Below is the script I used.

Wei-keng

------------------------------------------------------------
#!/bin/csh
#BSUB -n 2
#BSUB -W 0:01
#BSUB -o out
#BSUB -J test
#BSUB -q normal
#BSUB -R "span[ptile=1]"   # use one proc only per node

setenv MPIDIR ${HOME}/MPICH
setenv MPIRUN ${MPIDIR}/bin/mpiexec

${MPIDIR}/bin/mpdboot --totalnum=${NPROCS} --file=${LSB_NODEFILE}

${MPIRUN} -n ${NPROCS} a.out

${MPIDIR}/bin/mpdallexit



