[mpich-discuss] MPI communication problem.(MPI Abort by user Aborting program!)

wangxinquan at tju.edu.cn wangxinquan at tju.edu.cn
Thu Apr 10 18:56:32 CDT 2008


Dear all,

     Recently I ran a test on the Nankai Stars HPC. The error message
"MPI Abort by user Aborting program! Aborting program!" appeared when I ran
a calculation on 2 CPUs across 2 nodes, but the same calculation worked well
on 1 node. So I suspect it is an MPI communication problem.
     After searching on Google, I found some hints. My communication library
(MPI) might not be properly configured to allow input redirection (so that I
am effectively reading an empty file).
     The pwscf package needs to be configured to allow interactive execution.
Do I need to adjust some parameters of MPICH?
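
     One thing I could try, as a rough sketch, is to pass the input file by
name rather than through "<". This assumes that pw.x accepts an "-in" option
followed by the file name, which I have not verified for Espresso-3.2.3. The
helper names input_ok and launch_cmd are hypothetical and used only for
illustration. The script only prints the command it would run.

```shell
#!/bin/bash
# Sketch: hand the input file to pw.x by name instead of redirecting stdin.
# ASSUMPTION: pw.x accepts "-in <file>"; not verified for Espresso-3.2.3.

PW=/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x
INPUT=/nfs/s04r2p1/wangxq_tj/cu.scf.in

# Succeed only if the file exists and is non-empty.
input_ok() {
    [ -s "$1" ]
}

# Print the launch command without running it (dry run).
launch_cmd() {
    printf 'mpirun.lsf %s -in %s\n' "$PW" "$1"
}

if input_ok "$INPUT"; then
    # In the real job script, run this command instead of printing it,
    # and redirect its standard output to cu.scf.out.
    launch_cmd "$INPUT"
else
    echo "input file missing or empty: $INPUT" >&2
fi
```

If the job then runs on 2 nodes, that would point to input redirection rather
than to the MPI communication itself.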

     Any help will be deeply appreciated!

Calculation Details are as follows:
---------------------------------------------------------------------------------
HPC background:
Nankai Stars (http://202.113.29.200/introduce.htm)
800 Xeon 3.06 GHz CPUs (400 nodes)
800 GB Memory    
53T High-Speed Storage    
Myrinet
Parallel jobs are run and debugged through the Platform LSF system.
Mpich_gm driver: 1.2.6..13a
Test package: Espresso-3.2.3 (www.pwscf.org)
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Installation:
./configure CC=mpicc F77=mpif77 F90=mpif90
modified make.sys file:
IFLAGS=-I. -I/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/include
MPI_LIBS=/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/lib/libmpichf90.a
make all
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Submit script :
#!/bin/bash
#BSUB -q normal
#BSUB -J test.icymoon
#BSUB -c 3:00
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2 

cd /nfs/s04r2p1/wangxq_tj
echo "test icymoon"

mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
    < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out

echo "test icymoon end"
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Output file (%J.log):

... ...
The output (if any) follows:

test icymoon
[0]  MPI Abort by user Aborting program !
[0] Aborting program!
test icymoon end
---------------------------------------------------------------------------------


Best regards,
XQ Wang

=====================================

X.Q. Wang 

wangxinquan at tju.edu.cn

School of Chemical Engineering and Technology

Tianjin University

92 Weijin Road, Tianjin, P. R. China

tel:86-22-27890268, fax: 86-22-27892301

===================================== 




