[mpich-discuss] MPI communication problem.(MPI Abort by user Aborting program!)
wangxinquan
wangxinquan at tju.edu.cn
Fri Apr 11 23:00:18 CDT 2008
Dear Martin,
Following your suggestion, the job works well :)
Thank you very much for your help!
Cheers, XQ Wang
=====================================
X.Q. Wang
wangxinquan at tju.edu.cn
School of Chemical Engineering and Technology
Tianjin University
92 Weijin Road, Tianjin, P. R. China
tel:86-22-27890268, fax: 86-22-27892301
=====================================
From: Martin Siegert
Sent: 2008-04-11 08:34:33
To: mpich-discuss at mcs.anl.gov
Cc:
Subject: Re: [mpich-discuss] MPI communication problem.(MPI Abort by user Aborting program!)
Hi,
try
mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
-in /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
see the espresso user-guide pages 29 and 30. In most of the espresso
examples it is assumed that stdin is connected on all nodes,
which (as far as I know) is not guaranteed by the MPI standard.
However, the "-in inputfile" option should always work.
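For illustration only (not taken from the espresso sources): a minimal C sketch of the usual portable pattern behind this advice, assuming that only rank 0 reads the input file and then broadcasts its contents, so no rank depends on stdin being connected; the file name cu.scf.in is simply the one from this thread.

/* Rank 0 reads the input file and broadcasts its contents,
 * so no rank relies on stdin being connected. */
#include <mpi.h>
#include <stdio.h>

#define MAXINPUT 65536

int main(int argc, char **argv)
{
    char buf[MAXINPUT];
    int len = 0, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Only rank 0 touches the input file. */
        FILE *fp = fopen(argc > 1 ? argv[1] : "cu.scf.in", "r");
        if (fp == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);
        len = (int) fread(buf, 1, sizeof(buf) - 1, fp);
        fclose(fp);
        buf[len] = '\0';
    }

    /* Broadcast the length, then the contents, to all other ranks. */
    MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(buf, len + 1, MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("rank %d received %d bytes of input\n", rank, len);

    MPI_Finalize();
    return 0;
}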
Cheers,
Martin
--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
Client and Research Services phone: 778 782-4691
Simon Fraser University fax: 778 782-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
On Fri, Apr 11, 2008 at 07:56:32AM +0800, wangxinquan at tju.edu.cn wrote:
> Dear all,
>
> Recently I have done a test on Nankai Stars HPC. The error message
> "MPI Abort by user Aborting program! Aborting program!" appeared when I ran
> a calculation on 2 CPUs across 2 nodes, but it worked well on 1 node.
> So I'm afraid it is an MPI communication problem.
> After googling, I have found some hints. My communication library (MPI)
> might not be properly configured to allow input redirection (so that I
> am effectively reading an empty file).
> The pwscf package needs to be configured to allow interactive execution.
> Do I need to adjust some parameters of MPICH?
>
> Any help will be deeply appreciated!
>
> Calculation Details are as follows:
> ---------------------------------------------------------------------------------
> HPC background:
> Nankai Stars (http://202.113.29.200/introduce.htm)
> 800 Xeon 3.06 GHz CPUs (400 nodes)
> 800 GB Memory
> 53T High-Speed Storage
> Myrinet
> Parallel jobs are run and debugged through the Platform LSF system.
> MPICH-GM driver: 1.2.6..13a
> Test package: Espresso-3.2.3 (www.pwscf.org)
> ---------------------------------------------------------------------------------
>
> ---------------------------------------------------------------------------------
> Installation:
> ./configure CC=mpicc F77=mpif77 F90=mpif90
> modified make.sys file:
> IFLAGS=-I. -I/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/include
> MPI_LIBS=/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/lib/libmpichf90.a
> make all
> ---------------------------------------------------------------------------------
>
> ---------------------------------------------------------------------------------
> Submit script :
> #!/bin/bash
> #BSUB -q normal
> #BSUB -J test.icymoon
> #BSUB -c 3:00
> #BSUB -a "mpich_gm"
> #BSUB -o %J.log
> #BSUB -n 2
>
> cd /nfs/s04r2p1/wangxq_tj
> echo "test icymoon"
>
> mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x <
> /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
>
> echo "test icymoon end"
> ---------------------------------------------------------------------------------
>
> ---------------------------------------------------------------------------------
> Output file (%J.log):
>
> The output (if any) follows:
>
> test icymoon
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> test icymoon end
> ---------------------------------------------------------------------------------
>
>
> Best regards, XQ Wang
>
> =====================================
>
> X.Q. Wang
>
> wangxinquan at tju.edu.cn
>
> School of Chemical Engineering and Technology
>
> Tianjin University
>
> 92 Weijin Road, Tianjin, P. R. China
>
> tel:86-22-27890268, fax: 86-22-27892301
>
> =====================================