[mpich-discuss] Problem regarding running the code

Ashwinkumar Dobariya adobariya at gmail.com
Mon Feb 28 06:28:23 CST 2011


Hi Rajeev,

Thanks for the reply.
I first compiled the code using the wrapper command:

 mpif90 -o prog prog.f90

Then I submitted the script shown below:

#!/bin/bash
#PBS -q compute
#PBS -N test_job
# Request 1 Node with 12 Processors
#PBS -l nodes=1:ppn=12
#PBS -l walltime=100:00:00
#PBS -S /bin/bash
#PBS -M your_email@lboro.ac.uk
#PBS -m bae
#PBS -A your_account12345
#
# Go to the directory from which you submitted the job
cd $PBS_O_WORKDIR

module load intel_compilers
module load bullxmpi

mpirun ./Multi_beta
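
(As a sanity check, and only as a sketch that assumes mpiexec from the same MPICH installation is available on an interactive node, the job can also be launched by hand to confirm the rank count before going through PBS, for example

 mpiexec -n 12 ./Multi_beta
 mpiexec -n 4 ./examples/cpi

where the second line runs the cpi example, assuming it has been built in the examples directory of the MPICH source tree, as you suggested. If cpi also reports only one process, then the launcher, not the code, is starting a single rank.)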

but I am still getting the same error, shown below:

running mpdallexit on hydra127
LAUNCHED mpd on hydra127  via
RUNNING: mpd on hydra127
 Total Nb of PE:            1

 PE#           0 /           1  OK
PE# 0    0   0   0
PE# 0    0  33   0 165   0  65
PE# 0  -1  1 -1 -1 -1 -1
 PE_Table, PE#           0  complete
PE# 0   -0.03   0.98  -1.00   1.00  -0.03   1.97
 PE#           0  doesn t intersect any bloc
 PE#           0  will communicate with            0
             single value
 PE#           0  has           1  com. boundaries
 Data_Read, PE#           0  complete

 PE#           0  checking boundary type for
 0  1   1   1   0 165   0  65  nor sur sur sur gra  1  0  0
 0  2  33  33   0 165   0  65            EXC ->  1
 0  3   0  33   1   1   0  65  sur nor sur sur gra  0  1  0
 0  4   0  33 164 164   0  65  sur nor sur sur gra  0 -1  0
 0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
 0  6   0  33   0 165  64  64  cyc cyc cyc sur cyc  0  0 -1
 PE#           0  Set new
 PE#           0  FFT Table
 PE#           0  Coeff
Fatal error in MPI_Send: Invalid rank, error stack:
MPI_Send(176): MPI_Send(buf=0x7fff9425c388, count=1, MPI_DOUBLE_PRECISION,
dest=1, tag=1, MPI_COMM_WORLD) failed
MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
than 1
rank 0 in job 1  hydra127_37620   caused collective abort of all ranks
  exit status of rank 0: return code 1
I am struggling to find the error, but I am not sure where I messed up. If I
run the other examples, they work fine.
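
(For reference, the "Invalid rank has value 1 but must be nonnegative and less
than 1" message means that MPI_COMM_WORLD contains only one process, so any
MPI_Send with dest=1 has to fail; the output above likewise shows only one PE.
A minimal Fortran sketch of the kind of startup guard that would make this
visible immediately, using a hypothetical expected_pes parameter rather than
the real Multi_beta decomposition, is:

 program size_check
   use mpi
   implicit none
   integer :: ierr, nprocs, myrank
   ! hypothetical: the number of PEs the domain decomposition assumes
   integer, parameter :: expected_pes = 12
   call MPI_Init(ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   if (nprocs /= expected_pes) then
     if (myrank == 0) print *, 'expected ', expected_pes, ' PEs but got ', nprocs
     call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
   end if
   ! ... the rest of the solver would run here ...
   call MPI_Finalize(ierr)
 end program size_check

Such a check does not fix the launch problem, but it separates "the launcher
started too few ranks" from an error inside the solver itself.)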

Thanks and Regards

On Fri, Feb 25, 2011 at 4:26 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

> For some reason, each process thinks the total number of processes in the
> parallel job is 1. Check the wrapper script and try to run by hand using
> mpiexec. Also try running the cpi example from the examples directory and
> see if it runs correctly.
>
> Rajeev
>
> On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:
>
> > Hello everyone,
> >
> > I am a newbie here. I am running a code for large eddy simulation of
> > turbulent flow. I am compiling the code with the wrapper command and
> > running it on the Hydra cluster. When I submit the script file, it shows
> > the following error.
> >
> > running mpdallexit on hydra127
> > LAUNCHED mpd on hydra127  via
> > RUNNING: mpd on hydra127
> > LAUNCHED mpd on hydra118  via  hydra127
> > RUNNING: mpd on hydra118
> > Fatal error in MPI_Send: Invalid rank, error stack:
> > MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1,
> > MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> > MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
> > than 1
> >  Total Nb of PE:            1
> >
> >  PE#           0 /           1  OK
> > PE# 0    0   0   0
> > PE# 0    0  33   0 165   0  33
> > PE# 0  -1  1 -1 -1 -1  8
> >  PE_Table, PE#           0  complete
> > PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
> >  PE#           0  doesn t intersect any bloc
> >  PE#           0  will communicate with            0
> >              single value
> >  PE#           0  has           2  com. boundaries
> >  Data_Read, PE#           0  complete
> >
> >  PE#           0  checking boundary type for
> >  0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
> >  0  2  33  33   0 165   0  33            EXC ->  1
> >  0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
> >  0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
> >  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
> >  0  6   0  33   0 165  33  33            EXC ->  8
> >  PE#           0  Set new
> >  PE#           0  FFT Table
> >  PE#           0  Coeff
> > rank 0 in job 1  hydra127_34565   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > I am struggling to find the error in my code. Can anybody suggest where
> > I messed up?
> >
> > Thanks and Regards,
> > Ash