[mpich-discuss] Problem regarding running the code

Rajeev Thakur thakur at mcs.anl.gov
Mon Feb 28 09:38:24 CST 2011


If you are using PBS, you may need to use the mpiexec script that is available here: http://www.osc.edu/~djohnson/mpiexec/index.php

On Feb 28, 2011, at 6:28 AM, Ashwinkumar Dobariya wrote:

> Hi Rajeev,
> 
> Thanks for the reply.
> I first compiled the code using the compiler wrapper:
> 
>  mpif90 -o prog prog.f90
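> 
> (As a quick sanity check, a minimal MPI Fortran program along the lines below -- the file name hello.f90 is only a placeholder, not from my code -- can be compiled the same way and used to confirm that the launcher really starts more than one rank:)
> 
>       ! hello.f90 -- placeholder name; each rank prints its number and
>       ! the communicator size, so a broken launch (size = 1) is obvious
>       program hello
>         use mpi
>         implicit none
>         integer :: ierr, rank, nprocs
>         call MPI_Init(ierr)
>         call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>         call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
>         print *, 'rank', rank, 'of', nprocs
>         call MPI_Finalize(ierr)
>       end program hello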
> 
> then I submitted the following script:
> 
> #!/bin/bash
> #PBS -q compute
> #PBS -N test_job
> # Request 1 Node with 12 Processors
> #PBS -l nodes=1:ppn=12
> #PBS -l walltime=100:00:00
> #PBS -S /bin/bash
> #PBS -M your_email@lboro.ac.uk
> #PBS -m bae
> #PBS -A your_account12345
> #
> # Go to the directory from which you submitted the job
> cd $PBS_O_WORKDIR
> 
> module load intel_compilers
> module load bullxmpi
> 
> mpirun ./Multi_beta
> 
> but I am still getting the same error, shown below:
> 
> running mpdallexit on hydra127
> LAUNCHED mpd on hydra127  via
> RUNNING: mpd on hydra127
>  Total Nb of PE:            1
> 
>  PE#           0 /           1  OK
> PE# 0    0   0   0
> PE# 0    0  33   0 165   0  65
> PE# 0  -1  1 -1 -1 -1 -1
>  PE_Table, PE#           0  complete
> PE# 0   -0.03   0.98  -1.00   1.00  -0.03   1.97
>  PE#           0  doesn't intersect any bloc
>  PE#           0  will communicate with            0
>              single value
>  PE#           0  has           1  com. boundaries
>  Data_Read, PE#           0  complete
> 
>  PE#           0  checking boundary type for
>  0  1   1   1   0 165   0  65  nor sur sur sur gra  1  0  0
>  0  2  33  33   0 165   0  65            EXC ->  1
>  0  3   0  33   1   1   0  65  sur nor sur sur gra  0  1  0
>  0  4   0  33 164 164   0  65  sur nor sur sur gra  0 -1  0
>  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
>  0  6   0  33   0 165  64  64  cyc cyc cyc sur cyc  0  0 -1
>  PE#           0  Set new
>  PE#           0  FFT Table
>  PE#           0  Coeff
> Fatal error in MPI_Send: Invalid rank, error stack:
> MPI_Send(176): MPI_Send(buf=0x7fff9425c388, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1
> rank 0 in job 1  hydra127_37620   caused collective abort of all ranks
>   exit status of rank 0: return code 1
>
> I am struggling to find the error, but I am not sure where I messed up. If I run the other examples, they work fine.
> 
> Thanks and Regards
> 
> On Fri, Feb 25, 2011 at 4:26 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> For some reason, each process thinks the total number of processes in the parallel job is 1. Check the wrapper script and try to run by hand using mpiexec. Also try running the cpi example from the examples directory and see if it runs correctly.
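> 
> To make the failure concrete: with a communicator size of 1, the only valid destination rank is 0, so dest=1 trips MPICH's "Invalid rank" check before anything is sent. Here is a sketch of the kind of guard the code could add (nprocs, dest, and buf are illustrative names, not taken from your program):
> 
>       ! sketch only: check the communicator size before sending, so a
>       ! job launched with a single rank stops with a clear message
>       ! instead of aborting inside MPI_Send with "Invalid rank"
>       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
>       if (dest >= nprocs) then
>          print *, 'need at least', dest + 1, 'ranks, got only', nprocs
>          call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
>       end if
>       call MPI_Send(buf, 1, MPI_DOUBLE_PRECISION, dest, 1, &
>                     MPI_COMM_WORLD, ierr)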
> 
> Rajeev
> 
> On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:
> 
> > Hello everyone,
> >
> > I am a newbie here. I am running a code for large eddy simulation of turbulent flow. I compile the code using the wrapper command and run it on the Hydra cluster. When I submit the script file, it shows the following error:
> >
> > running mpdallexit on hydra127
> > LAUNCHED mpd on hydra127  via
> > RUNNING: mpd on hydra127
> > LAUNCHED mpd on hydra118  via  hydra127
> > RUNNING: mpd on hydra118
> > Fatal error in MPI_Send: Invalid rank, error stack:
> > MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> > MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1
> >  Total Nb of PE:            1
> >
> >  PE#           0 /           1  OK
> > PE# 0    0   0   0
> > PE# 0    0  33   0 165   0  33
> > PE# 0  -1  1 -1 -1 -1  8
> >  PE_Table, PE#           0  complete
> > PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
> >  PE#           0  doesn't intersect any bloc
> >  PE#           0  will communicate with            0
> >              single value
> >  PE#           0  has           2  com. boundaries
> >  Data_Read, PE#           0  complete
> >
> >  PE#           0  checking boundary type for
> >  0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
> >  0  2  33  33   0 165   0  33            EXC ->  1
> >  0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
> >  0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
> >  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
> >  0  6   0  33   0 165  33  33            EXC ->  8
> >  PE#           0  Set new
> >  PE#           0  FFT Table
> >  PE#           0  Coeff
> > rank 0 in job 1  hydra127_34565   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > I am struggling to find the error in my code. Can anybody suggest where I messed up?
> >
> > Thanks and Regards,
> > Ash


