[mpich-discuss] Problem regarding running the code
Pavan Balaji
balaji at mcs.anl.gov
Mon Feb 28 09:42:32 CST 2011
It looks like you are using an older version of MPICH2, which uses MPD
internally. Can you upgrade your version of MPICH2? Starting with
mpich2-1.3, we use Hydra, which integrates better with PBS environments.
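For example, with mpich2-1.3 or later, Hydra's mpiexec reads the node
list from the PBS allocation itself, so no mpd ring or machine file is
needed. A minimal sketch, assuming the executable name from your script
below:

  mpiexec -n 12 ./Multi_beta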
-- Pavan
On 02/28/2011 06:28 AM, Ashwinkumar Dobariya wrote:
> Hi Rajiv,
>
> Thanks for the reply.
> I first tried to compile the code using the wrapper command, that is:
>
> mpif90 -o prog prog.f90
>
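> As a quick check, the wrapper can report which MPI installation it
> builds against; assuming an MPICH2-based mpif90 (Open MPI-based wrappers
> such as bullxmpi's use -showme instead), the option is:
>
> mpif90 -show
>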
> then I submitted the script as below:
>
> #!/bin/bash
> #PBS -q compute
> #PBS -N test_job
> # Request 1 Node with 12 Processors
> #PBS -l nodes=1:ppn=12
> #PBS -l walltime=100:00:00
> #PBS -S /bin/bash
> #PBS -M your_email at lboro.ac.uk
> #PBS -m bae
> #PBS -A your_account12345
> #
> # Go to the directory from which you submitted the job
> cd $PBS_O_WORKDIR
>
> module load intel_compilers
> module load bullxmpi
>
> mpirun ./Multi_beta
>
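> One small change that may be worth trying, assuming the mpirun picked
> up here is bullxmpi's (Open MPI-based), is to request the process count
> explicitly instead of relying on a default:
>
> mpirun -np 12 ./Multi_beta
>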
> but I am still getting the same error, which is shown below:
>
> running mpdallexit on hydra127
> LAUNCHED mpd on hydra127 via
> RUNNING: mpd on hydra127
> Total Nb of PE: 1
>
> PE# 0 / 1 OK
> PE# 0 0 0 0
> PE# 0 0 33 0 165 0 65
> PE# 0 -1 1 -1 -1 -1 -1
> PE_Table, PE# 0 complete
> PE# 0 -0.03 0.98 -1.00 1.00 -0.03 1.97
> PE# 0 doesn t intersect any bloc
> PE# 0 will communicate with 0
> single value
> PE# 0 has 1 com. boundaries
> Data_Read, PE# 0 complete
>
> PE# 0 checking boundary type for
> 0 1 1 1 0 165 0 65 nor sur sur sur gra 1 0 0
> 0 2 33 33 0 165 0 65 EXC -> 1
> 0 3 0 33 1 1 0 65 sur nor sur sur gra 0 1 0
> 0 4 0 33 164 164 0 65 sur nor sur sur gra 0 -1 0
> 0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
> 0 6 0 33 0 165 64 64 cyc cyc cyc sur cyc 0 0 -1
> PE# 0 Set new
> PE# 0 FFT Table
> PE# 0 Coeff
> Fatal error in MPI_Send: Invalid rank, error stack:
> MPI_Send(176): MPI_Send(buf=0x7fff9425c388, count=1,
> MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
> than 1
> rank 0 in job 1 hydra127_37620 caused collective abort of all ranks
> exit status of rank 0: return code 1
> I am struggling to find the error, but I am not sure where I messed up.
> If I run the other examples, they work fine.
>
> Thanks and Regards
>
> On Fri, Feb 25, 2011 at 4:26 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>
> For some reason, each process thinks the total number of processes
> in the parallel job is 1. Check the wrapper script and try to run by
> hand using mpiexec. Also try running the cpi example from the
> examples directory and see if it runs correctly.
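>
> For example, a rough sketch assuming the mpd ring is already up and
> the cpi binary has been built in the MPICH2 examples directory:
>
> mpiexec -n 2 ./cpi
>
> cpi prints one "Process i of N is on host" line per rank, so if N shows
> up as 1 here as well, the launch setup rather than the application code
> is the likely culprit.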
>
> Rajeev
>
> On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:
>
> > Hello everyone,
> >
> > I am a newbie here. I am running a code for large eddy simulation
> of turbulent flow. I am compiling the code with the wrapper command and
> running it on the Hydra cluster. When I submit the script file, it
> shows the following error:
> >
> > running mpdallexit on hydra127
> > LAUNCHED mpd on hydra127 via
> > RUNNING: mpd on hydra127
> > LAUNCHED mpd on hydra118 via hydra127
> > RUNNING: mpd on hydra118
> > Fatal error in MPI_Send: Invalid rank, error stack:
> > MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1,
> MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> > MPI_Send(98).: Invalid rank has value 1 but must be nonnegative
> and less than 1
> > Total Nb of PE: 1
> >
> > PE# 0 / 1 OK
> > PE# 0 0 0 0
> > PE# 0 0 33 0 165 0 33
> > PE# 0 -1 1 -1 -1 -1 8
> > PE_Table, PE# 0 complete
> > PE# 0 -0.03 0.98 -1.00 1.00 -0.03 0.98
> > PE# 0 doesn t intersect any bloc
> > PE# 0 will communicate with 0
> > single value
> > PE# 0 has 2 com. boundaries
> > Data_Read, PE# 0 complete
> >
> > PE# 0 checking boundary type for
> > 0 1 1 1 0 165 0 33 nor sur sur sur gra 1 0 0
> > 0 2 33 33 0 165 0 33 EXC -> 1
> > 0 3 0 33 1 1 0 33 sur nor sur sur gra 0 1 0
> > 0 4 0 33 164 164 0 33 sur nor sur sur gra 0 -1 0
> > 0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
> > 0 6 0 33 0 165 33 33 EXC -> 8
> > PE# 0 Set new
> > PE# 0 FFT Table
> > PE# 0 Coeff
> > rank 0 in job 1 hydra127_34565 caused collective abort of all
> ranks
> > exit status of rank 0: return code 1
> >
> > I am struggling to find the error in my code. Can anybody suggest
> where I messed up?
> >
> > Thanks and Regards,
> > Ash
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji