[mpich-discuss] Problem regarding running the code

Rajeev Thakur thakur at mcs.anl.gov
Fri Feb 25 10:26:59 CST 2011


For some reason, each process thinks the total number of processes in the parallel job is 1. Check the wrapper script and try to run the job by hand using mpiexec. Also try running the cpi example from the examples directory and see if it runs correctly.
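A quick way to check this is to build and run a tiny program that does nothing but report the communicator size. Something along these lines should work (the file name and process count below are only placeholders):

/* size_check.c: minimal sketch to verify that mpiexec really starts
 * more than one MPI process.  Compile with the MPICH wrapper, e.g.
 *   mpicc size_check.c -o size_check
 * and run by hand, e.g.
 *   mpiexec -n 4 ./size_check
 * If every process reports a size of 1, the job is not being launched
 * correctly (wrong mpiexec or a broken wrapper script), which would
 * explain the "Invalid rank has value 1 but must be nonnegative and
 * less than 1" error in the output quoted below.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}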

Rajeev

On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:

> Hello everyone,
> 
> I am a newbie here. I am running a code for large eddy simulation of turbulent flow. I compile the code with the wrapper command and run it on the Hydra cluster. When I submit the script file, it shows the following error.
>  
> running mpdallexit on hydra127
> LAUNCHED mpd on hydra127  via
> RUNNING: mpd on hydra127
> LAUNCHED mpd on hydra118  via  hydra127
> RUNNING: mpd on hydra118
> Fatal error in MPI_Send: Invalid rank, error stack:
> MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1
>  Total Nb of PE:            1
> 
>  PE#           0 /           1  OK
> PE# 0    0   0   0
> PE# 0    0  33   0 165   0  33
> PE# 0  -1  1 -1 -1 -1  8
>  PE_Table, PE#           0  complete
> PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
>  PE#           0  doesn t intersect any bloc
>  PE#           0  will communicate with            0
>              single value
>  PE#           0  has           2  com. boundaries
>  Data_Read, PE#           0  complete
> 
>  PE#           0  checking boundary type for
>  0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
>  0  2  33  33   0 165   0  33            EXC ->  1
>  0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
>  0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
>  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
>  0  6   0  33   0 165  33  33            EXC ->  8
>  PE#           0  Set new
>  PE#           0  FFT Table
>  PE#           0  Coeff
> rank 0 in job 1  hydra127_34565   caused collective abort of all ranks
>   exit status of rank 0: return code 1
> 
> I am struggling to find the error in my code. Can anybody suggest where I messed up?
> 
> Thanks and Regards,
> Ash


