[mpich-discuss] Problem regarding running the code

Rajeev Thakur thakur at mcs.anl.gov
Mon Feb 28 09:43:42 CST 2011


And with the new version, you won't need to download the mpiexec I mentioned.
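
Once it is installed, you can double-check that the new Hydra mpiexec is the one
being picked up; something like this (paths will differ on your system) should
show it:

  which mpiexec        # should point into the new MPICH2 installation
  mpiexec --version    # Hydra should print its build details here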

Rajeev

On Feb 28, 2011, at 9:42 AM, Pavan Balaji wrote:

> 
> It looks like you are using an older version of MPICH2, which uses MPD internally. Can you upgrade your version of MPICH2? Starting with mpich2-1.3, we use Hydra, which integrates better with PBS environments.
> 
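> With Hydra, mpiexec picks up the PBS allocation on its own, so the job script
> also gets simpler. Roughly something like this (an untested sketch; keep your
> other #PBS directives and adjust the process count to your ppn request):
> 
>   cd $PBS_O_WORKDIR
>   module load intel_compilers
>   mpiexec -n 12 ./Multi_beta   # no mpdboot/machinefile needed under PBS
> 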
> -- Pavan
> 
> On 02/28/2011 06:28 AM, Ashwinkumar Dobariya wrote:
>> Hi Rajeev,
>> 
>> Thanks for the reply.
>> I first compiled the code using the wrapper command:
>> 
>>  mpif90 -o prog prog.f90
>> 
>> then I submitted the following script:
>> 
>> #!/bin/bash
>> #PBS -q compute
>> #PBS -N test_job
>> # Request 1 Node with 12 Processors
>> #PBS -l nodes=1:ppn=12
>> #PBS -l walltime=100:00:00
>> #PBS -S /bin/bash
>> #PBS -M your_email at lboro.ac.uk
>> #PBS -m bae
>> #PBS -A your_account12345
>> #
>> # Go to the directory from which you submitted the job
>> cd $PBS_O_WORKDIR
>> 
>> module load intel_compilers
>> module load bullxmpi
>> 
>> mpirun ./Multi_beta
>> 
>> but I am still getting the same error, shown below:
>> 
>> running mpdallexit on hydra127
>> LAUNCHED mpd on hydra127  via
>> RUNNING: mpd on hydra127
>>  Total Nb of PE:            1
>> 
>>  PE#           0 /           1  OK
>> PE# 0    0   0   0
>> PE# 0    0  33   0 165   0  65
>> PE# 0  -1  1 -1 -1 -1 -1
>>  PE_Table, PE#           0  complete
>> PE# 0   -0.03   0.98  -1.00   1.00  -0.03   1.97
>>  PE#           0  doesn t intersect any bloc
>>  PE#           0  will communicate with            0
>>              single value
>>  PE#           0  has           1  com. boundaries
>>  Data_Read, PE#           0  complete
>> 
>>  PE#           0  checking boundary type for
>>  0  1   1   1   0 165   0  65  nor sur sur sur gra  1  0  0
>>  0  2  33  33   0 165   0  65            EXC ->  1
>>  0  3   0  33   1   1   0  65  sur nor sur sur gra  0  1  0
>>  0  4   0  33 164 164   0  65  sur nor sur sur gra  0 -1  0
>>  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
>>  0  6   0  33   0 165  64  64  cyc cyc cyc sur cyc  0  0 -1
>>  PE#           0  Set new
>>  PE#           0  FFT Table
>>  PE#           0  Coeff
>> Fatal error in MPI_Send: Invalid rank, error stack:
>> MPI_Send(176): MPI_Send(buf=0x7fff9425c388, count=1,
>> MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
>> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
>> than 1
>> rank 0 in job 1  hydra127_37620   caused collective abort of all ranks
>>   exit status of rank 0: return code 1
>> 
>> I am struggling to find the error and I am not sure where I messed up. If I
>> run the other examples, they work fine.
>> 
>> Thanks and Regards
>> 
>> On Fri, Feb 25, 2011 at 4:26 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> 
>>    For some reason, each process thinks the total number of processes
>>    in the parallel job is 1. Check the wrapper script and try to run by
>>    hand using mpiexec. Also try running the cpi example from the
>>    examples directory and see if it runs correctly.
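>> 
>>    For example, something along these lines should work (the install path is
>>    just a placeholder for wherever MPICH2 lives on your system):
>> 
>>      cd /path/to/mpich2-install/examples
>>      mpiexec -n 4 ./cpi
>> 
>>    cpi reports how many processes it sees, which is the same count that is
>>    showing up as 1 in your runs.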
>> 
>>    Rajeev
>> 
>>    On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:
>> 
>>     > Hello everyone,
>>     >
>>     > I am a newbie here. I am running a code for large eddy simulation
>>    of turbulent flow. I am compiling the code with the wrapper command and
>>    running it on the Hydra cluster. When I submit the script file, it shows
>>    the following error.
>>     >
>>     > running mpdallexit on hydra127
>>     > LAUNCHED mpd on hydra127  via
>>     > RUNNING: mpd on hydra127
>>     > LAUNCHED mpd on hydra118  via  hydra127
>>     > RUNNING: mpd on hydra118
>>     > Fatal error in MPI_Send: Invalid rank, error stack:
>>     > MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1,
>>    MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
>>     > MPI_Send(98).: Invalid rank has value 1 but must be nonnegative
>>    and less than 1
>>     >  Total Nb of PE:            1
>>     >
>>     >  PE#           0 /           1  OK
>>     > PE# 0    0   0   0
>>     > PE# 0    0  33   0 165   0  33
>>     > PE# 0  -1  1 -1 -1 -1  8
>>     >  PE_Table, PE#           0  complete
>>     > PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
>>     >  PE#           0  doesn t intersect any bloc
>>     >  PE#           0  will communicate with            0
>>     >              single value
>>     >  PE#           0  has           2  com. boundaries
>>     >  Data_Read, PE#           0  complete
>>     >
>>     >  PE#           0  checking boundary type for
>>     >  0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
>>     >  0  2  33  33   0 165   0  33            EXC ->  1
>>     >  0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
>>     >  0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
>>     >  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
>>     >  0  6   0  33   0 165  33  33            EXC ->  8
>>     >  PE#           0  Set new
>>     >  PE#           0  FFT Table
>>     >  PE#           0  Coeff
>>     > rank 0 in job 1  hydra127_34565   caused collective abort of all
>>    ranks
>>     >   exit status of rank 0: return code 1
>>     >
>>     > I am struggling to find the error in my code. Can anybody suggest
>>    where I messed up?
>>     >
>>     > Thanks and Regards,
>>     > Ash
>> 
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


