[mpich-discuss] Problem regarding running the code
Rajeev Thakur
thakur at mcs.anl.gov
Mon Feb 28 09:43:42 CST 2011
And with the new version, you won't need to download the mpiexec I mentioned.
Rajeev
On Feb 28, 2011, at 9:42 AM, Pavan Balaji wrote:
>
> It looks like you are using an older version of MPICH2, which uses MPD internally. Can you upgrade your version of MPICH2? Starting mpich2-1.3, we use Hydra which integrates better with PBS environments.
>
> -- Pavan
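
A quick way to confirm which launcher a job actually picks up is to query it from inside the batch job. The sketch below assumes a Hydra-based install (mpich2 >= 1.3) and reuses the job size and binary name from this thread; the exact version-check commands may differ on older, MPD-based installs.

    # confirm the compiler wrapper and the launcher come from the same install
    which mpif90 mpiexec

    # Hydra's mpiexec should print its build details; on an MPD-based MPICH2,
    # use the mpich2version utility from the install's bin/ directory instead
    mpiexec --version | head -5

    # with Hydra, mpiexec understands the PBS allocation; it can also be
    # pointed at the node file explicitly
    mpiexec -n 12 ./Multi_beta
    mpiexec -f $PBS_NODEFILE -n 12 ./Multi_beta
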
>
> On 02/28/2011 06:28 AM, Ashwinkumar Dobariya wrote:
>> Hi Rajiv,
>>
>> Thanks for reply.
>> I first tried to compile the code using the wrapper command, that is:
>>
>> mpif90 -o prog prog.f90
>>
>> then I submitted the script as below :
>>
>> #!/bin/bash
>> #PBS -q compute
>> #PBS -N test_job
>> # Request 1 Node with 12 Processors
>> #PBS -l nodes=1:ppn=12
>> #PBS -l walltime=100:00:00
>> #PBS -S /bin/bash
>> #PBS -M your_email at lboro.ac.uk
>> #PBS -m bae
>> #PBS -A your_account12345
>> #
>> # Go to the directory from which you submitted the job
>> cd $PBS_O_WORKDIR
>>
>> module load intel_compilers
>> module load bullxmpi
>>
>> mpirun ./Multi_beta
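
Note that this script loads bullxmpi and launches with its mpirun, while the output below still comes from MPICH2's MPD, so the launcher may not match the MPI library the code was compiled against. A minimal MPICH2-oriented variant would look something like the sketch below; the module name mpich2 is a placeholder for whatever provides the same MPICH2 as the mpif90 used to compile.

    #!/bin/bash
    #PBS -q compute
    #PBS -N test_job
    #PBS -l nodes=1:ppn=12
    #PBS -l walltime=100:00:00
    #PBS -S /bin/bash

    cd $PBS_O_WORKDIR

    module load intel_compilers
    module load mpich2                 # placeholder module name

    # Hydra-based mpiexec (mpich2 >= 1.3); match -n to nodes*ppn
    mpiexec -n 12 ./Multi_beta
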
>>
>> but I am still getting the same error, shown below:
>>
>> running mpdallexit on hydra127
>> LAUNCHED mpd on hydra127 via
>> RUNNING: mpd on hydra127
>> Total Nb of PE: 1
>>
>> PE# 0 / 1 OK
>> PE# 0 0 0 0
>> PE# 0 0 33 0 165 0 65
>> PE# 0 -1 1 -1 -1 -1 -1
>> PE_Table, PE# 0 complete
>> PE# 0 -0.03 0.98 -1.00 1.00 -0.03 1.97
>> PE# 0 doesn t intersect any bloc
>> PE# 0 will communicate with 0
>> single value
>> PE# 0 has 1 com. boundaries
>> Data_Read, PE# 0 complete
>>
>> PE# 0 checking boundary type for
>> 0 1 1 1 0 165 0 65 nor sur sur sur gra 1 0 0
>> 0 2 33 33 0 165 0 65 EXC -> 1
>> 0 3 0 33 1 1 0 65 sur nor sur sur gra 0 1 0
>> 0 4 0 33 164 164 0 65 sur nor sur sur gra 0 -1 0
>> 0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
>> 0 6 0 33 0 165 64 64 cyc cyc cyc sur cyc 0 0 -1
>> PE# 0 Set new
>> PE# 0 FFT Table
>> PE# 0 Coeff
>> Fatal error in MPI_Send: Invalid rank, error stack:
>> MPI_Send(176): MPI_Send(buf=0x7fff9425c388, count=1,
>> MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
>> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
>> than 1
>> rank 0 in job 1 hydra127_37620 caused collective abort of all ranks
>> exit status of rank 0: return code 1
>> I am struggling to find the error, but I am not sure where I messed up.
>> If I run the other examples, they work fine.
>>
>> Thanks and Regards
>>
>> On Fri, Feb 25, 2011 at 4:26 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>
>> For some reason, each process thinks the total number of processes
>> in the parallel job is 1. Check the wrapper script and try to run by
>> hand using mpiexec. Also try running the cpi example from the
>> examples directory and see if it runs correctly.
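
Concretely, those two checks might look like the sketch below; the MPICH2 source path is a placeholder, and on an MPD-based install an mpd ring must be running before mpiexec is called.

    # run the application by hand with a small, explicit process count
    mpiexec -n 2 ./Multi_beta          # with MPD, start a ring first,
                                       # e.g. "mpd &" on a single node

    # build and run the cpi example shipped with MPICH2
    cd /path/to/mpich2/examples        # placeholder path
    mpicc -o cpi cpi.c
    mpiexec -n 4 ./cpi                 # cpi reports "Process i of n" and an
                                       # approximation of pi; n should match -n
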
>>
>> Rajeev
>>
>> On Feb 25, 2011, at 9:43 AM, Ashwinkumar Dobariya wrote:
>>
>> > Hello everyone,
>> >
>> > I am a newbie here. I am running a code for large eddy simulation
>> > of turbulent flow. I compile the code using the wrapper command and
>> > run it on the Hydra cluster. When I submit the script file, I get
>> > the following error:
>> >
>> > running mpdallexit on hydra127
>> > LAUNCHED mpd on hydra127 via
>> > RUNNING: mpd on hydra127
>> > LAUNCHED mpd on hydra118 via hydra127
>> > RUNNING: mpd on hydra118
>> > Fatal error in MPI_Send: Invalid rank, error stack:
>> > MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1,
>> MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
>> > MPI_Send(98).: Invalid rank has value 1 but must be nonnegative
>> and less than 1
>> > Total Nb of PE: 1
>> >
>> > PE# 0 / 1 OK
>> > PE# 0 0 0 0
>> > PE# 0 0 33 0 165 0 33
>> > PE# 0 -1 1 -1 -1 -1 8
>> > PE_Table, PE# 0 complete
>> > PE# 0 -0.03 0.98 -1.00 1.00 -0.03 0.98
>> > PE# 0 doesn t intersect any bloc
>> > PE# 0 will communicate with 0
>> > single value
>> > PE# 0 has 2 com. boundaries
>> > Data_Read, PE# 0 complete
>> >
>> > PE# 0 checking boundary type for
>> > 0 1 1 1 0 165 0 33 nor sur sur sur gra 1 0 0
>> > 0 2 33 33 0 165 0 33 EXC -> 1
>> > 0 3 0 33 1 1 0 33 sur nor sur sur gra 0 1 0
>> > 0 4 0 33 164 164 0 33 sur nor sur sur gra 0 -1 0
>> > 0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
>> > 0 6 0 33 0 165 33 33 EXC -> 8
>> > PE# 0 Set new
>> > PE# 0 FFT Table
>> > PE# 0 Coeff
>> > rank 0 in job 1 hydra127_34565 caused collective abort of all
>> ranks
>> > exit status of rank 0: return code 1
>> >
>> > I am struggling to find the error in my code. Can anybody suggest
>> > where I messed up?
>> >
>> > Thanks and Regards,
>> > Ash
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji