[mpich-discuss] [cli_0]: aborting job:

Pavan Balaji balaji at mcs.anl.gov
Thu Sep 4 01:18:44 CDT 2008


I don't quite understand what the problem is here. It looks like the
application is calling MPI_Abort(). When MPI_Abort() is called, MPICH2
kills all of the processes belonging to the application. Were you
expecting different behavior?

  -- Pavan

On 09/03/2008 11:51 PM, Sangamesh B wrote:
> Hi All,
> 
>    I've compiled a home-developed C application with MPICH2-1.0.7 and the
> GNU compilers on a CentOS 5-based Rocks 5 cluster.
> 
> The command used and the error are as follows:
> 
> $ /opt/mpich2/gnu/bin/mpirun -machinefile ./mach28 -np 8 ./run3 ./run3.in | tee run3_1a_8p
> 
> [cli_0]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> rank 0 in job 1  locuzcluster.org_44326   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
> 
> $ ldd run3
>         libm.so.6 => /lib64/libm.so.6 (0x0000003a1fa00000)
>         libmpich.so.1.1 => /opt/mpich2/gnu/lib/libmpich.so.1.1 (0x00002aaaaaac4000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a20200000)
>         librt.so.1 => /lib64/librt.so.1 (0x0000003a20e00000)
>         libuuid.so.1 => /lib64/libuuid.so.1 (0x00002aaaaadba000)
>         libc.so.6 => /lib64/libc.so.6 (0x0000003a1f600000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000003a1f200000)
> 
> This job is meant to be run on 48 or 96 processes/cores, but the
> cluster has only 8 cores.
> Could the lower number of processes be causing the above error?
> 
> Thank you,
> Sangamesh

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji