[mpich-discuss] [cli_0]: aborting job:
Pavan Balaji
balaji at mcs.anl.gov
Thu Sep 4 01:18:44 CDT 2008
I don't quite understand what the problem is here. It looks like the
application itself is calling MPI_Abort(). When MPI_Abort() is called,
MPICH2 kills all the processes belonging to the application. Do you
expect different behavior?
-- Pavan
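
The "killed by signal 9" line in the quoted output below is consistent
with this: a process that is killed with SIGKILL is reported by its
parent as terminated by signal 9. The following is a minimal POSIX C
sketch of that reporting, not MPICH2 code. It assumes a Linux system
where SIGKILL is 9, and the function name kill_child_and_get_signal is
hypothetical, chosen only for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustration only (not taken from MPICH2): fork a child, kill it with
 * SIGKILL the way a process manager might after an abort, and return the
 * signal number that waitpid() reports for it.
 * Returns -1 on error or if the child did not die from a signal.
 */
int kill_child_and_get_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        pause();                /* child: block until a signal arrives */
        _exit(0);               /* not reached when killed by SIGKILL */
    }

    /* Parent: SIGKILL cannot be caught or ignored by the child. */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* WIFSIGNALED is true when the child was terminated by a signal. */
    if (WIFSIGNALED(status))
        return WTERMSIG(status);

    return -1;
}
```

Calling kill_child_and_get_signal() from a main() and printing the
result with printf("killed by signal %d\n", ...) gives a line of the
same shape as the one in the job output. The point is that the signal
is a consequence of the abort, not a separate crash.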
On 09/03/2008 11:51 PM, Sangamesh B wrote:
> Hi All,
>
> I've compiled a home-developed C application with MPICH2-1.0.7 and the
> GNU compilers on a CentOS 5 based Rocks 5 cluster.
>
> Command used and error are as follows:
>
> $ /opt/mpich2/gnu/bin/mpirun -machinefile ./mach28 -np 8 ./run3 ./run3.in | tee run3_1a_8p
>
> [cli_0]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> rank 0 in job 1 locuzcluster.org_44326 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> $ ldd run3
> libm.so.6 => /lib64/libm.so.6 (0x0000003a1fa00000)
> libmpich.so.1.1 => /opt/mpich2/gnu/lib/libmpich.so.1.1 (0x00002aaaaaac4000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a20200000)
> librt.so.1 => /lib64/librt.so.1 (0x0000003a20e00000)
> libuuid.so.1 => /lib64/libuuid.so.1 (0x00002aaaaadba000)
> libc.so.6 => /lib64/libc.so.6 (0x0000003a1f600000)
> /lib64/ld-linux-x86-64.so.2 (0x0000003a1f200000)
>
> It is recommended to run this job with 48 and 96 processes/cores, but
> the cluster has only 8 cores.
> Is this lower number of processes causing the above error?
>
> Thank you,
> Sangamesh
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji