[mpich-discuss] FW: Re: FW: [cli_0]: aborting job:
Rajeev Thakur
thakur at mcs.anl.gov
Wed Sep 3 07:53:15 CDT 2008
-----Original Message-----
Date: Wed, 3 Sep 2008 14:54:38 +0530
From: "Sangamesh B" <forum.san at gmail.com>
To: "Rajeev Thakur" <thakur at mcs.anl.gov>
Subject: Re: FW: [cli_0]: aborting job:
Cc: mpich-discuss at mcs.anl.gov
Hi all,
Some more info:
The command used and the resulting error are as follows:
$ /opt/mpich2/gnu/bin/mpirun -machinefile ./mach28 -np 8 ./run3 ./run3.in | tee run3_1a_8p
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
rank 0 in job 1 locuzcluster.org_44326 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
$ ldd run3
libm.so.6 => /lib64/libm.so.6 (0x0000003a1fa00000)
libmpich.so.1.1 => /opt/mpich2/gnu/lib/libmpich.so.1.1 (0x00002aaaaaac4000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a20200000)
librt.so.1 => /lib64/librt.so.1 (0x0000003a20e00000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00002aaaaadba000)
libc.so.6 => /lib64/libc.so.6 (0x0000003a1f600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003a1f200000)
I was given this application for benchmarking. The instructions say that the
job has to be run with 48 and 96 processes/cores, but the cluster has only 8
cores. Could this be causing the above error?
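
For illustration only, here is one way an application could produce exactly
this message: it checks the number of processes at startup and calls
MPI_Abort itself when the count is not one it supports. This is a hypothetical
sketch, not the source of run3; the helper np_is_supported, the USE_MPI macro
and the hard-coded counts 48 and 96 are assumptions taken from the benchmark
notes above.

```c
/* Hypothetical sketch: NOT the actual source of run3. */
#include <stdio.h>

/* Assumed supported process counts (48 and 96, per the benchmark notes). */
static const int supported_np[] = { 48, 96 };

/* Return 1 if np is one of the assumed supported counts, 0 otherwise. */
int np_is_supported(int np)
{
    size_t n = sizeof supported_np / sizeof supported_np[0];
    for (size_t i = 0; i < n; i++)
        if (supported_np[i] == np)
            return 1;
    return 0;
}

#ifdef USE_MPI  /* defined on the compile line when building with mpicc */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (!np_is_supported(size)) {
        /* Say why before aborting, so the user sees more than "aborting job". */
        if (rank == 0) {
            fprintf(stderr, "unsupported process count: %d\n", size);
            fflush(stderr);
        }
        /* The second argument is the error code reported in the abort message. */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}
#endif
```

Built with something like "mpicc -DUSE_MPI check_np.c -o check_np" (the file
name is made up) and started with -np 8, a program of this shape would abort
with error code 1 in the same way. Whether run3 actually does this can only be
confirmed by searching its source for MPI_Abort calls.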
Thank you,
Sangamesh
On Tue, Sep 2, 2008 at 9:22 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> -----Original Message-----
> Date: Tue, 2 Sep 2008 21:19:15 +0530
> From: "Sangamesh B" <forum.san at gmail.com>
> To: "MPICH ML" <mpich-discuss at mcs.anl.gov>
> Subject: [cli_0]: aborting job:
>
> Hi All,
>
> I've compiled a home-developed C application with MPICH2-1.0.7 and the GNU
> compilers on a CentOS 5 based Rocks 5 cluster.
>
> When I run it with 4 or 8 processes, it gives the following error:
>
> [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 1)
> - process 0
>
> The error message doesn't give much information.
>
> May I know what is happening here, and how to fix it?
>
> Thank you,
> Sangamesh
>
>