<div dir="ltr">Ok.<br><br> I'll look into the code for MPI_Abort.<br><br>Thank you,<br>Sangamesh<br><br><div class="gmail_quote">On Thu, Sep 4, 2008 at 12:42 PM, Pavan Balaji <span dir="ltr"><<a href="mailto:balaji@mcs.anl.gov">balaji@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Sangamesh,<br>
<br>
It is the application that is calling MPI_Abort, not the MPI library. The MPI library does not know why the application called an abort, so it can't really give you any more information. You'll need to check the application code to see why it's calling abort.<br>
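
As a rough illustration, the kind of call site to look for usually resembles the sketch below. The file name and failure condition here are made-up placeholders, not taken from the actual benchmark; the point is that printing a message just before MPI_Abort makes the reason show up in the job output.

/* Hypothetical sketch of an error path that ends in MPI_Abort.
 * The input file name "run3.in" comes from the quoted command line;
 * the "missing input file" failure is only an assumed example. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    FILE *fp = fopen("run3.in", "r");
    if (fp == NULL) {
        /* Without a message like this, only the generic
         * "application called MPI_Abort(MPI_COMM_WORLD, 1)" line is printed. */
        fprintf(stderr, "cannot open input file run3.in, aborting\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... the actual benchmark work would go here ... */

    fclose(fp);
    MPI_Finalize();
    return 0;
}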

-- Pavan

On 09/04/2008 02:08 AM, Sangamesh B wrote:

Hi,

There is not much information available about the error. I got this code for benchmarking, and the client has asked for it to be run with 48, 96, 128, 192 and 256 processes.

Each run gives the same error. Is there a verbose option for mpirun that would give more information?

Thank you,
Sangamesh

On Thu, Sep 4, 2008 at 11:48 AM, Pavan Balaji <balaji@mcs.anl.gov> wrote:

I don't quite understand what the problem here is. It looks like the application is calling MPI_Abort(). MPICH2 kills the processes belonging to the application when MPI_Abort() is called. Do you expect a different behavior?
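
To make that behavior concrete, here is a minimal stand-alone sketch (not the benchmark code): when any one rank calls MPI_Abort, MPICH2 terminates every process in the job, which is what the "caused collective abort of all ranks" message in your output refers to.

/* Hypothetical demonstration program, not taken from the application. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 requests the abort; the error code (1 here) is what shows
         * up in "application called MPI_Abort(MPI_COMM_WORLD, 1)". */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* The remaining ranks are killed as part of the abort and never
     * reach MPI_Finalize. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}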

-- Pavan

On 09/03/2008 11:51 PM, Sangamesh B wrote:

Hi All,

I've compiled a home-developed C application with MPICH2 1.0.7 and the GNU compilers on a CentOS 5 based Rocks 5 cluster.

The command used and the resulting error are as follows:

$ /opt/mpich2/gnu/bin/mpirun -machinefile ./mach28 -np 8 ./run3 ./run3.in | tee run3_1a_8p

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
rank 0 in job 1 locuzcluster.org_44326 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

$ ldd run3
    libm.so.6 => /lib64/libm.so.6 (0x0000003a1fa00000)
    libmpich.so.1.1 => /opt/mpich2/gnu/lib/libmpich.so.1.1 (0x00002aaaaaac4000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a20200000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003a20e00000)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00002aaaaadba000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003a1f600000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003a1f200000)

It is recommended to run this job with 48 and 96 processes/cores, but the cluster has only 8 cores. Is this lower number of processes causing the above error?

Thank you,
Sangamesh

-- Pavan Balaji
http://www.mcs.anl.gov/~balaji

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji