<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>RE: [mpich-discuss] spawning processes with MPI_Comm_spawn: Error in MPI_Finalize</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2> Hi,<BR>
Try out the latest stable release of MPICH2 (1.0.8, available at <A HREF="http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads">http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads</A>). The default channel in 1.0.8 is nemesis, which should give you better performance than ssm; like ssm, it uses shared memory for communication between processes on the same node and TCP between processes on different nodes.<BR>
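<BR>
If you build it from source, explicitly selecting nemesis looks something like the following (the install prefix here is only an example):<BR>
<BR>
./configure --with-device=ch3:nemesis --prefix=/usr/local/mpich2-1.0.8<BR>
make<BR>
make install<BR>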
<BR>
Regards,<BR>
Jayesh<BR>
<BR>
-----Original Message-----<BR>
From: mpich-discuss-bounces@mcs.anl.gov [<A HREF="mailto:mpich-discuss-bounces@mcs.anl.gov">mailto:mpich-discuss-bounces@mcs.anl.gov</A>] On Behalf Of Jitendra Kumar<BR>
Sent: Monday, February 09, 2009 9:29 AM<BR>
To: mpich-discuss@mcs.anl.gov<BR>
Subject: [mpich-discuss] spawning processes with MPI_Comm_spawn: Error in MPI_Finalize<BR>
<BR>
Hi,<BR>
I am trying to start a slave process in my code using MPI_Comm_spawn. The process starts and runs fine, except that it fails at the MPI_Finalize stage. I am including excerpts from the parent and slave codes that launch and terminate the process. I make several MPI_Send/MPI_Recv calls over the intercommunicator, and they all complete properly.<BR>
<BR>
Parent program:<BR>
MPI_Info_create(&hostinfo);<BR>
MPI_Info_set(hostinfo, "file", "machinefile");<BR>
error = MPI_Comm_spawn(command, arg, spawn_size, hostinfo, 0, MPI_COMM_SELF, &slaveworld, MPI_ERRCODES_IGNORE);<BR>
-------<BR>
-------<BR>
MPI_Comm_free(&slaveworld);<BR>
MPI_Finalize();<BR>
<BR>
<BR>
Slave program (get the parent communicator):<BR>
MPI_Comm_get_parent(&parentcomm);<BR>
---------------<BR>
---------------<BR>
MPI_Comm_free(&parentcomm);<BR>
MPI_Finalize();<BR>
<BR>
I get the following errors at the end...<BR>
<BR>
rank 0 in job 2674 master_4268 caused collective abort of all ranks<BR>
exit status of rank 0: killed by signal 9<BR>
0,1: [cli_0]: aborting job:<BR>
0,1: Fatal error in MPI_Finalize: Other MPI error, error stack:<BR>
0,1: MPI_Finalize(220).........................: MPI_Finalize failed<BR>
0,1: MPI_Finalize(146).........................:<BR>
0,1: MPID_Finalize(206)........................: an error occurred while<BR>
the device was waiting for all open connections to close<BR>
0,1: MPIDI_CH3I_Progress(161)..................: handle_sock_op failed<BR>
0,1: MPIDI_CH3I_Progress_handle_sock_event(175):<BR>
0,1: MPIDU_Socki_handle_read(649)..............: connection failure<BR>
(set=0,sock=1,errno=104:(strerror() not found))<BR>
rank 0 in job 9 node2_32773 caused collective abort of all ranks<BR>
exit status of rank 0: killed by signal 9<BR>
<BR>
mpich2version:<BR>
Version: 1.0.3<BR>
Device: ch3:ssm<BR>
Configure Options: --with-device=ch3:ssm --enable-f77 --enable-f90<BR>
--enable-cxx --prefix=/usr/local/mpich2-1.0.3-pathscale-k8<BR>
<BR>
Am I doing something wrong in releasing the communicators? I even tried<BR>
using MPI_Comm_disconnect in place of MPI_Comm_free in both the parent and<BR>
slave codes, but I get the same error. Any pointers to the problem would be of<BR>
great help.<BR>
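<BR>
To make the structure concrete, the disconnect variant boils down to something like the sketch below; the slave executable name, the spawn count of 1, and the omitted MPI_Send/MPI_Recv traffic are placeholders for the real code.<BR>
<BR>
/* parent side (sketch) */<BR>
#include "mpi.h"<BR>
int main(int argc, char **argv)<BR>
{<BR>
    MPI_Comm slaveworld;<BR>
    MPI_Info hostinfo;<BR>
    char cmd[] = "./slave";  /* placeholder executable name */<BR>
    MPI_Init(&argc, &argv);<BR>
    MPI_Info_create(&hostinfo);<BR>
    MPI_Info_set(hostinfo, "file", "machinefile");<BR>
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1, hostinfo, 0, MPI_COMM_SELF, &slaveworld, MPI_ERRCODES_IGNORE);<BR>
    /* ... MPI_Send/MPI_Recv over slaveworld ... */<BR>
    MPI_Info_free(&hostinfo);<BR>
    MPI_Comm_disconnect(&slaveworld);  /* tried in place of MPI_Comm_free */<BR>
    MPI_Finalize();<BR>
    return 0;<BR>
}<BR>
<BR>
/* slave side (sketch) */<BR>
#include "mpi.h"<BR>
int main(int argc, char **argv)<BR>
{<BR>
    MPI_Comm parentcomm;<BR>
    MPI_Init(&argc, &argv);<BR>
    MPI_Comm_get_parent(&parentcomm);<BR>
    /* ... MPI_Send/MPI_Recv over parentcomm ... */<BR>
    MPI_Comm_disconnect(&parentcomm);  /* tried in place of MPI_Comm_free */<BR>
    MPI_Finalize();<BR>
    return 0;<BR>
}<BR>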
<BR>
Thanks,<BR>
Jitu<BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>