[mpich-discuss] spawning processes with MPI_Comm_spawn: Error in MPI_Finalize
Jayesh Krishna
jayesh at mcs.anl.gov
Mon Feb 9 09:53:11 CST 2009
Hi,
Try out the latest stable release of MPICH2 (1.0.8, available at
http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
The default channel in 1.0.8 is nemesis, which should give you better
performance than ssm (like ssm, it uses shared memory for communication
between local procs and TCP for communication between non-local procs).
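For a from-source build, something along these lines should select the
nemesis channel explicitly (the install prefix below is just an example;
adjust it for your setup):

  ./configure --with-device=ch3:nemesis --prefix=/usr/local/mpich2-1.0.8
  make && make install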
Regards,
Jayesh
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jitendra Kumar
Sent: Monday, February 09, 2009 9:29 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] spawning processes with MPI_Comm_spawn: Error in MPI_Finalize
Hi,
I am trying to start a slave process from my code using MPI_Comm_spawn. The
process starts and runs fine, except that it fails at the MPI_Finalize stage.
I am including excerpts from the parent and slave codes for launching and
terminating the process. I make several MPI_Send/Recv calls over the
intercommunicator, and these all complete properly.
Parent program:
MPI_Info_create(&hostinfo);
MPI_Info_set(hostinfo, "file", "machinefile");
error = MPI_Comm_spawn(command, arg, spawn_size, hostinfo, 0, MPI_COMM_SELF,
                       &slaveworld, MPI_ERRCODES_IGNORE);
-------
-------
MPI_Comm_free(&slaveworld);
MPI_Finalize();
Slave program (get the parent communicator):
MPI_Comm_get_parent(&parentcomm);
---------------
---------------
MPI_Comm_free(&parentcomm);
MPI_Finalize();
I get the following errors at the end...
rank 0 in job 2674 master_4268 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
0,1: [cli_0]: aborting job:
0,1: Fatal error in MPI_Finalize: Other MPI error, error stack:
0,1: MPI_Finalize(220).........................: MPI_Finalize failed
0,1: MPI_Finalize(146).........................:
0,1: MPID_Finalize(206)........................: an error occurred while the device was waiting for all open connections to close
0,1: MPIDI_CH3I_Progress(161)..................: handle_sock_op failed
0,1: MPIDI_CH3I_Progress_handle_sock_event(175):
0,1: MPIDU_Socki_handle_read(649)..............: connection failure (set=0,sock=1,errno=104:(strerror() not found))
rank 0 in job 9 node2_32773 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
mpich2version:
Version: 1.0.3
Device: ch3:ssm
Configure Options: --with-device=ch3:ssm --enable-f77 --enable-f90
--enable-cxx --prefix=/usr/local/mpich2-1.0.3-pathscale-k8
Am I doing something wrong in releasing the communicators? I even tried
using MPI_Comm_disconnect in place of MPI_Comm_free in both the parent and
slave codes, but I get the same error. Any pointers to the problem would be
of great help.
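For reference, here is a minimal self-contained sketch of the pattern I am
describing (the file name spawn_test.c, the process counts, and the message
contents are made up for illustration; the same binary acts as parent or
child depending on what MPI_Comm_get_parent returns):

/* spawn_test.c - minimal MPI_Comm_spawn example: the parent spawns one
 * copy of this same binary, exchanges a message over the
 * intercommunicator, then both sides disconnect before MPI_Finalize. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, slaveworld;
    int msg;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent side: spawn one child running this same executable. */
        MPI_Comm_spawn("./spawn_test", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &slaveworld, MPI_ERRCODES_IGNORE);
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 0, 0, slaveworld);
        /* Disconnect the intercommunicator so the device knows the
         * connection to the child is closed before MPI_Finalize. */
        MPI_Comm_disconnect(&slaveworld);
    } else {
        /* Child side: receive from the parent over the intercommunicator. */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        printf("child received %d\n", msg);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}

(Built with mpicc and started with something like "mpiexec -n 1 ./spawn_test".)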
Thanks,
Jitu