[MPICH] How do I get the communicator of the spawned group in the spawnee?

Rajeev Thakur thakur at mcs.anl.gov
Tue Jul 5 11:44:48 CDT 2005


David,
  
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> Sent: Tuesday, July 05, 2005 12:05 AM
> To: 'mpich-discuss at mcs.anl.gov'
> Subject: RE: [MPICH] How do I get the communicator of the 
> spawned group in the spawnee?
> 
> Hello Rajeev,
>     It's interesting that if I send signal -9 to any one of 
> the children after calling disconnect on the comworld of the 
> parent, all the children die gracefully, the parent remains 
> alive and I can restart the children, I tried this many times 
> and didn't see any problems, so it seems like the 
> infrastructure is in place to handle this kind of thing. Is 
> MPICH an open source project?  I mean if I changed the code 
> to be "fault tolerant" in this situation would you consider 
> adding the changes to the code base? Is there a way to allow 
> this kind of behavior within the bounds of the "official" 2.0 
> standard?  The two behaviors I "need" are:
>  
> 1) Ability to kill and restart all the children without 
> affecting the parent (this is in case a child goes into a 
> near infinite loop on an algorithm).
> 2) That if one child dies all the children will die without 
> affecting the parent.
>  
> Since our application runs user code not under our control 
> these are "essential" features for us.  Unfortunately, Windows 
> compatibility is another "essential" feature so we are 
> somewhat limited in our choice of MPI implementations. 
>  
> Regards,
> David
> 
> 	-----Original Message-----
> 	From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
> 	Sent: Monday, July 04, 2005 6:58 PM
> 	To: David Minor
> 	Subject: RE: [MPICH] How do I get the communicator of 
> the spawned group in the spawnee?
> 	
> 	
> 	David,
> 	         The communicator passed to MPI_Abort must be a 
> valid communicator on the process calling MPI_Abort. 
> Therefore, you cannot abort only the child. However, a child 
> could die on its own, and one would like this case to be 
> handled gracefully, without taking down the parent. This is 
> up to the implementation to handle. A "fault tolerant" 
> implementation will try to do this. MPICH2 doesn't support it 
> yet, but we hope to do it sometime in the future.
> 	 
> 	Rajeev
> 	 
> 
> 
> ________________________________
> 
> 		From: David Minor [mailto:david-m at orbotech.com] 
> 		Sent: Monday, July 04, 2005 12:11 AM
> 		To: 'Rajeev Thakur'
> 		Subject: RE: [MPICH] How do I get the 
> communicator of the spawned group in the spawnee?
> 		
> 		
> 		Hello Rajeev,
> 		Using the intercommunicator I can communicate 
> with the spawned processes, but I cannot call an abort on 
> them without aborting the parent. I would like for the 
> spawned processes to be able to crash, or be aborted, without 
> crashing the parent process, which could then spawn them 
> again. I thought that if the parent process could get a 
> communicator only to the spawned processes I'd be able to do this. 
> 		David
> 
> 			-----Original Message-----
> 			From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
> 			Sent: Sunday, July 03, 2005 6:48 PM
> 			To: David Minor; mpich-discuss at mcs.anl.gov
> 			Subject: RE: [MPICH] How do I get the 
> communicator of the spawned group in the spawnee?
> 			
> 			
> 			The intercommunicator returned by 
> MPI_Comm_spawn is the one you are looking for.
> 			 
> 			MPI_Comm_get_parent on the spawned 
> processes returns an intercommunicator that has the spawned 
> processes in one group and the parent processes in the other 
> group. MPI_Comm_spawn on the parent processes returns the same 
> intercommunicator, which can be used for communication with 
> the spawned processes.
> 			 
> 			Rajeev
> 			 
> 
> 
> ________________________________
> 
> 				From: 
> owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of David Minor
> 				Sent: Sunday, July 03, 2005 9:15 AM
> 				To: mpich-discuss at mcs.anl.gov
> 				Subject: [MPICH] How do I get 
> the communicator of the spawned group in the spawnee?
> 				
> 				
> 
> 				Hello List, 
> 
> 				intercomm.Get_parent() from the 
> spawned processes returns me the communicator of the spawnee, 
> but how do I get the communicator of the
> 
> 				spawned processes from the 
> spawnee?  intercomm.Get_remote_group() returns me the group, 
> but how do I get the communicator?
> 
> 				Thanks, 
> 				David Minor 
> 
> 



