[MPICH] one shot jobs in mpich2?

Reuti reuti at staff.uni-marburg.de
Mon Jun 20 10:12:39 CDT 2005


Hi Ralph,

Ralph M. Butler wrote:
>>Date: Fri, 17 Jun 2005 18:09:39 +0200
>>From: Alexander Spiegel <spiegel at rz.rwth-aachen.de>
>>To: Reuti <reuti at staff.uni-marburg.de>
>>Cc: mpich-discuss at mcs.anl.gov
>>Subject: Re: [MPICH] one shot jobs in mpich2?
>>
>>Hi,
>>
>>Reuti wrote:
>>
>>>I didn't find a way to integrate the mpd method into SGE, as it creates
>>>many new process groups, which prevents a proper shutdown of a job in
>>>case you issue a qdel for it.
>>
>>What about removing the lines with 'setpgrp' in the mpd.py and mpdman.py
>>scripts? Or introducing an environment variable to control the creation
>>of new process groups?
>>
>>I made some small tests and it seems to work. The new process groups were
>>no longer created by mpd.
> 
> 
> Over the years, we have had a variety of differing requests regarding
> the use of setpgrp, setsid, etc.  For example, we have worked closely
> with the managers of the Chiba City cluster here at Argonne on this
> matter on multiple occasions.  mpd is a process management system.
> This may sometimes conflict with use of mpd in an environment where
> another package thinks that it is the process management system.  One
> reason mpd creates process groups is the one given for SGE in a prior
> email: as a process management system, mpd wants to be able to kill
> the entire process group at once.  However, what mpd views as
> a killable set may differ from someone else's.  Frequently, it is an MPI
> 'rank' and any progeny it may further generate.  (However, mpd makes no
> guarantees because it accepts the fact that users may do their own
> setpgrp/setsid sorts of operations.)
> 
> Having said all that, we could probably use options (perhaps in the
> .mpd.conf file) to determine when mpd does (or does not) use the
> setpgrp/setsid syscalls.  We just need to make sure that we are able to
> balance the conflicting sets of requests so that all needs are met.
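
A minimal Python sketch of the trade-off discussed above, assuming a POSIX system. A launcher puts each rank into its own process group via setpgrp() only when a switch allows it. With its own group, the launcher can later kill the rank together with any progeny that stayed in that group by calling killpg(). The environment variable MPD_USE_SETPGRP and the functions use_new_pgrp, spawn_rank and kill_rank are hypothetical illustrations. They are not part of mpd and do not reproduce the code in mpd.py or mpdman.py.

```python
import os
import signal
import subprocess


def use_new_pgrp() -> bool:
    """Read a hypothetical on/off switch (not a real mpd setting).

    Defaults to creating a new process group, i.e. mpd-like behaviour.
    """
    return os.environ.get("MPD_USE_SETPGRP", "1") != "0"


def spawn_rank(argv, new_pgrp):
    """Start one rank, optionally as leader of its own process group."""
    # os.setpgrp runs in the child between fork and exec, so the rank
    # (and anything it forks later) ends up in a fresh process group.
    preexec = os.setpgrp if new_pgrp else None
    return subprocess.Popen(argv, preexec_fn=preexec)


def kill_rank(proc, new_pgrp, sig=signal.SIGTERM):
    """Signal the rank and wait for it to exit."""
    if new_pgrp:
        # The rank leads its own group (pgid == pid), so killpg also
        # reaches children that did not change their own group.
        os.killpg(proc.pid, sig)
    else:
        # The rank shares the launcher's group; signal only the rank and
        # leave group-wide cleanup to the outer process manager.
        proc.send_signal(sig)
    proc.wait()


if __name__ == "__main__":
    flag = use_new_pgrp()
    rank = spawn_rank(["sleep", "30"], flag)
    print("rank pgid:", os.getpgid(rank.pid), "launcher pgid:", os.getpgrp())
    kill_rank(rank, flag)
```

With the switch off, the rank stays in the launcher's process group. A group-wide signal from an outer process manager (SGE acting on a qdel, as described in this thread) therefore still reaches it. With the switch on, the launcher can clean up a rank and its children by itself, but a signal sent to the launcher's original group no longer reaches them.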

Thanks for the comment. For the use with SGE there is another thing to 
mention: mpd shouldn't vanish into daemon land on all the nodes. For 
smpd I use "-d 0", so that the daemons stay bound to the rsh command 
that started them (in SGE terms: qrsh), and it's working fine this way. 
Is there a similar option for mpd? - Reuti