[mpich-discuss] Suspend jobs that use MPICH2 with Hydra

Reuti reuti at staff.uni-marburg.de
Mon Jun 11 15:42:23 CDT 2012


On 11.06.2012 at 21:31, Shan-ho Tsai wrote:

> Thanks for your response! We are using Univa Grid Engine 8.0.1p4.
> Is the patch freely available?

Ask the vendor, as it's commercial software.

There was a discussion about it some time ago, but suspending slave tasks never made it into any release:

https://arc.liv.ac.uk/trac/SGE/ticket/577

-- Reuti


> Thanks so much,
> Shan-Ho
> 
> ----------------------------------------------------
> Shan-Ho Tsai
> University of Georgia, Athens GA
> 
> ________________________________________
> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Reuti [reuti at staff.uni-marburg.de]
> Sent: Monday, June 11, 2012 12:47 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Suspend jobs that use MPICH2 with Hydra
> 
> On 07.05.2012 at 14:57, Shan-ho Tsai wrote:
> 
>> Pavan, thank you so much for creating a ticket to add this
>> support to Hydra. I really appreciate it.
>> 
>> Ju, thank you very much for your suggestion. We currently use
>> a variant of SGE as our job scheduler. However, when we suspend
>> an MPICH2/Hydra job, the master process and the slave processes
>> that are on the same host as the master get suspended, but the
>> slave processes on other hosts continue to run (they do not get
>> suspended). If someone is aware of a way to get SGE to suspend all
>> processes properly in such a case, I would appreciate hearing how
>> that is done.
> 
> Which version of SGE are you using? IIRC, only a minimal patch was needed to also suspend slave tasks on other nodes.
> 
> -- Reuti
> 
> 
>> Thank you very much again!
>> Shan-Ho
>> 
>> ----------------------------------------------------
>> Shan-Ho Tsai
>> University of Georgia, Athens GA
>> 
>> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Ju JiaJia [jujj603 at gmail.com]
>> Sent: Friday, May 04, 2012 9:37 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Suspend jobs that use MPICH2 with Hydra
>> 
>> I think you can do this with a resource manager and scheduler such as Torque + Maui, which let you suspend and resume jobs.
>> 
>> On Sat, May 5, 2012 at 8:46 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> Hello,
>> 
>> We don't support this right now.  I've created a ticket for it.
>> 
>> https://trac.mcs.anl.gov/projects/mpich2/ticket/1627
>> 
>> Please add yourself to the cc list of this ticket, if you'd like to be informed about updates on this issue.
>> 
>> -- Pavan
>> 
>> 
>> On 05/04/2012 12:54 PM, Shan-ho Tsai wrote:
>> Hello all,
>> We have mpich2 1.4.1p1 installed on a RHEL5 cluster
>> and sometimes need to suspend all jobs cluster-wide.
>> 
>> Is there a way to suspend MPICH2 jobs that use Hydra, in
>> such a way that the master process and all slave processes
>> (on multiple nodes) get properly suspended?
>> 
>> If there is a way to do this, what is the procedure? Is there
>> a signal that we could send to mpiexec?
>> 
>> I tried sending a SIGSTOP to mpiexec, but only mpiexec
>> got suspended; the actual a.out processes continued to run.
>> 
>> I really appreciate any suggestions.
>> Thank you,
>> Shan-Ho
>> 
>> ----------------------------------------------------
>> Shan-Ho Tsai
>> University of Georgia, Athens GA
>> 
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> 
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji

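On the SIGSTOP question earlier in the thread: signalling only the mpiexec PID leaves its child processes running. A minimal sketch follows, assuming a Linux node with a mounted /proc. The helper names (read_ppid_map, descendants, signal_tree, proc_state) are hypothetical and are not part of MPICH2, Hydra, or SGE. The sketch sends a signal to a process and to every descendant it can find on the local host. It does not reach ranks on other hosts; those would need the same treatment on each node, which is what the SGE patch mentioned above is said to address.

```python
import os
import signal
from typing import Dict, List


def read_ppid_map() -> Dict[int, int]:
    """Map every visible PID to its parent PID via /proc/<pid>/stat (Linux only)."""
    ppids: Dict[int, int] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 (the command name) is in parentheses and may contain spaces,
        # so split after the last ')' to reach state (field 3) and ppid (field 4).
        rest = data[data.rfind(")") + 2:].split()
        ppids[int(entry)] = int(rest[1])
    return ppids


def descendants(root: int) -> List[int]:
    """Return all local descendants of root (children, grandchildren, ...)."""
    children: Dict[int, List[int]] = {}
    for pid, ppid in read_ppid_map().items():
        children.setdefault(ppid, []).append(pid)
    result: List[int] = []
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            result.append(child)
            stack.append(child)
    return result


def signal_tree(root: int, sig: int) -> List[int]:
    """Send sig to root and every local descendant; return the PIDs signalled."""
    targets = [root] + descendants(root)
    signalled: List[int] = []
    for pid in targets:
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # the process exited between the /proc scan and the kill
    return signalled


def proc_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat ('T' means stopped)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rfind(")") + 2:].split()[0]
```

For example, signal_tree(mpiexec_pid, signal.SIGSTOP) suspends the local tree and signal_tree(mpiexec_pid, signal.SIGCONT) resumes it, where mpiexec_pid is a placeholder for the PID of the local mpiexec. Processes forked between the /proc scan and the kill calls are missed, so a more robust tool would repeat the scan until no new PIDs appear.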