[mpich-discuss] Suspend jobs that use MPICH2 with Hydra
Reuti
reuti at staff.uni-marburg.de
Mon Jun 11 15:42:23 CDT 2012
On 11.06.2012 at 21:31, Shan-ho Tsai wrote:
> Thanks for your response! We are using Univa Grid Engine 8.0.1p4.
> Is the patch freely available?
Ask the vendor, as it's commercial software.
There was a discussion about it some time ago, but suspending slave tasks never made it into any release:
https://arc.liv.ac.uk/trac/SGE/ticket/577
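
Until the scheduler handles this, one possible stopgap is to signal the application processes on every node directly instead of signalling only mpiexec. The following is a minimal sketch in Python, not something Hydra or SGE provides. It assumes passwordless ssh to each node and that pkill (from procps) is installed there; the host names, the user name and the helper names build_signal_commands and signal_job are hypothetical. Note that it signals every process of that user with the given name on each host, not only those belonging to one job.

```python
import subprocess


def build_signal_commands(hosts, user, process_name, signal_name="STOP"):
    """Return one ssh command (as an argv list) per host.

    Each command runs pkill on the remote host and sends `signal_name`
    to every process owned by `user` whose name is exactly `process_name`.
    """
    return [
        ["ssh", host, "pkill", f"-{signal_name}", "-u", user, "-x", process_name]
        for host in hosts
    ]


def signal_job(hosts, user, process_name, signal_name="STOP", dry_run=True):
    """Print (dry run) or execute the per-host commands; return them."""
    commands = build_signal_commands(hosts, user, process_name, signal_name)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # pkill exits non-zero when nothing matched, so a failing
            # exit status is deliberately not treated as an error here.
            subprocess.run(argv, check=False)
    return commands


if __name__ == "__main__":
    nodes = ["node01", "node02"]  # hypothetical host names
    signal_job(nodes, "alice", "a.out", "STOP")  # suspend
    signal_job(nodes, "alice", "a.out", "CONT")  # resume
```

With dry_run=True the script only prints the commands, so they can be checked before anything is signalled; SIGCONT resumes what SIGSTOP suspended.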
-- Reuti
> Thanks so much,
> Shan-Ho
>
> ----------------------------------------------------
> Shan-Ho Tsai
> University of Georgia, Athens GA
>
> ________________________________________
> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Reuti [reuti at staff.uni-marburg.de]
> Sent: Monday, June 11, 2012 12:47 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Suspend jobs that use MPICH2 with Hydra
>
> On 07.05.2012 at 14:57, Shan-ho Tsai wrote:
>
>> Pavan, thank you so much for creating a ticket to add this
>> support to Hydra. I really appreciate it.
>>
>> Ju, thank you very much for your suggestion. We currently use
>> a variant of SGE as our job scheduler. However, when we suspend
>> an MPICH2/Hydra job, the master process and the slave processes
>> that are on the same host as the master get suspended, but the
>> slave processes on other hosts continue to run (they do not get
>> suspended). If someone is aware of a way to get SGE to suspend all
>> processes properly in such a case, I would appreciate hearing how
>> that is done.
>
> Which version of SGE are you using? IIRC, only a minimal patch was necessary to also suspend slave tasks on other nodes.
>
> -- Reuti
>
>
>> Thank you very much again!
>> Shan-Ho
>>
>> ----------------------------------------------------
>> Shan-Ho Tsai
>> University of Georgia, Athens GA
>>
>> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Ju JiaJia [jujj603 at gmail.com]
>> Sent: Friday, May 04, 2012 9:37 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Suspend jobs that use MPICH2 with Hydra
>>
>> I think you can use a resource manager and scheduler to do this, like torque + maui. You can suspend and resume jobs.
>>
>> On Sat, May 5, 2012 at 8:46 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> Hello,
>>
>> We don't support this right now. I've created a ticket for it.
>>
>> https://trac.mcs.anl.gov/projects/mpich2/ticket/1627
>>
>> Please add yourself to the cc list of this ticket, if you'd like to be informed about updates on this issue.
>>
>> -- Pavan
>>
>>
>> On 05/04/2012 12:54 PM, Shan-ho Tsai wrote:
>> Hello all,
>> We have mpich2 1.4.1p1 installed on a RHEL5 cluster
>> and sometimes have the need to suspend all jobs clusterwide.
>>
>> Is there a way to suspend MPICH2 jobs that use Hydra, in
>> such a way that the master process and all slave processes
>> (on multiple nodes) get properly suspended?
>>
>> If there is a way to do this, what is the procedure? Is there
>> a signal that we could send to mpiexec?
>>
>> I tried sending a SIGSTOP to mpiexec, but only mpiexec
>> got suspended; the actual a.out processes continued to run.
>>
>> I really appreciate any suggestions.
>> thank you,
>> Shan-Ho
>>
>> ----------------------------------------------------
>> Shan-Ho Tsai
>> University of Georgia, Athens GA
>>
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji