[mpich-discuss] MPICH2 and OpenPBS/Torque using MPICH2-Hydra

Simon Hammond simon.hammond at gmail.com
Fri Sep 25 10:54:21 CDT 2009


Hi Pavan,

Thanks for this, I will give it a go when our nodes have cleared out
the Friday rush.

On the -n option: we want users to have one script per code, which they
submit unchanged regardless of the number of processors. Users generally
select the number of nodes and cores via the scheduler, so in your
example they would request just 10 nodes rather than 100, since the
other 90 would go unused. The problem is that they expect PBS to "just
work" and place their jobs on the nodes in the way they have described
to PBS.

I know I keep banging on about it (and it's not meant to be a
competition), but I think they have got used to an OpenMPI style of
runtime where everything seems to just glue together and work. Ideally
we'd like similar functionality for MPICH2, because we have codes which
*only* work on MPICH2/MVAPICH.

Maybe this could be worked around with some scripting in the PBS
launch script itself, but placement would then be pretty loose.
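
To make the scripting idea concrete, here is a rough sketch of such a
wrapper. It only counts the entries in $PBS_NODEFILE and turns them into
a "-n" value, so it is a sketch under assumptions, not a tested recipe.
The helper names are made up for illustration, and I am assuming that
mpiexec will accept the PBS node file unchanged through "-f", which I
have not verified for Hydra.

```python
import os
import sys


def hosts_from_nodefile(lines):
    """Return one entry per non-blank line of a PBS node file.

    PBS/Torque normally lists a host once per allocated core, so the
    number of entries is the number of ranks the user asked PBS for.
    """
    return [line.strip() for line in lines if line.strip()]


def build_mpiexec_command(nodefile_path, hosts, program_args, mpiexec="mpiexec"):
    """Build an mpiexec argument list sized from the PBS allocation.

    Assumption: mpiexec accepts the PBS node file as-is via -f.
    """
    if not hosts:
        raise ValueError("node file contains no hosts")
    return [mpiexec, "-f", nodefile_path, "-n", str(len(hosts)), *program_args]


# Only launch when actually running inside a PBS job with a program to run.
if __name__ == "__main__" and "PBS_NODEFILE" in os.environ and sys.argv[1:]:
    nodefile = os.environ["PBS_NODEFILE"]
    with open(nodefile) as f:
        hosts = hosts_from_nodefile(f)
    cmd = build_mpiexec_command(nodefile, hosts, sys.argv[1:])
    os.execvp(cmd[0], cmd)  # replace this process with mpiexec
```

Users would call the wrapper from their PBS script with the program and
its arguments, and the script itself would never change. It does nothing
about rank-to-core placement, which is the looseness I mean.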

What do you think?



S.


2009/9/25 Pavan Balaji <balaji at mcs.anl.gov>:
> Simon,
>
> I did an initial version of the code; tarball here:
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
>
> However, I'm not sure if skipping the "-n" option is a good idea. Even if
> your allocation has 100 nodes, the application might need to be launched
> only on 10 nodes. The "-f" option, on the other hand, would be redundant in
> PBS environments. You don't need to specify it with PBS anymore. Also, this
> would keep the behavior consistent with other environments such as slurm.
> Thoughts?
>
>  -- Pavan
>
> On 09/24/2009 03:25 AM, Simon Hammond wrote:
>>
>> This would definitely be helpful.
>>
>> I'm guessing these processes (under SSH) wouldn't be managed in the
>> same way that PBS would manage, say, an OpenMPI job? Getting rid of
>> the -n would be a good start. Can I ask how node placement and rank
>> allocation etc would work?
>>
>>
>>
>>
>> S.
>>
>>
>> 2009/9/23  <balaji at mcs.anl.gov>:
>>>
>>> PBS support is not added in Hydra yet. See:
>>> https://trac.mcs.anl.gov/projects/mpich2/ticket/443
>>>
>>> However, so far we were considering PBS support as a bootstrap server
>>> (which has a lot more functionality than just providing the number of
>>> nodes). But adding PBS as a resource management kernel (RMK) is possible too
>>> and should be simpler as well. Doing this will allow you to skip the "-n"
>>> option in PBS environments, but you'll internally still be using ssh to
>>> start the processes. If this is helpful for you, we can consider adding it
>>> for the next release.
>>>
>>>  -- Pavan
>>>
>>> ----- "Si Hammond" <simon.hammond at gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We're trying to get MPICH2 to work under PBS so that users don't need
>>>> to specify the number of processors etc. We have built the install
>>>> with the MPICH2-Hydra device and this seems to work (in that jobs will
>>>> run) but users still have to specify the number of MPI ranks using
>>>> -n.
>>>>
>>>> I found the -rmk flag in some documentation using Google but "-rmk
>>>> pbs" doesn't seem to work either (the example uses "-rmk lsf").
>>>>
>>>> How does a user like myself go about making a PBS-enabled MPICH2 build
>>>> so the job placement etc is all handled under the covers? Is this
>>>> possible? OpenMPI seems to have this covered but we'd like to install
>>>> MPICH2 as well since one of our codes only runs with MPICH2.
>>>>
>>>> Thanks for your help.
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>> Si Hammond
>>>>
>>>> Performance Modelling, Analysis and Optimisation Laboratory
>>>> High Performance Systems Group
>>>> Department of Computer Science
>>>> University of Warwick, CV4 7AL, UK
>>>>
>>>> ----------------------------------------------------------------------------------------
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>

