[mpich-discuss] MPICH2 and OpenPBS/Torque using MPICH2-Hydra
Si Hammond
simon.hammond at gmail.com
Mon Sep 28 13:50:22 CDT 2009
Hi Pavan,
This might seem a silly question, but how do we get this built with
MPICH2? Is it a case of un-tarring the source within an MPICH2 source
directory and going from there?
Thanks for your help.
S.
On 25 Sep 2009, at 18:06, Pavan Balaji wrote:
>
> Hmm.. I do see your point. Ok, I've modified Hydra to use the number
> of nodes specified to the resource manager, unless the user
> explicitly overrides it with the "-n" option.
>
> A new tarball with basic testing is present here: http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
>
> I need to do some more cleanup + performance optimization, but you
> can give this one a shot.
>
> -- Pavan
>
> On 09/25/2009 10:54 AM, Simon Hammond wrote:
>> Hi Pavan,
>> Thanks for this, I will give it a go when our nodes have cleared out
>> the Friday rush.
>> On the -n thing we tend to want users to have one script for their
>> code which they submit regardless of the number of processors etc
>> (the script stays the same). The users generally want to select the
>> number of nodes and cores etc via the scheduler so in your example
>> they would just select 10 nodes and not the 100 since 90 would end
>> up not being used. I guess the problem there is they want PBS to
>> "just work" and put their jobs down on the nodes in the way they
>> have described to PBS.
>> I know I keep banging on about it (and it's not meant to be a
>> competition) but I think they have got used to an OpenMPI style of
>> runtime where everything seems to just glue together and work. For
>> MPICH2 ideally we'd like some similar functionality because we have
>> codes which *only* work on MPICH2/MVAPICH.
>> Maybe this stuff could be got around with some scripting in the PBS
>> launch script itself but the placement etc would be pretty loose then.
>> What do you think?
>> S.
>> 2009/9/25 Pavan Balaji <balaji at mcs.anl.gov>:
>>> Simon,
>>>
>>> I did an initial version of the code; tarball here:
>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
>>>
>>> However, I'm not sure if skipping the "-n" option is a good idea.
>>> Even if your allocation has 100 nodes, the application might need
>>> to be launched only on 10 nodes. The "-f" option, on the other hand,
>>> would be redundant in PBS environments. You don't need to specify it
>>> with PBS anymore. Also, this would keep the behavior consistent with
>>> other environments such as slurm.
>>> Thoughts?
>>>
>>> -- Pavan
>>>
>>> On 09/24/2009 03:25 AM, Simon Hammond wrote:
>>>> This would definitely be helpful.
>>>>
>>>> I'm guessing these processes (under SSH) wouldn't be managed in the
>>>> same way that PBS would manage, say, an OpenMPI job? Getting rid of
>>>> the -n would be a good start. Can I ask how node placement and rank
>>>> allocation etc would work?
>>>>
>>>> S.
>>>>
>>>>
>>>> 2009/9/23 <balaji at mcs.anl.gov>:
>>>>> PBS support is not added in Hydra yet. See:
>>>>> https://trac.mcs.anl.gov/projects/mpich2/ticket/443
>>>>>
>>>>> However, so far we were considering PBS support as a bootstrap
>>>>> server (which has a lot more functionality than just providing the
>>>>> number of nodes). But adding PBS as a resource management kernel
>>>>> (RMK) is possible too and should be simpler as well. Doing this
>>>>> will allow you to skip the "-n" option in PBS environments, but
>>>>> you'll internally still be using ssh to start the processes. If
>>>>> this is helpful for you, we can consider adding it for the next
>>>>> release.
>>>>>
>>>>> -- Pavan
>>>>>
>>>>> ----- "Si Hammond" <simon.hammond at gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> We're trying to get MPICH2 to work under PBS so that users don't
>>>>>> need to specify the number of processors etc. We have built the
>>>>>> install with the MPICH2-Hydra device and this seems to work (in
>>>>>> that jobs will run) but users still have to specify the number of
>>>>>> MPI ranks using -n.
>>>>>>
>>>>>> I found the -rmk flag in some documentation using Google but
>>>>>> "-rmk pbs" doesn't seem to work either (the example uses
>>>>>> "-rmk lsf").
>>>>>>
>>>>>> How does a user like myself go about making a PBS-enabled MPICH2
>>>>>> build so the job placement etc is all handled under the covers? Is
>>>>>> this possible? OpenMPI seems to have this covered but we'd like to
>>>>>> install MPICH2 as well since one of our codes only runs with
>>>>>> MPICH2.
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> ---------------------------------------------------------------------------------------
>>>>>> Si Hammond
>>>>>>
>>>>>> Performance Modelling, Analysis and Optimisation Laboratory
>>>>>> High Performance Systems Group
>>>>>> Department of Computer Science
>>>>>> University of Warwick, CV4 7AL, UK
>>>>>>
>>>>>> ----------------------------------------------------------------------------------------
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
---------------------------------------------------------------------------------------
Si Hammond
Performance Modelling, Analysis and Optimisation Laboratory
High Performance Systems Group
Department of Computer Science
University of Warwick, CV4 7AL, UK
----------------------------------------------------------------------------------------
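
For reference, the behaviour discussed in this thread (take the process
count from the scheduler's allocation unless the user explicitly passes
"-n") can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not Hydra's implementation. It
assumes the PBS/Torque node file lists one host entry per allocated slot,
and that the mpiexec in use accepts that file through the "-f" option
mentioned above. The function names are hypothetical.

```python
def read_allocated_hosts(nodefile_path):
    """Return the host entries in a PBS/Torque node file (assumed: one line per slot)."""
    with open(nodefile_path) as nodefile:
        return [line.strip() for line in nodefile if line.strip()]


def resolve_process_count(hosts, explicit_n=None):
    """Prefer the user's explicit count; otherwise use one process per allocated slot."""
    if explicit_n is not None:
        if explicit_n < 1:
            raise ValueError("process count must be positive")
        return explicit_n
    if not hosts:
        raise ValueError("allocation contains no hosts")
    return len(hosts)


def build_mpiexec_command(program_args, nodefile_path, explicit_n=None):
    """Build an mpiexec argument list (hypothetical helper, not part of MPICH2)."""
    hosts = read_allocated_hosts(nodefile_path)
    count = resolve_process_count(hosts, explicit_n)
    # Assumption: mpiexec takes the host file via -f and the rank count via -n.
    return ["mpiexec", "-f", nodefile_path, "-n", str(count)] + list(program_args)
```

Inside a job script, the node file path would normally come from the
PBS_NODEFILE environment variable, for example
build_mpiexec_command(["./my_app"], os.environ["PBS_NODEFILE"]) after
importing os. The same script then works for any allocation size, and a
user can still pass explicit_n to run on fewer slots than were allocated.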