[mpich-discuss] MPICH2 and TORQUE/PBS integration

Nicolas Rosner nrosner at gmail.com
Wed Jul 28 03:12:49 CDT 2010


Hi Ivan and all,

We use MPICH2 (in user space) on a cluster that runs Torque/PBS (as
provided by root).

I never really managed to properly "integrate" the two (I'm not sure
there's even a standard way to do that; e.g. even if you were to use
MPI-2 spawn et al. for dynamic process management, I suppose you'd still be
trapped within the MPD-supplied MPI world, no?).

But, frankly, so far I've had no real need for such a thing. So what I
do is this: my job description files (the .pbs text file, or whatever
you qsub) contain:

1) a pipeline similar to the one Camilo described

2) commands that ensure no old, forgotten mpd processes remain out
there (it's a !@$ when your whole job dies after days of waiting in
the queue because a ring failed to boot!)

3) commands that ensure a new, clean mpd ring gets booted properly,
with the right arguments according to what we parsed in 1), etc.

4)   # put your favorite mpiexec here

5) mpdallexit.

That seems to work quite well, at least for my needs.
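Camilo's pipeline isn't quoted in this thread, so what follows is only a dry-run sketch of steps 1 to 5 under stated assumptions: that $PBS_NODEFILE lists one hostname per allocated core, that mpd.hosts accepts "hostname:ncpus" lines, and that `mpdcleanup -f <hostsfile>` clears stale mpd state on those hosts (check `mpdcleanup --help` on your install). The helper names are hypothetical. It collapses the node file into mpd.hosts and prints the commands instead of running them.

```python
"""Dry-run sketch of steps 1-5 for an mpd-based PBS job (hypothetical helpers).

Assumptions, not taken from the thread:
  * $PBS_NODEFILE lists one hostname per allocated core;
  * mpd.hosts accepts "hostname:ncpus" lines;
  * `mpdcleanup -f <hostsfile>` cleans up stale mpd state on those hosts.
"""
import os
import shlex


def nodefile_to_mpd_hosts(nodefile_text):
    """Step 1: collapse the node file into mpd.hosts lines.

    Returns (hosts_lines, number_of_hosts, total_slots).
    """
    counts = {}  # dicts keep insertion order, so the first host stays first
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host:
            counts[host] = counts.get(host, 0) + 1
    hosts_lines = [f"{host}:{n}" for host, n in counts.items()]
    return hosts_lines, len(counts), sum(counts.values())


def job_commands(nhosts, nprocs, app, hostsfile="mpd.hosts"):
    """Steps 2-5 as argument lists; nothing is executed here."""
    return [
        ["mpdallexit"],                                   # 2) stop a leftover ring (errors if none is running)
        ["mpdcleanup", "-f", hostsfile],                  # 2) clear stale mpd state (assumed flag, see docstring)
        ["mpdboot", "-n", str(nhosts), "-f", hostsfile],  # 3) boot a fresh ring, one mpd per host
        ["mpiexec", "-n", str(nprocs), *app],             # 4) your favorite mpiexec here
        ["mpdallexit"],                                   # 5) shut the ring down
    ]


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as f:
            text = f.read()
    else:
        text = "node01\nnode01\nnode02\nnode02\n"  # made-up sample for running outside PBS
    hosts_lines, nhosts, nprocs = nodefile_to_mpd_hosts(text)
    with open("mpd.hosts", "w") as f:
        f.write("\n".join(hosts_lines) + "\n")
    for cmd in job_commands(nhosts, nprocs, ["./application"]):
        print(shlex.join(cmd))
```

In a real .pbs file you would run these lines rather than print them, tolerating a failure of the first mpdallexit, since there is normally no ring left to stop.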

Cheers,
N.


PS: Hydra works like a charm on our 3-PC testing "minicluster" at the
office (I really enjoy forgetting about the mpd ring drill
altogether!), but I couldn't get it to stop choking on some DNS quirk
of the real cluster (where, alas, I have no root), so I'm still using mpd
there. If you're interested in some wrapper scripts (just hacks, but
they do the job), do let me know.



On Mon, Jul 26, 2010 at 11:44 AM, Ivan Pulido <mefistofeles87 at gmail.com> wrote:
>
>
> On Fri, Jul 23, 2010 at 6:24 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>> Ivan,
>>
>> Can you try using the Hydra process manager?
>>
>> % mpiexec.hydra -rmk pbs ./application
>>
>
> This didn't work. I'm not sure whether it has to do with the way I've set up
> my cluster. When I run that command asking for 20 processes (-n 20), all of
> them run on a single machine, and the PBS server doesn't know the application
> is running (qstat doesn't show anything). Any ideas about this are very
> welcome.
>
> Thanks.
>
>>
>>  -- Pavan
>>
>> On 07/23/2010 05:15 PM, Ivan Pulido wrote:
>>>
>>> Hello, I'm trying to configure the Torque resource manager and MPICH2 (with
>>> MPD), but I'm having some issues.
>>>
>>> The MPICH2 user's guide says there's a way to convert the Torque node
>>> file into one MPD can read, but this is outdated: the syntax Torque uses
>>> nowadays is not the one mentioned in the MPICH2 user's guide, so I can't
>>> use what's there to run MPICH2 under Torque. I also tried
>>> OSC mpiexec http://www.osc.edu/~djohnson/mpiexec/ with no luck, since
>>> it looks for a libpbs.a that's not part of the default Torque install (that
>>> one is a question for Torque's mailing list).
>>>
>>> So, what I'm trying to say is that the ways the user's guide advises
>>> using MPICH2 with Torque don't work with the newest versions of
>>> the software involved. I'd like to know if there's a way to use MPICH2
>>> with Torque that really works with the newest versions. I'd really
>>> appreciate help with this, since we urgently need to use MPI on our cluster.
>>>
>>> Thanks.
>>>
>>> --
>>> Ivan Pulido
>>> Physics student
>>> Universidad Nacional de Colombia
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji

