[mpich-discuss] MPICH2 and TORQUE/PBS integration

Pavan Balaji balaji at mcs.anl.gov
Sun Aug 1 14:39:42 CDT 2010


Sorry for the delay in getting to this thread.

What exactly is the problem you are facing with integrating Hydra with 
Torque/PBS? You should just be able to specify:

% mpiexec -rmk pbs ./foo

.. and mpiexec should automatically query Torque for the appropriate 
information (the hosts allocated to your job). Note that the mpiexec line 
needs to be inside your qsub script, so that it runs within an allocation.
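
For example, a minimal job script could look something like the following 
(the job name, resource request, and program name are placeholders; adjust 
them for your site):

   #!/bin/sh
   #PBS -N mpich2-test
   #PBS -l nodes=4:ppn=2
   #PBS -l walltime=00:10:00

   cd $PBS_O_WORKDIR          # directory the job was submitted from
   mpiexec -rmk pbs ./foo     # Hydra queries Torque for the allocated hosts

and then submit it as usual:

% qsub job.pbs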

  -- Pavan

On 07/28/2010 09:53 AM, Ivan Pulido wrote:
>
>
> On Wed, Jul 28, 2010 at 3:12 AM, Nicolas Rosner <nrosner at gmail.com> wrote:
>
>     Hi Ivan and all,
>
>     We use MPICH2 (in user space) on a cluster that runs Torque/PBS (as
>     provided by root).
>
>     I never really managed to properly "integrate" the two (I'm not sure
>     there's even a standard way to do that -- e.g. even if you were to use
>     MPI2 spawn et al for dynamic proc mgmt, I suppose you'd still be
>     trapped within the MPD-supplied MPI world, no?).
>
>     But, frankly, so far I've had no real need for such a thing. So what I
>     do is this: my job description files (the .pbs text files, or whatever
>     you qsub) contain:
>
>     1) a pipeline similar to the one Camilo described
>
>     2) commands that ensure no old forgotten mpd processes remain out
>     there (it's a !@$ when your whole job dies after days waiting because
>     a ring failed to boot!)
>
>     3) commands that ensure a new clean mpd ring gets booted properly
>     w/the right args according to what we parsed in 1), etc.
>
>     4)   # put your favorite mpiexec here
>
>     5) mpdallexit.
>
>     That seems to work quite well, at least for my needs.
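>
>     In rough script form, that comes out to something like the sketch
>     below (the resource request, file names and app name are just
>     placeholders, and the real thing has more error checking):
>
>     #!/bin/sh
>     #PBS -l nodes=4:ppn=2
>     cd $PBS_O_WORKDIR
>
>     # 1) build an mpd machinefile (host:ncpus) from the Torque node list
>     sort $PBS_NODEFILE | uniq -c | awk '{ print $2 ":" $1 }' > mpd.hosts
>     NHOSTS=$(wc -l < mpd.hosts)
>
>     # 2) make sure no old forgotten mpd ring is still hanging around
>     mpdallexit 2>/dev/null
>
>     # 3) boot a clean ring across the allocated nodes
>     mpdboot -n $NHOSTS -f mpd.hosts
>
>     # 4) put your favorite mpiexec here
>     mpiexec -n $(wc -l < $PBS_NODEFILE) ./my_app
>
>     # 5) tear the ring down
>     mpdallexit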
>
>     Cheers,
>     N.
>
>
>     PS: Hydra works like a charm on our 3-PC testing "minicluster" at the
>     office (I really enjoy forgetting about the mpd ring drill
>     altogether!) but I couldn't get it to stop choking on some DNS quirk
>     of the real cluster (where, alas, no root), so I'm still using mpd
>     there. If you're interested in some wrapper scripts (just hacks, but
>     they do the job), do let me know.
>
>
>
> Right now I have moved from using mpd to Hydra, and it has been working
> fine. It's still in the testing phase, but if everything goes well I think
> it's a good solution, since it's powerful and you don't have to mess with
> mpd's ring. Thanks a lot for your help.
>
>
>     On Mon, Jul 26, 2010 at 11:44 AM, Ivan Pulido
>     <mefistofeles87 at gmail.com> wrote:
>      >
>      >
>      > On Fri, Jul 23, 2010 at 6:24 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>      >>
>      >> Ivan,
>      >>
>      >> Can you try using the Hydra process manager?
>      >>
>      >> % mpiexec.hydra -rmk pbs ./application
>      >>
>      >
>      > This didn't work, though I'm not sure whether that has to do with
>      > the way I've set up my cluster. When I try running that command
>      > specifying 20 nodes (-n 20), all the jobs run on a single machine,
>      > and the PBS server doesn't find out about the running application
>      > (qstat doesn't show anything). Any ideas about this are very
>      > welcome.
>      >
>      > Thanks.
>      >
>      >>
>      >>  -- Pavan
>      >>
>      >> On 07/23/2010 05:15 PM, Ivan Pulido wrote:
>      >>>
>      >>> Hello, I'm trying to configure the Torque resource manager and
>      >>> MPICH2 (with MPD), but I'm having some issues.
>      >>>
>      >>> The MPICH2 user's guide says there's a way to convert the Torque
>      >>> node file into one MPD can read, but this is outdated: the syntax
>      >>> Torque uses nowadays is not the one described in the MPICH2 user's
>      >>> guide, so I can't follow those instructions to use Torque with
>      >>> MPICH2. On the other hand, I tried OSC mpiexec
>      >>> (http://www.osc.edu/~djohnson/mpiexec/) with no good results,
>      >>> since it looks for a libpbs.a that is not part of the default
>      >>> Torque install (but that issue is for Torque's mailing list).
>      >>>
>      >>> So, what I'm trying to say is that the approaches the user's
>      >>> guide recommends for using MPICH2 with Torque do not work with
>      >>> the newest versions of the software involved. I'd like to know
>      >>> whether there is a way to use MPICH2 with Torque that really
>      >>> works with current versions. I'd really appreciate help with
>      >>> this, since we urgently need MPI on our cluster.
>      >>>
>      >>> Thanks.
>      >>>
>      >>> --
>      >>> Ivan Pulido
>      >>> Estudiante de Física
>      >>> Universidad Nacional de Colombia
>      >>>
>      >>>
>      >>>
>      >>
>      >> --
>      >> Pavan Balaji
>      >> http://www.mcs.anl.gov/~balaji
>      >
>      >
>      >
>      > --
>      > Ivan Pulido
>      > Estudiante de Física
>      > Universidad Nacional de Colombia
>      >
>      >
>      >
>
>
>
>
> --
> Ivan Pulido
> Estudiante de Física
> Universidad Nacional de Colombia
>
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

