[mpich-discuss] MPICH2 and TORQUE/PBS integration
Pavan Balaji
balaji at mcs.anl.gov
Sun Aug 1 14:39:42 CDT 2010
Sorry for the delay in getting to this chain.
What exactly is the problem you are facing with integrating Hydra with
Torque/PBS? You should just be able to specify:
% mpiexec -rmk pbs ./foo
... and mpiexec should automatically query Torque for the list of
allocated nodes. Note that the mpiexec line needs to be inside your
qsub script.
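
For illustration, a minimal job script along these lines might look like
the sketch below. The job name, the resource request (nodes=4:ppn=2) and
the summarize_nodes helper are assumptions made up for this example, not
something taken from this thread; PBS_NODEFILE and PBS_O_WORKDIR are the
environment variables Torque sets inside a job. The helper only prints
how many slots each allocated host has, which makes it easy to confirm
that the job really spans several machines before mpiexec starts.

```shell
#!/bin/sh
#PBS -N foo
#PBS -l nodes=4:ppn=2
#PBS -j oe

# Print "host slots" for every host listed in a Torque node file.
# Torque writes one line per allocated slot, so a host with ppn=2
# appears twice in the file.
summarize_nodes() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    # Inside a Torque job: start from the directory qsub was run in.
    cd "$PBS_O_WORKDIR" || exit 1
    summarize_nodes "$PBS_NODEFILE"
    # Hydra asks Torque for the host list; no -n or host file is given.
    mpiexec -rmk pbs ./foo
else
    echo "PBS_NODEFILE is not set; submit this script with qsub." >&2
fi
```

Submit it with qsub (for example "qsub foo.pbs", where the file name is
hypothetical) rather than running it directly. Started outside a job, the
script only prints a reminder, since there is no Torque allocation for
mpiexec to query.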
-- Pavan
On 07/28/2010 09:53 AM, Ivan Pulido wrote:
>
>
> On Wed, Jul 28, 2010 at 3:12 AM, Nicolas Rosner <nrosner at gmail.com> wrote:
>
> Hi Ivan and all,
>
> We use MPICH2 (in user space) on a cluster that runs Torque/PBS (as
> provided by root).
>
> I never really managed to properly "integrate" the two (I'm not sure
> there's even a standard way to do that -- e.g. even if you were to use
> MPI2 spawn et al for dynamic proc mgmt, I suppose you'd still be
> trapped within the MPD-supplied MPI world, no?).
>
> But, frankly, so far I've had no real need for such a thing. So what I
> do is this: my job desc files (the .pbs text file, or whatever you'll
> qsub) contain
>
> 1) a pipeline similar to the one Camilo described
>
> 2) commands that ensure no old forgotten mpd processes remain out
> there (it's a !@$ when your whole job dies after days waiting because
> a ring failed to boot!)
>
> 3) commands that ensure a new clean mpd ring gets booted properly
> w/the right args according to what we parsed in 1), etc.
>
> 4) # put your favorite mpiexec here
>
> 5) mpdallexit.
>
> That seems to work quite well, at least for my needs.
>
> Cheers,
> N.
>
>
> PS: Hydra works like a charm on our 3-PC testing "minicluster" at the
> office (I really enjoy forgetting about the mpd ring drill
> altogether!) but I couldn't get it to stop choking on some dns quirk
> of the real cluster (where, alas, no root), so I'm still using mpd
> there. If you're interested in some wrapper scripts (just hacks, but
> they do the job), do let me know.
>
>
>
> I have now moved from mpd to Hydra and it has been working fine. It's
> still in the testing phase, but if everything goes well I think it's a
> good solution, since it's powerful and you don't have to mess with the
> mpd ring. Thanks a lot for your help.
>
>
> .pbs jobspecs (the text files that I qsub) usually contain something
> similar to what Camilo mentioned
>
>
>
>
>
> On Mon, Jul 26, 2010 at 11:44 AM, Ivan Pulido <mefistofeles87 at gmail.com> wrote:
> >
> >
> > On Fri, Jul 23, 2010 at 6:24 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> >>
> >> Ivan,
> >>
> >> Can you try using the Hydra process manager?
> >>
> >> % mpiexec.hydra -rmk pbs ./application
> >>
> >
> > This didn't work; I'm not sure if this has to do with the way I've set
> > up my cluster. When I try running that command specifying 20 processes
> > (-n 20), all of them run on a single machine and the PBS server doesn't
> > find out about the application running (qstat doesn't show anything).
> > Any ideas on this subject are very welcome.
> >
> > Thanks.
> >
> >>
> >> -- Pavan
> >>
> >> On 07/23/2010 05:15 PM, Ivan Pulido wrote:
> >>>
> >>> Hello, I'm trying to configure the Torque resource manager and MPICH2
> >>> (with MPD), but I'm having some issues.
> >>>
> >>> The MPICH2 user's guide says there's a way to convert the Torque node
> >>> file to one MPD can read, but this is outdated: the syntax Torque uses
> >>> nowadays is not the one mentioned in the guide, so I can't use what's
> >>> there to run MPICH2 with Torque. On the other hand, I tried using OSC
> >>> mpiexec http://www.osc.edu/~djohnson/mpiexec/ with no good results,
> >>> since it looks for a libpbs.a that's not part of the default Torque
> >>> install (this one is for Torque's mailing list).
> >>>
> >>> So, what I'm trying to say is that the ways the user's guide advises
> >>> to use MPICH2 with Torque do not work with the newest versions of the
> >>> software involved. I'd like to know if there's a way to use MPICH2
> >>> with Torque that really works with the newest versions. I'd really
> >>> appreciate help with this, since we urgently need to use MPI on our
> >>> cluster.
> >>>
> >>> Thanks.
> >>>
> >>> --
> >>> Ivan Pulido
> >>> Estudiante de Física
> >>> Universidad Nacional de Colombia
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------
> >>>
> >>> _______________________________________________
> >>> mpich-discuss mailing list
> >>> mpich-discuss at mcs.anl.gov <mailto:mpich-discuss at mcs.anl.gov>
> >>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >>
> >> --
> >> Pavan Balaji
> >> http://www.mcs.anl.gov/~balaji
> >
> >
> >
> > --
> > Ivan Pulido
> > Estudiante de Física
> > Universidad Nacional de Colombia
> >
> >
> >
>
>
>
>
> --
> Ivan Pulido
> Estudiante de Física
> Universidad Nacional de Colombia
>
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji