[mpich-discuss] MPICH2 and TORQUE/PBS integration

Ivan Pulido mefistofeles87 at gmail.com
Wed Jul 28 09:53:28 CDT 2010


On Wed, Jul 28, 2010 at 3:12 AM, Nicolas Rosner <nrosner at gmail.com> wrote:

> Hi Ivan and all,
>
> We use MPICH2 (in user space) on a cluster that runs Torque/PBS (as
> provided by root).
>
> I never really managed to properly "integrate" the two (I'm not sure
> there's even a standard way to do that -- e.g. even if you were to use
> MPI2 spawn et al for dynamic proc mgmt, I suppose you'd still be
> trapped within the MPD-supplied MPI world, no?).
>
> But, frankly, so far I've had no real need for such a thing. So what I
> do is this: my job desc files (the .pbs text file, or whatever you'll
> qsub) contain
>
> 1) a pipeline similar to the one Camilo described
>
> 2) commands that ensure no old forgotten mpd processes remain out
> there (it's a !@$ when your whole job dies after days waiting because
> a ring failed to boot!)
>
> 3) commands that ensure a new clean mpd ring gets booted properly
> w/the right args according to what we parsed in 1), etc.
>
> 4)   # put your favorite mpiexec here
>
> 5) mpdallexit.
>
> That seems to work quite well, at least for my needs.
>
> Cheers,
> N.
>
>
> PS: Hydra works like a charm on our 3-PC testing "minicluster" at the
> office (I really enjoy forgetting about the mpd ring drill
> altogether!) but I couldn't get it to stop choking on some dns quirk
> of the real cluster (where, alas, no root), so I'm still using mpd
> there. If you're interested in some wrapper scripts (just hacks, but
> they do the job), do let me know.
>
>
>
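[Editor's note: the five steps Nicolas lists might look roughly like this in a .pbs job script. This is only a sketch; the resource request, file names, and application name are placeholders, and a real script would add error handling.]

```shell
#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -N myjob

cd "$PBS_O_WORKDIR"

# 1) derive the process count and host list from the Torque node file
NPROCS=$(wc -l < "$PBS_NODEFILE")
sort -u "$PBS_NODEFILE" > mpd.hosts
NHOSTS=$(wc -l < mpd.hosts)

# 2) make sure no stale mpd daemons survive from earlier jobs
mpdallexit 2>/dev/null || true
mpdcleanup -f mpd.hosts 2>/dev/null || true

# 3) boot a fresh mpd ring across the allocated hosts
mpdboot -n "$NHOSTS" -f mpd.hosts

# 4) put your favorite mpiexec here
mpiexec -n "$NPROCS" ./application

# 5) tear the ring down
mpdallexit
```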
I've now moved from mpd to Hydra, and it has been working fine. It's still
in the testing phase, but if everything goes well I think it's a good
solution, since it's powerful and you don't have to mess with mpd's ring.
Thanks a lot for your help.
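[Editor's note: for reference, a Hydra launch under Torque might look like this. A sketch only; the binary name is a placeholder, and `-rmk pbs` requires a Hydra built with PBS support, so passing the node file explicitly is shown as a fallback.]

```shell
# Let Hydra query the PBS resource-management kernel directly:
mpiexec.hydra -rmk pbs ./application

# Fallback: hand Hydra the Torque node file explicitly
mpiexec.hydra -f "$PBS_NODEFILE" -n "$(wc -l < "$PBS_NODEFILE")" ./application
```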


> On Mon, Jul 26, 2010 at 11:44 AM, Ivan Pulido <mefistofeles87 at gmail.com>
> wrote:
> >
> >
> > On Fri, Jul 23, 2010 at 6:24 PM, Pavan Balaji <balaji at mcs.anl.gov>
> wrote:
> >>
> >> Ivan,
> >>
> >> Can you try using the Hydra process manager?
> >>
> >> % mpiexec.hydra -rmk pbs ./application
> >>
> >
> > This didn't work; I'm not sure whether that has to do with the way I've
> > set up my cluster. When I try running that command specifying 20
> > processes (-n 20), all the jobs run on a single machine and the PBS
> > server doesn't find out about the running application (qstat doesn't
> > show anything). Any ideas on this are very welcome.
> >
> > Thanks.
> >
> >>
> >>  -- Pavan
> >>
> >> On 07/23/2010 05:15 PM, Ivan Pulido wrote:
> >>>
> >>> Hello, I'm trying to configure the Torque resource manager with MPICH2
> >>> (using MPD), but I'm having some issues.
> >>>
> >>> The MPICH2 user's guide says there's a way to convert the Torque node
> >>> file to one MPD can read, but this is outdated: the syntax Torque uses
> >>> nowadays is not the one described in the MPICH2 user's guide, so I
> >>> can't use what's there to run Torque with MPICH2. On the other hand, I
> >>> tried OSC mpiexec (http://www.osc.edu/~djohnson/mpiexec/) with no
> >>> luck, since it looks for a libpbs.a that isn't part of Torque's
> >>> default install (but that's a matter for Torque's mailing list).
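
[Editor's note: the node-file conversion the user's guide describes amounts to collapsing Torque's one-line-per-core $PBS_NODEFILE into mpd's "hostname:ncpus" format. A sketch, using a sample file in place of the real $PBS_NODEFILE; file names are assumptions.]

```shell
# Sample of what Torque writes to $PBS_NODEFILE: one line per allocated
# core, so a host appears once for each core it contributes. In a real
# job you would read "$PBS_NODEFILE" instead of this sample file.
cat > nodefile.sample <<'EOF'
node1
node1
node2
node2
node2
node2
EOF

# Count the repeats per host and emit "hostname:ncpus" for mpd.hosts.
sort nodefile.sample | uniq -c | awk '{print $2 ":" $1}' > mpd.hosts
cat mpd.hosts
```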
> >>>
> >>> So, what I'm trying to say is that the approaches the user's guide
> >>> recommends for using MPICH2 with Torque no longer work with the newest
> >>> versions of the software involved. I'd like to know whether there's a
> >>> way to use MPICH2 with Torque that really works with current versions.
> >>> I'd really appreciate help with this, since we urgently need MPI on
> >>> our cluster.
> >>>
> >>> Thanks.
> >>>
> >>> --
> >>> Ivan Pulido
> >>> Estudiante de Física
> >>> Universidad Nacional de Colombia
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------
> >>>
> >>> _______________________________________________
> >>> mpich-discuss mailing list
> >>> mpich-discuss at mcs.anl.gov
> >>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >>
> >> --
> >> Pavan Balaji
> >> http://www.mcs.anl.gov/~balaji
> >
> >
> >
> > --
> > Ivan Pulido
> > Estudiante de Física
> > Universidad Nacional de Colombia
> >
> >
> >
>



-- 
Ivan Pulido
Estudiante de Física
Universidad Nacional de Colombia