[mpich-discuss] HP-XC 3000 cluster issues

Gauri Kulkarni gaurivk at gmail.com
Tue Mar 3 11:53:50 CST 2009


I have been reading up and am aware that MPICH2 is the current
implementation, and I would use it on our system as well. The caveat is that
we have software named FastDL, a parallelized version of IDL (by ITT). When
it was ordered, it was compiled with MPICH 1.2.7p1, which is why we have been
trying to get MPICH1 up and running. It is possible to obtain a FastDL build
compiled with MPICH2, but the LSF problem still remains: users cannot ssh
into the other nodes of the cluster, so jobs cannot be spawned across nodes
directly. They can only be spawned through LSF, and LSF here has been
configured with SLURM. I have been under the impression that this is
particular to HP-XC clusters.

As for interactive or debugging sessions, I do not know how to run srun in
an interactive session. How do you do that?

Gauri.
---------
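A minimal sketch of an interactive session on a cluster where, as described above, LSF fronts SLURM and jobs can only be started through LSF. It assumes the standard LSF `bsub` options `-I` (interactive job) and `-Is` (interactive shell with a pseudo-terminal) are permitted on the local queues; queue names and any other site-specific options are not shown and will differ. The last line counts tasks per node, which is a quick way to check the placement problem described in case ii further down.

```shell
# Run one command interactively: LSF allocates 4 slots, srun launches
# 4 tasks inside that allocation, and their output comes back here.
bsub -I -n4 srun hostname

# Or open an interactive shell inside a 16-slot allocation ...
bsub -Is -n16 /bin/bash

# ... and, from that shell, launch tasks with srun as often as needed.
# Counting the hostnames shows how the 15 tasks are spread over nodes.
srun -n 15 hostname | sort | uniq -c
```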


On Tue, Mar 3, 2009 at 10:37 PM, Anthony Chan <chan at mcs.anl.gov> wrote:

>
> One advantage of MPICH2 over MPICH1 is that MPICH2 is a lot more robust
> in terms of process management, so debugging an MPICH2 app is easier than
> debugging an MPICH1 app.  Also, MPICH1 is no longer being developed; if you
> have any problem with MPICH1, fewer people (if any) will be able to help you.
>
> A.Chan
>
> ----- "Gauri Kulkarni" <gaurivk at gmail.com> wrote:
>
> > Thanks, Rajeev.
> >
> > Is it the same for MPICH1? The reason I need information about MPICH1
> > with SLURM is that we have software (FastDL) which has been compiled
> > with MPICH1. We have asked the vendor to give us the software recompiled
> > with MPICH2, but honestly, we do not know of any particular advantage of
> > using MPICH2 over MPICH1 (apart from the fact that MPICH1 isn't
> > maintained anymore).
> >
> > On a side note, how do I reply to the thread? I only get the daily
> > digest by mail.
> >
> > -Gauri.
> > ----------
> >
> >
> >
> > -------------
> >
> > Message: 4
> > Date: Mon, 23 Feb 2009 23:38:06 -0600
> > From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> > Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
> > To: <mpich-discuss at mcs.anl.gov>
> >
> > To run MPICH2 with SLURM, configure with the options
> > "--with-pmi=slurm --with-pm=no" as described in the MPICH2 README file.
> > Also see the instructions on how to run MPICH2 with SLURM at
> > https://computing.llnl.gov/linux/slurm/quickstart.html .
> >
> > Rajeev
> >
> >
> >
> >  _____
> >
> > From: mpich-discuss-bounces at mcs.anl.gov
> > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gauri Kulkarni
> > Sent: Monday, February 23, 2009 11:19 PM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [mpich-discuss] HP-XC 3000 cluster issues
> >
> >
> > Hi,
> >
> > I am a newbie to MPI in general. Our institute has a cluster of 16
> > nodes with 8 processors each. It is an HP-XC 3000 cluster, which
> > basically means it's quite proprietary. It has its own MPI
> > implementation, HP-MPI, in which parallel execution is managed by
> > SLURM (Simple Linux Utility for Resource Management). There is also a
> > batch job scheduler, LSF (Load Sharing Facility), which works in tandem
> > with SLURM to run batch jobs in parallel. We have installed both MPICH
> > and MPICH2 and are testing them, but we are running into compatibility
> > issues. For a simple helloworld.c program:
> > 1. For HP-MPI: compiled with this implementation's mpicc and executed
> >    with its mpirun, "mpirun -np 4 helloworld" works correctly. For batch
> >    scheduling, we need to issue "bsub -n4 [other options] mpirun -srun
> >    helloworld", and it runs fine too. "srun" is the SLURM utility that
> >    launches parallel jobs.
> > 2. For MPICH and MPICH2: again, compiled with the mpicc of each
> >    implementation and executed with its own mpirun:
> >    i) mpirun -np 4 helloworld: works.
> >   ii) mpirun -np 15 helloworld: execution is confined to a single node;
> >       8 processes run first on the 8 processors of one node, and then
> >       the remaining ones run.
> >  iii) bsub -n4 [options] mpirun -srun helloworld: job terminated; the
> >       -srun option is not recognized.
> >   iv) bsub [options] mpirun -np 4 helloworld: works.
> >    v) bsub [options] mpirun -np 15 helloworld: same as iii.
> >
> > Is anybody aware of issues with MPICH on HP clusters? Am I
> > misinterpreting something? Any help is appreciated.
> >
> > Gauri.
> > ---------
>
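The build Rajeev describes can be sketched as follows, assuming an unpacked MPICH2 source tree in a hypothetical directory `mpich2-src` and a hypothetical install prefix of `$HOME/mpich2-slurm`. With `--with-pm=no`, MPICH2 installs no process manager of its own, so programs are started with SLURM's `srun` (here inside an LSF allocation) rather than with an MPICH2 `mpirun`/`mpiexec`. If SLURM's PMI library and headers are not on the default search paths, the standard autoconf `CPPFLAGS`/`LDFLAGS` variables may need to point at them; those paths are site-specific and are not shown.

```shell
# Build MPICH2 against SLURM's PMI library, with no MPICH2 process manager.
cd mpich2-src                          # hypothetical source directory
./configure --prefix="$HOME/mpich2-slurm" --with-pmi=slurm --with-pm=no
make
make install

# Compile with the wrapper compiler from that (hypothetical) installation.
"$HOME/mpich2-slurm/bin/mpicc" -o helloworld helloworld.c

# Submit through LSF; srun starts the 15 MPI processes in the allocation.
bsub -n15 -I srun -n 15 ./helloworld
```

Launching with srun instead of mpirun also explains case iii: `-srun` is an option of HP-MPI's mpirun, not of the mpirun shipped with MPICH or MPICH2.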