[mpich-discuss] HP-XC 3000 cluster issues

Anthony Chan chan at mcs.anl.gov
Tue Mar 3 12:10:45 CST 2009


I have no access to LSF, so I can't really help you on that.
A search for LSF + interactive + debugging reveals this URL:

https://hpc.cineca.it/docs/HPCUserGuide/151BatchSchedulerLSF
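If your site's LSF supports interactive jobs, one common pattern is to
request an interactive allocation with bsub and then launch the MPI
processes with srun inside it. This is only a sketch - the exact flags
and whether -Is is enabled depend on how LSF and SLURM are configured
on your cluster:

```shell
# Ask LSF for an interactive job with a pseudo-terminal on 4 slots.
# (-Is requests an interactive shell; availability is site-dependent.)
bsub -Is -n 4 /bin/bash

# Inside the allocation, launch the program under SLURM.
# "helloworld" is just the example binary from this thread.
srun -n 4 ./helloworld
```

From that interactive shell you could also attach a debugger to one of
the ranks, which is what makes this useful for debugging sessions.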

A.Chan

----- "Gauri Kulkarni" <gaurivk at gmail.com> wrote:

> I have been reading up and am aware that MPICH2 is the currently used
> implementation. I would have it here on our system as well, but the
> caveat is: we have a software package named FastDL, which is a
> parallelized version of IDL (by ITT). When it was ordered, it was
> compiled with MPICH 1.2.7p1. Hence we had been trying to get MPICH1
> up and running. It is possible to obtain FastDL compiled with MPICH2,
> but the problem of LSF still remains. Jobs cannot be spawned to
> different nodes of the cluster by a user, since users cannot ssh into
> other nodes. They can only be spawned through LSF, and the LSF here
> has been configured with SLURM. I have been under the impression that
> this is particular to HP-XC clusters.
> 
> As far as an interactive or debugging session goes, I do not know
> how to run srun in an interactive session. How do you do that?
> 
> Gauri.
> ---------
> 
> 
> On Tue, Mar 3, 2009 at 10:37 PM, Anthony Chan <chan at mcs.anl.gov>
> wrote:
> 
> >
> > One advantage of MPICH2 over MPICH1 is that MPICH2 is a lot more
> > robust in terms of process management, so debugging an MPICH2 app
> > is easier than with MPICH1. Also, MPICH1 is no longer being
> > developed; if you have any problem with MPICH1, fewer people (if
> > any) will be able to help you.
> >
> > A.Chan
> >
> > ----- "Gauri Kulkarni" <gaurivk at gmail.com> wrote:
> >
> > > Thanks, Rajeev.
> > >
> > > Is it the same case with MPICH1? The reason I need info about
> > > MPICH1 with SLURM is that we have a software package (FastDL)
> > > which has been compiled with MPICH1. We have asked the vendor to
> > > give us the software recompiled with MPICH2, but honestly, we do
> > > not know of any particular advantage of using MPICH2 over MPICH1
> > > (apart from the fact that MPICH1 isn't maintained anymore).
> > >
> > > On a side note, how do I reply to the thread? I only get the
> > > daily digest in mail.
> > >
> > > -Gauri.
> > > ----------
> > >
> > >
> > >
> > > -------------
> > >
> > > Message: 4
> > > Date: Mon, 23 Feb 2009 23:38:06 -0600
> > > From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> > > Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
> > > To: <mpich-discuss at mcs.anl.gov>
> > > Message-ID: <72376B2D10EC43F9A0A433C960F951B6 at thakurlaptop>
> > > Content-Type: text/plain; charset="us-ascii"
> > >
> > > To run MPICH2 with SLURM, configure with the options
> > > "--with-pmi=slurm
> > > --with-pm=no" as described in the MPICH2 README file. Also see
> the
> > > instructions on how to run MPICH2 with SLURM at
> > > https://computing.llnl.gov/linux/slurm/quickstart.html .
> > >
> > > Rajeev
> > >
> > >
> > >
> > >  _____
> > >
> > > From: mpich-discuss-bounces at mcs.anl.gov
> > > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gauri
> > > Kulkarni
> > > Sent: Monday, February 23, 2009 11:19 PM
> > > To: mpich-discuss at mcs.anl.gov
> > > Subject: [mpich-discuss] HP-XC 3000 cluster issues
> > >
> > >
> > > Hi,
> > >
> > > I am a newbie to MPI in general. Currently in our institute, we
> > > have a cluster of 16 nodes with 8 processors each. It is an HP-XC
> > > 3000 cluster, which basically means it's quite proprietary. It
> > > has its own MPI implementation - HP-MPI - in which the
> > > parallelization is managed by SLURM (Simple Linux Utility for
> > > Resource Management). There is also a batch job scheduler - LSF
> > > (Load Sharing Facility) - which works in tandem with SLURM to
> > > parallelize the batch jobs. We have installed both MPICH and
> > > MPICH2 and are testing them, but we are running into
> > > compatibility issues. For a simple helloworld.c program:
> > > 1. For HP-MPI: Compiled with the mpicc of this implementation
> > > and executed with its mpirun: "mpirun -np 4 helloworld" works
> > > correctly. For batch scheduling, we need to issue "bsub -n4
> > > [other options] mpirun -srun helloworld" and it runs fine too.
> > > "srun" is the SLURM utility that launches the parallel jobs.
> > > 2. For MPICH and MPICH2: Again, compiled with the mpicc of these
> > > respective implementations and executed with their own mpirun:
> > >    i) mpirun -np 4 helloworld : Works.
> > >   ii) mpirun -np 15 helloworld : The parallelization is limited
> > > to a single node - that is, 8 processes run first on the 8
> > > processors of a single node, and then the remaining ones.
> > >  iii) bsub -n4 [options] mpirun -srun helloworld : Job
> > > terminated; the srun option is not recognized.
> > >   iv) bsub [options] mpirun -np 4 helloworld : Works.
> > >    v) bsub [options] mpirun -np 15 helloworld : Same as (iii).
> > >
> > > Is anybody aware of HP cluster issues with MPICH? Am I
> > > misinterpreting something? Any help is appreciated.
> > >
> > > Gauri.
> > > ---------
> >