[mpich-discuss] HP-XC 3000 cluster issues

Dave Goodell goodell at mcs.anl.gov
Wed Mar 4 10:57:11 CST 2009


Gauri,

Do you know where your slurm headers and libraries are located?  You
can specify a root for the slurm installation via the
"--with-slurm=/path/to/slurm/prefix" option to configure.

For example, if you have the following files:

/foo/bar/baz/lib/libpmi.a
/foo/bar/baz/include/slurm/pmi.h

Then pass "--with-slurm=/foo/bar/baz" to configure.  If "/foo/bar/baz"
is "/usr" or "/" then this should have worked without the
"--with-slurm" option.  Almost any other prefix will require this
option.
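As a concrete sketch of how to find that prefix (the /opt/hptc path
below is just a hypothetical example, not necessarily where HP-XC
installs slurm):

```shell
# Search the usual roots for the SLURM PMI header and library
# (adjust the search roots for your machine):
find /usr /opt -name pmi.h -path '*slurm*' 2>/dev/null
find /usr /opt -name 'libpmi.*' 2>/dev/null

# Suppose the header turned up here (hypothetical path):
header=/opt/hptc/slurm/include/slurm/pmi.h

# The configure prefix is everything above "include/slurm/pmi.h":
prefix=${header%/include/slurm/pmi.h}
echo "$prefix"    # -> /opt/hptc/slurm

# Then reconfigure with that prefix:
# ./configure --prefix=$HOME/mympi --with-pmi=slurm --with-pm=no \
#             --with-slurm="$prefix"
```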

If you have a nonstandard layout for your slurm installation there are
other configure arguments you can pass to make everything work too.
But let's hold off on discussing those until we know that you need
them.

-Dave

On Mar 4, 2009, at 6:40 AM, Gauri Kulkarni wrote:

> Ok, I have tried to recompile MPICH2 with the following options. I
> cannot recompile the 'global version', so I have tried to install it
> in my home dir and would update the PATH accordingly. But compiling
> is failing at the 'configure' step with the following error:
>
> command: ./configure --prefix=/data1/visitor/cgaurik/mympi/ --with-pmi=slurm --with-pm=no
> End part of the output:
> RUNNING CONFIGURE FOR THE SLURM PMI
> checking for make... make
> checking whether clock skew breaks make... no
> checking whether make supports include... yes
> checking whether make allows comments in actions... yes
> checking for virtual path format... VPATH
> checking whether make sets CFLAGS... yes
> checking for gcc... gcc
> checking for C compiler default output file name... a.out
> checking whether the C compiler works... yes
> checking whether we are cross compiling... no
> checking for suffix of executables...
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ANSI C... none needed
> checking how to run the C preprocessor... gcc -E
> checking for slurm/pmi.h... no
> configure: error: could not find slurm/pmi.h.  Configure aborted
> configure: error: Configure of src/pmi/slurm failed!
>
>
> Gauri.
> ---------
>
>
>
> > > > Message: 4
> > > > Date: Mon, 23 Feb 2009 23:38:06 -0600
> > > > From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> > > > Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
> > > > To: <mpich-discuss at mcs.anl.gov>
> > > > Message-ID: <72376B2D10EC43F9A0A433C960F951B6 at thakurlaptop>
> > > > Content-Type: text/plain; charset="us-ascii"
> > > >
> > > > To run MPICH2 with SLURM, configure with the options
> > > > "--with-pmi=slurm --with-pm=no" as described in the MPICH2
> > > > README file. Also see the instructions on how to run MPICH2
> > > > with SLURM at
> > > > https://computing.llnl.gov/linux/slurm/quickstart.html .
> > > >
> > > > Rajeev
> > > >
> > > >
> > > >
> > > >  _____
> > > >
> > > > From: mpich-discuss-bounces at mcs.anl.gov
> > > > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gauri
> > > > Kulkarni
> > > > Sent: Monday, February 23, 2009 11:19 PM
> > > > To: mpich-discuss at mcs.anl.gov
> > > > Subject: [mpich-discuss] HP-XC 3000 cluster issues
> > > >
> > > >
> > > > Hi,
> > > >
> > > > I am a newbie to MPI in general. Currently in our institute,
> > > > we have a cluster of 16 nodes with 8 processors each. It is an
> > > > HP-XC 3000 cluster, which basically means it's quite
> > > > proprietary. It has its own MPI implementation - HP-MPI - in
> > > > which the parallelization is managed by SLURM (Simple Linux
> > > > Utility for Resource Management). There is also a batch job
> > > > scheduler - LSF (Load Sharing Facility) - which works in tandem
> > > > with SLURM to parallelize the batch jobs. We have installed
> > > > both MPICH and MPICH2 and are testing them, but we are running
> > > > into compatibility issues. For a simple helloworld.c program:
> > > > 1. For HP-MPI: Compiled with mpicc of this implementation and
> > > > executed with its mpirun: "mpirun -np 4 helloworld" works
> > > > correctly. For batch scheduling, we need to issue "bsub -n4
> > > > [other options] mpirun -srun helloworld" and it runs fine too.
> > > > "srun" is the SLURM utility that parallelizes the jobs.
> > > > 2. For MPICH and MPICH2: Again, compiled with mpicc of these
> > > > respective implementations and executed with their own mpirun:
> > > >    i) mpirun -np 4 helloworld : Works.
> > > >   ii) mpirun -np 15 helloworld: The parallelization is limited
> > > > to just a single node - that is, 8 processes run first on the 8
> > > > processors of a single node and then the remaining ones.
> > > >  iii) bsub -n4 [options] mpirun -srun helloworld: Job
> > > > terminated. srun option not recognized.
> > > >   iv) bsub [options] mpirun -np 4 helloworld: Works
> > > >    v) bsub [options] mpirun -np 15 helloworld: (Same as iii)
> > > >
> > > > Anybody aware of HP cluster issues with MPICH? Am I
> > > > misinterpreting? Any help is appreciated.
> > > >
> > > > Gauri.
> > > > ---------
> > >
>



More information about the mpich-discuss mailing list