[mpich-discuss] HP-XC 3000 cluster issues

Gauri Kulkarni gaurivk at gmail.com
Wed Feb 25 06:03:33 CST 2009


Thanks, Rajeev.

Is it the same case with MPICH1? The reason I need info about MPICH1 with
SLURM is that we have a software package (FastDL) which was compiled with
MPICH1. We have asked the vendor to give us the software recompiled with
MPICH2, but honestly, we do not know of any particular advantage of using
MPICH2 over MPICH1 (apart from the fact that MPICH1 isn't maintained anymore).

On a side note, how do I reply to the thread? I only get the daily digest by
mail.

-Gauri.
----------

-------------

Message: 4
Date: Mon, 23 Feb 2009 23:38:06 -0600
From: "Rajeev Thakur" <thakur at mcs.anl.gov>
Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
To: <mpich-discuss at mcs.anl.gov>
Message-ID: <72376B2D10EC43F9A0A433C960F951B6 at thakurlaptop>
Content-Type: text/plain; charset="us-ascii"

To run MPICH2 with SLURM, configure with the options "--with-pmi=slurm
--with-pm=no" as described in the MPICH2 README file. Also see the
instructions on how to run MPICH2 with SLURM at
https://computing.llnl.gov/linux/slurm/quickstart.html .
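
As a rough sketch of what those steps might look like (only the two
configure options come from the README; the install prefix, the process
count, and the dry-run wrapper are placeholders for illustration, so adjust
them for your site):

```shell
#!/bin/sh
# Dry run by default: each command is only printed. Set RUN=1 to execute.
# The run() helper is a hypothetical convenience, not part of MPICH2 or SLURM.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical install location; any writable directory would do.
PREFIX="${PREFIX:-$HOME/mpich2-slurm}"

# Build MPICH2 against SLURM's PMI, with no MPICH2 process manager of its own.
run ./configure --with-pmi=slurm --with-pm=no --prefix="$PREFIX"
run make
run make install

# Compile the test program with the mpicc from this build.
run "$PREFIX/bin/mpicc" -o helloworld helloworld.c

# Launch through SLURM's srun instead of mpirun (4 tasks as an example).
run srun -n 4 ./helloworld
```

Run from the top of the MPICH2 source tree, this prints the commands first;
RUN=1 actually executes them. Depending on the installation, the compile step
may also need to link against SLURM's PMI library (something like
-L<slurm lib dir> -lpmi), so treat the mpicc line as an assumption to check
against the SLURM instructions.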

Rajeev



 _____

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gauri Kulkarni
Sent: Monday, February 23, 2009 11:19 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] HP-XC 3000 cluster issues


Hi,

I am a newbie to MPI in general. Currently in our institute, we have a
cluster of 16 nodes with 8 processors each. It is an HP-XC 3000 cluster,
which basically means it's quite proprietary. It has its own MPI
implementation - HP-MPI - in which the parallelization is managed by SLURM
(Simple Linux Utility for Resource Management). There is also a batch job
scheduler - LSF (Load Sharing Facility) - which works in tandem with SLURM to
parallelize the batch jobs. We have installed both MPICH and MPICH2 and are
testing them, but we are running into compatibility issues. For a simple
helloworld.c program:
1. For HP-MPI: Compiled with mpicc of this implementation and executed with
its mpirun: "mpirun -np 4 helloworld" works correctly. For batch scheduling,
we need to issue "bsub -n4 [other options] mpirun -srun helloworld" and it
runs fine too. "srun" is the SLURM utility that parallelizes the jobs.
2. For MPICH and MPICH2: Again, compiled with mpicc of these respective
implementations and executed with their own mpirun:
   i) mpirun -np 4 helloworld: Works.
  ii) mpirun -np 15 helloworld: The parallelization is limited to just a
single node - that is, 8 processes run first on the 8 processors of a single
node and then the remaining ones.
 iii) bsub -n4 [options] mpirun -srun helloworld: Job terminated. srun
option not recognized.
  iv) bsub [options] mpirun -np 4 helloworld: Works.
   v) bsub [options] mpirun -np 15 helloworld: (Same as iii)

Anybody aware of HP cluster issues with MPICH? Am I misinterpreting? Any
help is appreciated.

Gauri.
---------