[mpich-discuss] HP-XC 3000 cluster issues

Anthony Chan chan at mcs.anl.gov
Tue Mar 3 11:07:23 CST 2009


One advantage of MPICH2 over MPICH1 is that MPICH2 is a lot more robust
in terms of process management, so debugging an MPICH2 app is easier than
debugging an MPICH1 app.  Also, MPICH1 is no longer being developed; if you
have any problem with MPICH1, fewer people (if any) will be able to help you.

A.Chan

----- "Gauri Kulkarni" <gaurivk at gmail.com> wrote:

> Thanks, Rajeev.
> 
> Is it the same case with MPICH1? The reason I need info about MPICH1 with
> SLURM is that we have a software package (FastDL) which has been compiled
> with MPICH1. We have asked the vendor to give us the software recompiled
> with MPICH2, but honestly, we do not know of any particular advantage of
> using MPICH2 over MPICH1 (apart from the fact that MPICH1 isn't maintained
> anymore).
> 
> On a side note, how do I reply to the thread? I only get the daily digest
> by mail.
> 
> -Gauri.
> ----------
> 
> 
> 
> -------------
> 
> Message: 4
> Date: Mon, 23 Feb 2009 23:38:06 -0600
> From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
> To: <mpich-discuss at mcs.anl.gov>
> Message-ID: <72376B2D10EC43F9A0A433C960F951B6 at thakurlaptop>
> Content-Type: text/plain; charset="us-ascii"
> 
> To run MPICH2 with SLURM, configure with the options
> "--with-pmi=slurm --with-pm=no" as described in the MPICH2 README file.
> Also see the instructions on how to run MPICH2 with SLURM at
> https://computing.llnl.gov/linux/slurm/quickstart.html .
> 
> Rajeev
> 
> 
> 
>  _____
> 
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gauri Kulkarni
> Sent: Monday, February 23, 2009 11:19 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] HP-XC 3000 cluster issues
> 
> 
> Hi,
> 
> I am a newbie to MPI in general. Our institute currently has a cluster
> of 16 nodes with 8 processors each. It is an HP-XC 3000 cluster, which
> basically means it is quite proprietary. It has its own MPI
> implementation, HP-MPI, in which the parallelization is managed by SLURM
> (Simple Linux Utility for Resource Management). There is also a batch job
> scheduler, LSF (Load Sharing Facility), which works in tandem with SLURM
> to parallelize the batch jobs. We have installed both MPICH and MPICH2
> and are testing them, but we are running into compatibility issues. For a
> simple helloworld.c program:
> 1. For HP-MPI: Compiled with the mpicc of this implementation and executed
> with its mpirun: "mpirun -np 4 helloworld" works correctly. For batch
> scheduling, we need to issue "bsub -n4 [other options] mpirun -srun
> helloworld", and it runs fine too. "srun" is the SLURM utility that
> parallelizes the jobs.
> 2. For MPICH and MPICH2: Again, compiled with the mpicc of the respective
> implementations and executed with their own mpirun:
>    i) mpirun -np 4 helloworld: Works.
>   ii) mpirun -np 15 helloworld: The parallelization is limited to just a
>       single node; that is, 8 processes run first on the 8 processors of a
>       single node, and then the remaining ones.
>  iii) bsub -n4 [options] mpirun -srun helloworld: Job terminated. srun
>       option not recognized.
>   iv) bsub [options] mpirun -np 4 helloworld: Works.
>    v) bsub [options] mpirun -np 15 helloworld: (Same as iii)
> 
> Is anybody aware of HP cluster issues with MPICH? Am I misinterpreting?
> Any help is appreciated.
> 
> Gauri.
> ---------
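
As a rough sketch of the configure options Rajeev describes in the quoted
message, an MPICH2 build against SLURM's PMI and a launch through srun might
look like the following. The install prefix, the assumption that the commands
are run from the MPICH2 source directory, and the task counts are placeholders
rather than anything stated in the thread, and the commented bsub line is an
untested guess at how this would combine with LSF on an HP-XC system.

```shell
# Run from the top of an unpacked MPICH2 source tree (assumed location).
# The two --with-* options are the ones quoted from the MPICH2 README;
# the --prefix path is a hypothetical install location.
./configure --with-pmi=slurm --with-pm=no --prefix="$HOME/mpich2-slurm"
make
make install

# Compile with the mpicc from this build, then launch with SLURM's srun
# instead of mpirun (no MPICH2 process manager was built).
"$HOME/mpich2-slurm/bin/mpicc" -o helloworld helloworld.c
srun -n 4 ./helloworld

# Check which node each task lands on. With 15 tasks on 8-processor nodes,
# more than one hostname should appear if the tasks are spread out.
srun -n 15 hostname | sort | uniq -c

# Untested assumption for batch use under LSF:
# bsub -n 4 [options] srun ./helloworld
```

Depending on where SLURM is installed, the build may also need to be pointed
at SLURM's PMI library; the MPICH2 README and the SLURM page linked in the
quoted message are the places to check for that.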

