[mpich-discuss] HP-XC 3000 cluster issues
Anthony Chan
chan at mcs.anl.gov
Tue Mar 3 11:26:31 CST 2009
Does your LSF setup support any interactive or debugging sessions?
If so, try running srun inside such an interactive session.
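Something along these lines might work (the -Is flag and the slot count are
only illustrative; your site's queues and limits will differ):

    bsub -Is -n 4 /bin/bash     # ask LSF for an interactive shell on 4 slots
    srun hostname -s            # then run srun inside that allocation
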
A.Chan
----- "Gauri Kulkarni" <gaurivk at gmail.com> wrote:
> Please bear with me, it is a long query.
>
> I don't think those instructions are particularly useful to me (see Rajeev's
> reply below). First of all, I cannot use 'srun' from the command line; I can
> only use it as an option to mpirun when I am submitting the job through LSF.
> What I mean is, when I use srun from the command line, this is what I get
> (the command is from the script mentioned at the bottom of the webpage you
> provided, Rajeev):
>
> [So What?? ~]$ srun hostname -s | sort -u
> srun: error: Unable to allocate resources: No partition specified or system default partition
>
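> Presumably that means no partition was named and SLURM has no default one
> configured; outside of LSF one would normally have to name a partition
> explicitly, something like (the partition name here is only a placeholder)
>
>     srun -p <partition> hostname -s | sort -u
>
> with "sinfo" listing the partitions that exist. But since users here cannot
> allocate nodes outside of LSF anyway, that is not really an option for me.
>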
> But when I submit it through LSF, this is what I get:
> [So What?? ~]$ bsub -n4 -o srun.%J.out mpirun -srun hostname -s | sort -u
> Job <14474> is submitted to default queue <normal>.
>
> <output>
> Your job looked like:
>
> ------------------------------------------------------------
> # LSBATCH: User input
> mpirun -srun hostname -s
> ------------------------------------------------------------
>
> Successfully completed.
>
> Resource usage summary:
>
> CPU time : 0.14 sec.
> Max Memory : 2 MB
> Max Swap : 103 MB
>
>
> The output (if any) follows:
>
> n4
> n4
> n4
> n4
> </output>
>
> Now, that is what I get when I am using HP-MPI. When I switch to MPICH1, the
> output is like this:
> [So What?? ~]$ bsub -n15 -o srun.%J.out mpirun -srun -np 15 -machinefile mpd.hosts hostname
> Job <14479> is submitted to default queue <normal>.
>
> <output>
> Your job looked like:
>
> ------------------------------------------------------------
> # LSBATCH: User input
> mpirun -srun -np 15 -machinefile mpd.hosts hostname
> ------------------------------------------------------------
>
> Exited with exit code 1.
>
> Resource usage summary:
>
> CPU time : 0.12 sec.
> Max Memory : 2 MB
> Max Swap : 103 MB
>
>
> The output (if any) follows:
>
> Warning: Command line arguments for program should be given
> after the program name. Assuming that hostname is a
> command line argument for the program.
> Missing: program name
> Program -srun either does not exist, is not
> executable, or is an erroneous argument to mpirun.
> </output>
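>
> As far as I understand, the -srun switch is specific to HP-MPI's mpirun;
> MPICH1's mpirun does not know it and treats it as the program name, which is
> what the error above is saying. The plain MPICH1 (ch_p4) form would be
> something like
>
>     mpirun -np 15 -machinefile mpd.hosts ./a.out
>
> but that launches processes over rsh/ssh to the nodes in the machinefile,
> which is exactly what is forbidden on this cluster.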
>
> The SLURM version that we are using here is:
> [So What?? ~]$ srun --version
> slurm 1.0.15
>
> That means the patch that the website mentions for the SLURM and MPICH1
> combination does not apply here, as it is for SLURM version 1.2.11 or
> higher.
>
> If I go to MPICH2 and use it through bsub, it obviously fails, probably
> because it wasn't configured with the options that Rajeev had suggested
> earlier.
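>
> (For reference, MPICH2's SLURM support apparently has to be compiled in at
> configure time, roughly along the lines of
>
>     ./configure --with-pmi=slurm --with-pm=none
>
> after which jobs are launched with srun directly instead of mpirun; the
> exact flags would have to be checked against the MPICH2 and SLURM
> documentation for these versions.)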
>
> The problem boils down to this:
> 1. The cluster is NOT configured for users to access each node individually;
> that is forbidden. I cannot launch my tasks (including starting mpd) on any
> node other than the head node.
> 2. This is done to prevent users from ssh-ing to individual nodes and
> submitting jobs there, thereby hogging resources. Users can only submit jobs
> to other nodes via LSF (i.e. when "bsub [options] mpirun -srun ./executable"
> is used).
> 3. Obviously, since only the HP-MPI implementation allows mpirun to take the
> -srun option when used with bsub, only with that implementation can I get my
> programs to run on multiple nodes.
>
> So, it is not just MPICH+SLURM that I need help with; I also need help with
> MPICH+(LSF+SLURM).
>
> Hail your patience.
>
> Gauri.
> ---------
>
> Date: Wed, 25 Feb 2009 12:34:29 -0600
> From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> Subject: Re: [mpich-discuss] HP-XC 3000 cluster issues
> To: <mpich-discuss at mcs.anl.gov>
> Message-ID: <9273167066F94A0391E51E21D045C0FB at mcs.anl.gov>
> Content-Type: text/plain; charset="us-ascii"
>
> Gauri,
> For MPICH-1, the instructions at the bottom of
> https://computing.llnl.gov/linux/slurm/quickstart.html may be
> sufficient (I
> don't know).
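>
> For what it's worth, the script at the bottom of that page seems to be
> roughly of this shape (the file names and process count below are only
> illustrative):
>
>     #!/bin/sh
>     # inside a SLURM allocation: build a machinefile from the nodes
>     # assigned to this job, then hand it to MPICH-1's mpirun
>     srun hostname -s | sort -u > nodes.$SLURM_JOBID
>     mpirun -np 4 -machinefile nodes.$SLURM_JOBID ./a.out
>     rm nodes.$SLURM_JOBID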
>
> Rajeev