[mpich-discuss] mpich2 does not work with SGE
tilakraj dattaram
tilakraj1985 at gmail.com
Tue Jul 26 01:37:36 CDT 2011
Hi Reuti
Here is how we submitted the jobs through a defined SGE queue.
1. Submit the job using a job script (job_lammps.sh):

$ qsub -q molsim.q job_lammps.sh

(In my previous message there was a typo: I had mistakenly written
"$ qsub -q queue ./a.out < input > output".)

The job script looks like this:
----------------------------------------------------------
#!/bin/sh
# request Bourne shell as shell for job
#$ -S /bin/sh
# Name of the job
#$ -N mpich2_lammps_test
# Name of the output log file
#$ -o lammps_test.log
# Combine output/error messages into one file
#$ -j y
# Use current working directory
#$ -cwd
# Specify the parallel environment (pe)
#$ -pe mpich2 8
# Commands to be executed
mpirun ./lmp_g++ < in.shear > thermo_shear
----------------------------------------------------------
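As a sanity check (this is not part of the original script), a few diagnostic
lines could be added just before the mpirun call to confirm that the job
really sees the SGE allocation; $NSLOTS and $PE_HOSTFILE are the standard SGE
variables, everything else is only a sketch:
----------------------------------------------------------
# Hypothetical debug lines for the job script
which mpirun                      # confirm which MPICH2 install is found first
echo "slots granted: $NSLOTS"     # should match the -pe request (8)
echo "SGE host file: $PE_HOSTFILE"
cat "$PE_HOSTFILE"                # one line per granted host with its slot count
----------------------------------------------------------
With a working tight integration, Hydra's mpirun started without -np should
pick up exactly this allocation.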
2. Below is the output of qstat; it shows the job running on compute-0-5 in
molsim.q:
$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-10.local       BIP   0/0/16         0.09     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-2.local        BIP   0/0/16         0.03     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-3.local        BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-4.local        BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-5.local        BIP   0/0/16         0.02     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-6.local        BIP   0/0/16         0.01     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-7.local        BIP   0/0/16         0.02     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-8.local        BIP   0/0/16         0.04     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-9.local        BIP   0/0/16         0.04     lx26-amd64
---------------------------------------------------------------------------------
guru.q@compute-0-10.local      BIP   0/0/16         0.09     lx26-amd64
---------------------------------------------------------------------------------
guru.q@compute-0-7.local       BIP   0/0/16         0.02     lx26-amd64
---------------------------------------------------------------------------------
guru.q@compute-0-8.local       BIP   0/0/16         0.04     lx26-amd64
---------------------------------------------------------------------------------
guru.q@compute-0-9.local       BIP   0/0/16         0.04     lx26-amd64
---------------------------------------------------------------------------------
molsim.q@compute-0-1.local     BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
molsim.q@compute-0-2.local     BIP   0/0/16         0.03     lx26-amd64
---------------------------------------------------------------------------------
molsim.q@compute-0-3.local     BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
molsim.q@compute-0-4.local     BIP   0/0/16         0.00     lx26-amd64
---------------------------------------------------------------------------------
molsim.q@compute-0-5.local     BIP   0/8/16         0.02     lx26-amd64
    301 0.55500 mpich2_lam ajay         r     07/26/2011 19:40:11     8
---------------------------------------------------------------------------------
molsim.q@compute-0-6.local     BIP   0/0/16         0.01     lx26-amd64
---------------------------------------------------------------------------------
test_mpi.q@compute-0-10.local  BIP   0/0/8          0.09     lx26-amd64
---------------------------------------------------------------------------------
test_mpi.q@compute-0-9.local   BIP   0/0/8          0.04     lx26-amd64
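A quick way to cross-check the slot placement (not shown in the original run)
is:

$ qstat -g t

which shows, per queue instance, whether a job holds its MASTER task or SLAVE
tasks there; here all tasks of job 301 should appear under
molsim.q@compute-0-5.local.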
3. I log into compute-0-5 and run ps -e f:

compute-0-5 ~]$ ps -e f

The output shows the job processes bound to sge_shepherd:
3901 ? S 0:00 \_ hald-runner
3909 ? S 0:00 \_ hald-addon-acpi: listening on acpid socket /v
3915 ? S 0:00 \_ hald-addon-keyboard: listening on /dev/input/
4058 ? Ssl 0:00 automount
4122 ? Sl 0:00 /opt/gridengine/bin/lx26-amd64/sge_execd
5466 ? S 0:00 \_ sge_shepherd-301 -bg
5467 ? Ss 0:00 \_ -sh
/opt/gridengine/default/spool/execd/compute-0-5/job_scripts/301
5609 ? S 0:00 \_ mpirun ./lmp_g++
5610 ? S 0:00     \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy --control-port compute-0-5.loc
5611 ? R 0:25 \_ ./lmp_g++
5612 ? R 0:25 \_ ./lmp_g++
5613 ? R 0:25 \_ ./lmp_g++
5614 ? R 0:25 \_ ./lmp_g++
5615 ? R 0:25 \_ ./lmp_g++
5616 ? R 0:25 \_ ./lmp_g++
5617 ? R 0:25 \_ ./lmp_g++
5618 ? R 0:25 \_ ./lmp_g++
4143 ? Sl 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.
4158 ? Ss 0:00 /usr/sbin/sshd
5619 ? Ss 0:00 \_ sshd: ajay [priv]
5621 ? S 0:00 \_ sshd: ajay@pts/0
5622 pts/0 Ss 0:00 \_ -bash
5728 pts/0 R+ 0:00 \_ ps -e f
The following line shows that mpirun points to the correct location, i.e. the
mpirun invoked inside the job script is the intended MPICH2 installation:
5610 ? S 0:00     \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy --control-port compute-0-5.loc
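A generic way to double-check how this Hydra build can start processes (just
a suggestion, not something from the run above) is:

$ /opt/mpich2/gnu/bin/mpiexec -info

which prints the build details, including the launchers and resource-management
kernels Hydra was compiled with (ssh, sge, ...). Under SGE, Hydra normally
detects the PE environment and starts its proxies via qrsh instead of plain
ssh, which is the startup Reuti mentions below.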
4. However, the compute time is still much slower (159 seconds) than for the
same job run from the command line with mpiexec (42 seconds, using:
mpiexec -f hostfile -np 8 ./lmp_g++ < in.shear > thermo.shear).
I can't understand why there should be such a large difference between plain
mpiexec and a run started through SGE.
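For reference, the hostfile used in the command-line runs is not shown in this
thread; a plain Hydra hostfile of the assumed form below (hostnames and slot
counts are only illustrative) would be typical:
----------------------------------------------------------
# hostfile (assumed layout) -- one host per line, optional :slots
compute-0-5:16
compute-0-6:16
----------------------------------------------------------
Note that the mpich2 PE uses $fill_up, which packed all 8 slots onto
compute-0-5 above; if the hostfile run distributed its 8 ranks differently,
the two timings are not strictly like-for-like.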
Thanks in advance
Regards
Tilak
>
> ------------------------------
>
> Message: 2
> Date: Mon, 25 Jul 2011 13:47:01 +0200
> From: Reuti <reuti at staff.uni-marburg.de>
> Subject: Re: [mpich-discuss] mpich2 does not work with SGE
> To: mpich-discuss at mcs.anl.gov
> Message-ID:
> <6724EB30-A6E2-4172-A073-222820574652 at staff.uni-marburg.de>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> On 25.07.2011 at 11:16, tilakraj dattaram wrote:
>
> > Thanks a lot for all your help.
> >
> > Now we can run parallel jobs through SGE (using a script file and
> > qsub). We submitted some test jobs and recorded the timings in order to
> > compare the speeds with mpirun executed from the command line.
> >
> > For some reason (which I can't find out) running jobs through SGE is
> > much slower than the command line. Is it expected that the command line
> > works faster than SGE?
> >
> > This is the comparison table,
> >
> > CASE 1
> > =======
> > mpiexec (both mpiexec and mpirun point to the same link, i.e.
> > mpiexec.hydra) from command line with a hostfile
> >
> > # mpiexec -f hostfile -np n ./a.out < input
> >
> ..................................................................................
> > 16 cores per node
> >
> > N cpus time (in secs) speedup
> >
> > 1 262.27 1
> > 2 161.34 1.63
> > 4 82.41 3.18
> > 8 42.45 6.18
> > 16 33.56 7.82
> > 24 25.33 10.35
> > 32 20.38 12.87
> >
> ..................................................................................
> >
> > CASE 2
> > =======
> > mpich2 integrated with SGE
> >
> > mpirun executed from within the job script
> >
> > This is our PE
> >
> > pe_name mpich2
> > slots 999
> > user_lists NONE
> > xuser_lists NONE
> > start_proc_args NONE
> > stop_proc_args NONE
> > allocation_rule $fill_up
> > control_slaves TRUE
> > job_is_first_task FALSE
> > urgency_slots min
> > accounting_summary FALSE
> >
> > # qsub -q queue ./a.out<input>output (the PE is defined inside the
> > script file using the following option, #$ -pe mpich2 n)
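Editorial note on the quoted PE (not part of the original message): for the
integration to take effect, the PE also has to appear in the queue's pe_list.
A quick way to verify this, assuming the molsim.q queue from above:

$ qconf -sq molsim.q | grep pe_list     # should list mpich2
$ qconf -mq molsim.q                    # edit the queue to add it if missing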
> >
> .................................................................................
> > qsub [ 16 cores per node ]
> >
> > N cpus time (in secs) speedup
> >
> > 1 383.5 1
> > 2 205.1 1.87
> > 4 174.3 2.2
> > 8 159.3 2.4
> > 16 123.2 3.1
> > 24 136.8 2.8
> > 32 124.6 3.1
> >
> .................................................................................
> >
> >
> > As you can notice, the speedup is only about 3-fold for a job run on
> > 16 processors when submitted through SGE, whereas it is nearly 8-fold
> > when the parallel jobs are started from the command line using mpirun.
> > Another thing to note is that the speedup nearly saturates around 3 for
> > SGE, whereas it keeps increasing to around 13 at 32 processors for
> > command-line execution. In fact, we had earlier found that the speedup
> > keeps increasing up to about 144 processors, where it gives a maximum
> > speedup of about 20-fold over the serial job.
>
> no, this shouldn't be. There might be a small delay for the very first
> startup, as SGE's integrated `qrsh` startup will be used instead of a
> plain `ssh`, but this shouldn't create such a huge difference. Once it's
> started, the usual communication inside MPICH2 will be used and there
> shouldn't be any noticeable difference.
>
> Can you please check on the various nodes, whether the allocation of the
> processes of your job is correct:
>
> $ ps -e f
>
> (f w/o -) and that all processes are bound to the sge_shepherd on the
> intended nodes and no node is overloaded? Did you define a special queue for
> this (in SGE it's not necessary but possible, depending on your personal
> taste), and limit the available cores per node across all queues to avoid
> oversubscription?
>
> -- Reuti
>
>
> > Your help will be greatly appreciated!
> >
> > Thank you
> >
> > Regards
> > Tilak
> >
>