Hi Reuti,<br><br>Here is how we submitted the jobs through a defined SGE queue.<br><br>1. Submit the job using a job script (job_lammps.sh)<br>$ qsub -q molsim.q job_lammps.sh<br>(My previous message had a typo here: I had mistakenly typed $ qsub -q queue ./a.out&lt;input&gt;output)<br><br>The job script looks like this<br>----------------------------------------------------------<br>#!/bin/sh<br># request Bourne shell as shell for job<br>

#$ -S /bin/sh<br># Name of the job<br>#$ -N mpich2_lammps_test<br># Name of the output log file<br>#$ -o lammps_test.log <br># Combine output/ error messages into one file<br>#$ -j y<br># Use current working directory<br>

#$ -cwd<br># Specify the parallel environment (pe)<br>#$ -pe mpich2 8<br># Commands to be executed<br>mpirun ./lmp_g++ &lt; in.shear &gt; thermo_shear<br>----------------------------------------------------------<br><br>2. Below is the qstat output, which shows the job running on compute-0-5 in molsim.q<br>

$ qstat -f<br><br>queuename                      qtype resv/used/tot. load_avg arch          states<br>---------------------------------------------------------------------------------<br>all.q@compute-0-1.local        BIP   0/0/16         0.00     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>all.q@compute-0-10.local       BIP   0/0/16         0.09     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

all.q@compute-0-2.local        BIP   0/0/16         0.03     lx26-amd64    <br>---------------------------------------------------------------------------------<br>all.q@compute-0-3.local        BIP   0/0/16         0.00     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>all.q@compute-0-4.local        BIP   0/0/16         0.00     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

all.q@compute-0-5.local        BIP   0/0/16         0.02     lx26-amd64    <br>---------------------------------------------------------------------------------<br>all.q@compute-0-6.local        BIP   0/0/16         0.01     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>all.q@compute-0-7.local        BIP   0/0/16         0.02     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

all.q@compute-0-8.local        BIP   0/0/16         0.04     lx26-amd64    <br>---------------------------------------------------------------------------------<br>all.q@compute-0-9.local        BIP   0/0/16         0.04     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>guru.q@compute-0-10.local      BIP   0/0/16         0.09     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

guru.q@compute-0-7.local       BIP   0/0/16         0.02     lx26-amd64    <br>---------------------------------------------------------------------------------<br>guru.q@compute-0-8.local       BIP   0/0/16         0.04     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>guru.q@compute-0-9.local       BIP   0/0/16         0.04     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

molsim.q@compute-0-1.local     BIP   0/0/16         0.00     lx26-amd64    <br>---------------------------------------------------------------------------------<br>molsim.q@compute-0-2.local     BIP   0/0/16         0.03     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>molsim.q@compute-0-3.local     BIP   0/0/16         0.00     lx26-amd64    <br>---------------------------------------------------------------------------------<br>

molsim.q@compute-0-4.local     BIP   0/0/16         0.00     lx26-amd64    <br>---------------------------------------------------------------------------------<br><span style="background-color: rgb(255, 255, 102);">molsim.q@compute-0-5.local     BIP   0/8/16         0.02     lx26-amd64    </span><br style="background-color: rgb(255, 255, 102);">

<span style="background-color: rgb(255, 255, 102);">    301 0.55500 mpich2_lam ajay         r     07/26/2011 19:40:11     8        </span><br>---------------------------------------------------------------------------------<br>

molsim.q@compute-0-6.local     BIP   0/0/16         0.01     lx26-amd64    <br>---------------------------------------------------------------------------------<br>test_mpi.q@compute-0-10.local  BIP   0/0/8          0.09     lx26-amd64    <br>

---------------------------------------------------------------------------------<br>test_mpi.q@compute-0-9.local   BIP   0/0/8          0.04     lx26-amd64    <br><br>3.  I log into compute-0-5 and do a ps -e f<br><br>compute-0-5 ~]$ ps -e f<br>

It seems to show the job processes bound to sge_shepherd<br><br>3901 ?        S      0:00  \_ hald-runner<br> 3909 ?        S      0:00      \_ hald-addon-acpi: listening on acpid socket /v<br> 3915 ?        S      0:00      \_ hald-addon-keyboard: listening on /dev/input/<br>

 4058 ?        Ssl    0:00 automount<br> 4122 ?        Sl     0:00 /opt/gridengine/bin/lx26-amd64/sge_execd<br> 5466 ?        S      0:00  \_ sge_shepherd-301 -bg<br> 5467 ?        Ss     0:00      \_ -sh /opt/gridengine/default/spool/execd/compute-0-5/job_scripts/301<br>

 5609 ?        S      0:00          \_ mpirun ./lmp_g++<br> 5610 ?        S      0:00              \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy --control-port compute-0-5.loc<br> 5611 ?        R      0:25                  \_ ./lmp_g++<br>

 5612 ?        R      0:25                  \_ ./lmp_g++<br> 5613 ?        R      0:25                  \_ ./lmp_g++<br> 5614 ?        R      0:25                  \_ ./lmp_g++<br> 5615 ?        R      0:25                  \_ ./lmp_g++<br>

 5616 ?        R      0:25                  \_ ./lmp_g++<br> 5617 ?        R      0:25                  \_ ./lmp_g++<br> 5618 ?        R      0:25                  \_ ./lmp_g++<br> 4143 ?        Sl     0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.<br>

 4158 ?        Ss     0:00 /usr/sbin/sshd<br> 5619 ?        Ss     0:00  \_ sshd: ajay [priv]<br> 5621 ?        S      0:00      \_ sshd: ajay@pts/0 <br> 5622 pts/0    Ss     0:00          \_ -bash<br> 5728 pts/0    R+     0:00              \_ ps -e f<br>

<br>The following line shows that mpirun points to the correct location, so the mpirun called inside the job script is the right one.<br>5610 ?        S      0:00             <span style="background-color: rgb(255, 255, 102);"> \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy</span> --control-port compute-0-5.loc<br>
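<br>The same check could also be made from inside the job script, so that it is recorded in the job log. This is only a minimal sketch: resolve_tool is a hypothetical helper name, and it assumes GNU readlink (for the -f option) is available on the compute nodes.<br>

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or MPICH2): print the path a command
# resolves to in the current PATH, following symlinks to the real binary.
resolve_tool() {
    tool_path=$(command -v "$1") || { echo "$1: not found in PATH"; return 1; }
    # readlink -f (GNU coreutils) follows every symlink in the chain
    echo "$1 -> $(readlink -f "$tool_path")"
}

# Both names are expected to end at the same launcher (mpiexec.hydra here).
for tool in mpirun mpiexec; do
    resolve_tool "$tool" || true
done
```

If both lines end in the same mpiexec.hydra path under /opt/mpich2/gnu/bin, the batch job and the interactive shell are using the same launcher.<br>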

<br>4. However, the job still takes much longer (159 seconds) than the same job started from the command line with mpiexec (42 seconds, mpiexec -f hostfile -np 8 ./lmp_g++ &lt; in.shear &gt; thermo.shear).<br><br>I can&#39;t understand why there should be such a large difference between the plain mpiexec run and the one started through SGE. <br>
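<br>To narrow this down, the job script could log what SGE actually granted before mpirun starts. The sketch below is an assumption about how one might do that, not something we have run: report_allocation is a hypothetical helper, while NSLOTS and PE_HOSTFILE are the variables SGE sets for a job running in a parallel environment.<br>

```shell
#!/bin/sh
# Hypothetical helper: print the slots and hosts SGE granted to this job.
report_allocation() {
    echo "slots granted: ${NSLOTS:-unset}"
    if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
        # Each PE_HOSTFILE line: hostname, slot count, queue, processor range
        awk '{ printf "host %s: %s slots\n", $1, $2 }' "$PE_HOSTFILE"
    else
        echo "no readable PE_HOSTFILE (not inside an SGE parallel environment?)"
    fi
}

report_allocation
# In the job script this would sit just before the existing launch line:
#   mpirun ./lmp_g++ < in.shear > thermo_shear
```

With -o and -j y as in the job script above, these lines end up in lammps_test.log and can be compared with the hostfile used for the command-line run.<br>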

<br>Thanks in advance<br><br>Regards<br>Tilak<br><br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Mon, 25 Jul 2011 13:47:01 +0200<br>

From: Reuti &lt;<a href="mailto:reuti@staff.uni-marburg.de">reuti@staff.uni-marburg.de</a>&gt;<br>

Subject: Re: [mpich-discuss] mpich2 does not work with SGE<br>

To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>

Message-ID:<br>

        &lt;<a href="mailto:6724EB30-A6E2-4172-A073-222820574652@staff.uni-marburg.de">6724EB30-A6E2-4172-A073-222820574652@staff.uni-marburg.de</a>&gt;<br>

Content-Type: text/plain; charset=us-ascii<br>

<br>

Hi,<br>

<br>

Am 25.07.2011 um 11:16 schrieb tilakraj dattaram:<br>

<br>

&gt; Thanks a lot for all your help.<br>

&gt;<br>

&gt; Now we can run parallel jobs through the SGE (using a script file and<br>

&gt; qsub). We submitted some test jobs and were keeping all the records<br>

&gt; about the timings to compare the speeds with respect to mpirun<br>

&gt; executed from the command line<br>

&gt;<br>

&gt; For some reason (which I can&#39;t find out) running jobs through SGE is<br>

&gt; much slower than the command line. Is it the case that the command line<br>

&gt; works faster than SGE?<br>

&gt;<br>

&gt; This is the comparison table,<br>

&gt;<br>

&gt; CASE 1<br>

&gt; =======<br>

&gt; mpiexec (both mpiexec and mpirun point to the same link, i.e.<br>

&gt; mpiexec.hydra) from command line with a hostfile<br>

&gt;<br>

&gt; # mpiexec -f hostfile -np = n ./a.out&lt;input<br>

&gt; ..................................................................................<br>

&gt; 16 cores per node<br>

&gt;<br>

&gt; N cpus                   time (in secs)               speedup<br>

&gt;<br>

&gt;    1                             262.27                         1<br>

&gt;    2                             161.34                         1.63<br>

&gt;    4                               82.41                         3.18<br>

&gt;    8                               42.45                         6.18<br>

&gt;   16                              33.56                         7.82<br>

&gt;   24                              25.33                         10.35<br>

&gt;   32                              20.38                         12.87<br>

&gt; ..................................................................................<br>

&gt;<br>

&gt; CASE 2<br>

&gt; =======<br>

&gt; mpich2 integrated with SGE<br>

&gt;<br>

&gt; mpirun executed from within the job script<br>

&gt;<br>

&gt; This is our PE<br>

&gt;<br>

&gt; pe_name            mpich2<br>

&gt; slots              999<br>

&gt; user_lists         NONE<br>

&gt; xuser_lists        NONE<br>

&gt; start_proc_args    NONE<br>

&gt; stop_proc_args     NONE<br>

&gt; allocation_rule    $fill_up<br>

&gt; control_slaves     TRUE<br>

&gt; job_is_first_task  FALSE<br>

&gt; urgency_slots      min<br>

&gt; accounting_summary FALSE<br>

&gt;<br>

&gt; # qsub -q queue ./a.out&lt;input&gt;output (the PE is defined inside the<br>

&gt; script file using the following option, #$ -pe mpich2 n)<br>

&gt; .................................................................................<br>

&gt; qsub [ 16 cores per node ]<br>

&gt;<br>

&gt; N cpus                   time (in secs)                 speedup<br>

&gt;<br>

&gt;   1                             383.5                             1<br>

&gt;   2                             205.1                             1.87<br>

&gt;   4                             174.3                             2.2<br>

&gt;   8                             159.3                             2.4<br>

&gt;  16                            123.2                             3.1<br>

&gt;  24                            136.8                             2.8<br>

&gt;  32                            124.6                             3.1<br>

&gt; .................................................................................<br>

&gt;<br>

&gt;<br>

&gt; As you can notice, the speed up is only about 3-times for a job run on<br>

&gt; 16 processors and submitted through the SGE interface, whereas it is<br>

&gt; nearly 8-fold when the parallel jobs are submitted from the command<br>

&gt; line using mpirun. Another thing to note is that the speedup nearly<br>

&gt; saturates around 3 for SGE, whereas it keeps increasing to around 13<br>

&gt; for 32 processors for command line execution. In fact, we had earlier<br>

&gt; found that the speed up keeps increasing till about 144 processors,<br>

&gt; where it gives a maximum speed up of about 20-fold over the serial<br>

&gt; job.<br>

<br>

No, this shouldn&#39;t be. There might be a small delay for the very first startup, as SGE will use its integrated `qrsh` startup instead of a plain `ssh`, but this shouldn&#39;t create such a huge difference. Once it&#39;s started, the usual communication inside MPICH2 will be used and there shouldn&#39;t be any noticeable difference.<br>


<br>

Can you please check on the various nodes, whether the allocation of the processes of your job is correct:<br>

<br>

$ ps -e f<br>

<br>

(f w/o -) and whether all are bound to the sge_shepherd on the intended nodes and no node is overloaded? Did you define a special queue for this (in SGE it&#39;s not necessary but possible depending on your personal taste), and limit the available cores per node across all queues to avoid oversubscription?<br>


<br>

-- Reuti<br>

<br>

<br>

&gt; Your help will be greatly appreciated!<br>

&gt;<br>

&gt; Thank you<br>

&gt;<br>

&gt; Regards<br>

&gt; Tilak<br>

&gt;<br>

<br>


</blockquote><br></div><br>