Hi Reuti,<br><br>Here is how we submitted the jobs to a dedicated SGE queue.<br><br>1. Submit the job with a job script (job_lammps.sh):<br>$ qsub -q molsim.q job_lammps.sh<br>(There was a typo in my previous message: I had mistakenly written $ qsub -q queue ./a.out < input > output.)<br><br>The job script looks like this:<br>----------------------------------------------------------<br>#!/bin/sh<br># request Bourne shell as shell for job<br>
#$ -S /bin/sh<br># Name of the job<br>#$ -N mpich2_lammps_test<br># Name of the output log file<br>#$ -o lammps_test.log<br># Combine output and error messages into one file<br>#$ -j y<br># Use current working directory<br>
#$ -cwd<br># Specify the parallel environment (pe)<br>#$ -pe mpich2 8<br># Commands to be executed<br>mpirun ./lmp_g++ < in.shear > thermo_shear<br>----------------------------------------------------------<br><br>2. Below is the output of qstat -f. It shows the job running on compute-0-5 in molsim.q:<br>
$ qstat -f<br><br>queuename qtype resv/used/tot. load_avg arch states<br>---------------------------------------------------------------------------------<br>all.q@compute-0-1.local BIP 0/0/16 0.00 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>all.q@compute-0-10.local BIP 0/0/16 0.09 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
all.q@compute-0-2.local BIP 0/0/16 0.03 lx26-amd64 <br>---------------------------------------------------------------------------------<br>all.q@compute-0-3.local BIP 0/0/16 0.00 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>all.q@compute-0-4.local BIP 0/0/16 0.00 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
all.q@compute-0-5.local BIP 0/0/16 0.02 lx26-amd64 <br>---------------------------------------------------------------------------------<br>all.q@compute-0-6.local BIP 0/0/16 0.01 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>all.q@compute-0-7.local BIP 0/0/16 0.02 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
all.q@compute-0-8.local BIP 0/0/16 0.04 lx26-amd64 <br>---------------------------------------------------------------------------------<br>all.q@compute-0-9.local BIP 0/0/16 0.04 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>guru.q@compute-0-10.local BIP 0/0/16 0.09 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
guru.q@compute-0-7.local BIP 0/0/16 0.02 lx26-amd64 <br>---------------------------------------------------------------------------------<br>guru.q@compute-0-8.local BIP 0/0/16 0.04 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>guru.q@compute-0-9.local BIP 0/0/16 0.04 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
molsim.q@compute-0-1.local BIP 0/0/16 0.00 lx26-amd64 <br>---------------------------------------------------------------------------------<br>molsim.q@compute-0-2.local BIP 0/0/16 0.03 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>molsim.q@compute-0-3.local BIP 0/0/16 0.00 lx26-amd64 <br>---------------------------------------------------------------------------------<br>
molsim.q@compute-0-4.local BIP 0/0/16 0.00 lx26-amd64 <br>---------------------------------------------------------------------------------<br><span style="background-color: rgb(255, 255, 102);">molsim.q@compute-0-5.local BIP 0/8/16 0.02 lx26-amd64 </span><br style="background-color: rgb(255, 255, 102);">
<span style="background-color: rgb(255, 255, 102);"> 301 0.55500 mpich2_lam ajay r 07/26/2011 19:40:11 8 </span><br>---------------------------------------------------------------------------------<br>
molsim.q@compute-0-6.local BIP 0/0/16 0.01 lx26-amd64 <br>---------------------------------------------------------------------------------<br>test_mpi.q@compute-0-10.local BIP 0/0/8 0.09 lx26-amd64 <br>
---------------------------------------------------------------------------------<br>test_mpi.q@compute-0-9.local BIP 0/0/8 0.04 lx26-amd64 <br><br>3. I logged into compute-0-5 and ran ps -e f. The job's processes appear to be bound to sge_shepherd:<br><br>compute-0-5 ~]$ ps -e f<br>
<br>3901 ? S 0:00 \_ hald-runner<br> 3909 ? S 0:00 \_ hald-addon-acpi: listening on acpid socket /v<br> 3915 ? S 0:00 \_ hald-addon-keyboard: listening on /dev/input/<br>
4058 ? Ssl 0:00 automount<br> 4122 ? Sl 0:00 /opt/gridengine/bin/lx26-amd64/sge_execd<br> 5466 ? S 0:00 \_ sge_shepherd-301 -bg<br> 5467 ? Ss 0:00 \_ -sh /opt/gridengine/default/spool/execd/compute-0-5/job_scripts/301<br>
5609 ? S 0:00 \_ mpirun ./lmp_g++<br> 5610 ? S 0:00 \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy --control-port compute-0-5.loc<br> 5611 ? R 0:25 \_ ./lmp_g++<br>
5612 ? R 0:25 \_ ./lmp_g++<br> 5613 ? R 0:25 \_ ./lmp_g++<br> 5614 ? R 0:25 \_ ./lmp_g++<br> 5615 ? R 0:25 \_ ./lmp_g++<br>
5616 ? R 0:25 \_ ./lmp_g++<br> 5617 ? R 0:25 \_ ./lmp_g++<br> 5618 ? R 0:25 \_ ./lmp_g++<br> 4143 ? Sl 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.<br>
4158 ? Ss 0:00 /usr/sbin/sshd<br> 5619 ? Ss 0:00 \_ sshd: ajay [priv]<br> 5621 ? S 0:00 \_ sshd: ajay@pts/0 <br> 5622 pts/0 Ss 0:00 \_ -bash<br> 5728 pts/0 R+ 0:00 \_ ps -e f<br>
<br>The following line shows that mpirun indeed points to the correct location, i.e. the mpirun called inside the job script is the right one:<br>5610 ? S 0:00 <span style="background-color: rgb(255, 255, 102);"> \_ /opt/mpich2/gnu/bin//hydra_pmi_proxy</span> --control-port compute-0-5.loc<br>
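Besides the process tree, it can help to check whether the cores of a node are shared with anything else, for example a job from another queue on the same host. A minimal, Linux-only sketch that could be run on every node of the job; the helper name check_node is hypothetical, and it assumes the binary shows up with the command name lmp_g++:

```shell
#!/bin/sh
# check_node (hypothetical helper): compare the number of processes named $1
# with the core count and the 1-minute load average. Linux-only (reads /proc).
check_node() {
  prog=$1
  node=$(uname -n)
  cores=$(getconf _NPROCESSORS_ONLN)
  # /proc/<pid>/comm holds the command name (truncated to 15 characters);
  # count the processes whose name matches $prog exactly.
  running=$(grep -Flx -- "$prog" /proc/[0-9]*/comm 2>/dev/null | wc -l | tr -d ' ')
  # The first field of /proc/loadavg is the 1-minute load average.
  load=$(cut -d ' ' -f 1 /proc/loadavg)
  echo "$node: $running x $prog, load $load, $cores cores"
  if [ "$running" -gt "$cores" ]; then
    echo "WARNING: more $prog processes than cores on $node"
  fi
}

check_node "${1:-lmp_g++}"
```

On compute-0-5 during the run above one would expect 8 lmp_g++ processes and a load of roughly 8 on 16 cores; a load well above the number of lmp_g++ processes suggests something else is competing for the CPUs.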
<br>4. However, the job is still much slower under SGE (159 seconds) than the same job run from the command line (42 seconds, with mpiexec -f hostfile -np 8 ./lmp_g++ < in.shear > thermo.shear).<br><br>I can't understand why there should be such a large difference between a plain mpiexec run and one started through SGE.<br>
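For this comparison, note that the command-line run was given an explicit hostfile, while the mpirun in job_lammps.sh has no -f or -np option and presumably takes its hosts from SGE through Hydra's SGE support. A fragment like the following (a sketch; pe_to_hydra is a hypothetical helper) could be added to the job script to write to lammps_test.log what SGE actually granted and the wall time of the mpirun call. NSLOTS and PE_HOSTFILE are set by SGE for jobs submitted with -pe; each PE_HOSTFILE line is printed in the host:slots form that a Hydra hostfile uses, so it can be compared directly with the hostfile of the command-line run.

```shell
#!/bin/sh
# Hypothetical addition to job_lammps.sh (after the #$ lines).
# Each line of SGE's $PE_HOSTFILE reads: host slots queue processor_range.
# pe_to_hydra prints it as "host:slots", the form a Hydra -f hostfile uses.
pe_to_hydra() {
  awk '{ print $1 ":" $2 }' "$1"
}

# Only meaningful inside an SGE parallel job, where PE_HOSTFILE and NSLOTS are set.
if [ -n "$PE_HOSTFILE" ]; then
  echo "NSLOTS=$NSLOTS"
  pe_to_hydra "$PE_HOSTFILE"
  start=$(date +%s)
  mpirun ./lmp_g++ < in.shear > thermo_shear
  end=$(date +%s)
  echo "mpirun wall time: $((end - start)) s"
fi
```

If the allocation matches the qstat output above, the log should list something like compute-0-5.local:8.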
<br>Thanks in advance<br><br>Regards<br>Tilak<br><br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 25 Jul 2011 13:47:01 +0200<br>
From: Reuti <<a href="mailto:reuti@staff.uni-marburg.de">reuti@staff.uni-marburg.de</a>><br>
Subject: Re: [mpich-discuss] mpich2 does not work with SGE<br>
To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Message-ID:<br>
<<a href="mailto:6724EB30-A6E2-4172-A073-222820574652@staff.uni-marburg.de">6724EB30-A6E2-4172-A073-222820574652@staff.uni-marburg.de</a>><br>
<br>
Hi,<br>
<br>
On 25.07.2011 at 11:16, tilakraj dattaram wrote:<br>
<br>
> Thanks a lot for all your help.<br>
><br>
> Now we can run parallel jobs through SGE (using a script file and<br>
> qsub). We submitted some test jobs and recorded the timings to<br>
> compare the speed with that of mpirun executed from the command<br>
> line.<br>
><br>
> For some reason (which I can't figure out) running jobs through SGE is<br>
> much slower than running them from the command line. Is the command<br>
> line simply faster than SGE?<br>
><br>
> This is the comparison table,<br>
><br>
> CASE 1<br>
> =======<br>
> mpiexec (both mpiexec and mpirun point to the same link, i.e.<br>
> mpiexec.hydra) from the command line with a hostfile<br>
><br>
> # mpiexec -f hostfile -np n ./a.out < input<br>
> ..................................................................................<br>
> 16 cores per node<br>
><br>
> N CPUs | time (s) | speedup<br>
><br>
> 1 | 262.27 | 1<br>
> 2 | 161.34 | 1.63<br>
> 4 | 82.41 | 3.18<br>
> 8 | 42.45 | 6.18<br>
> 16 | 33.56 | 7.82<br>
> 24 | 25.33 | 10.35<br>
> 32 | 20.38 | 12.87<br>
> ..................................................................................<br>
><br>
> CASE 2<br>
> =======<br>
> mpich2 integrated with SGE<br>
><br>
> mpirun executed from within the job script<br>
><br>
> This is our PE<br>
><br>
> pe_name mpich2<br>
> slots 999<br>
> user_lists NONE<br>
> xuser_lists NONE<br>
> start_proc_args NONE<br>
> stop_proc_args NONE<br>
> allocation_rule $fill_up<br>
> control_slaves TRUE<br>
> job_is_first_task FALSE<br>
> urgency_slots min<br>
> accounting_summary FALSE<br>
><br>
> # qsub -q queue ./a.out < input > output (the PE is defined inside the<br>
> script file with the option #$ -pe mpich2 n)<br>
> .................................................................................<br>
> qsub [ 16 cores per node ]<br>
><br>
> N CPUs | time (s) | speedup<br>
><br>
> 1 | 383.5 | 1<br>
> 2 | 205.1 | 1.87<br>
> 4 | 174.3 | 2.2<br>
> 8 | 159.3 | 2.4<br>
> 16 | 123.2 | 3.1<br>
> 24 | 136.8 | 2.8<br>
> 32 | 124.6 | 3.1<br>
> .................................................................................<br>
><br>
><br>
> As you can see, the speedup is only about 3-fold for a job run on<br>
> 16 processors and submitted through SGE, whereas it is<br>
> nearly 8-fold when the parallel job is started from the command<br>
> line with mpirun. Note also that the speedup nearly<br>
> saturates at around 3 under SGE, whereas it keeps increasing, to around 13<br>
> at 32 processors, for command-line execution. In fact, we had earlier<br>
> found that the speedup keeps increasing up to about 144 processors,<br>
> where it reaches a maximum of about 20-fold over the serial<br>
> job.<br>
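The speedup column in both tables is S(N) = T(1)/T(N). A short awk sketch (the function name speedup is only for illustration) recomputes it from the quoted timings and adds the parallel efficiency E(N) = S(N)/N, which makes the saturation under SGE easier to see: at 16 CPUs, E is about 0.49 from the command line but only about 0.19 under SGE.

```shell
#!/bin/sh
# Speedup S(N) = T(1) / T(N) and parallel efficiency E(N) = S(N) / N.
# Reads "N seconds" pairs on stdin; the first line is taken as the serial baseline T(1).
speedup() {
  awk 'NR == 1 { t1 = $2 }
       { s = t1 / $2; printf "%d %.2f %.2f\n", $1, s, s / $1 }'
}

echo "CASE 1 (command line): N S(N) E(N)"
printf '1 262.27\n2 161.34\n4 82.41\n8 42.45\n16 33.56\n24 25.33\n32 20.38\n' | speedup

echo "CASE 2 (SGE): N S(N) E(N)"
printf '1 383.5\n2 205.1\n4 174.3\n8 159.3\n16 123.2\n24 136.8\n32 124.6\n' | speedup
```

The recomputed speedups agree with the tables to within rounding (Case 1 at 16 CPUs gives 7.81 rather than 7.82).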
<br>
No, this shouldn't be the case. There might be a small delay at the very first startup, as SGE's integrated `qrsh` startup is used instead of a plain `ssh`, but this shouldn't create such a huge difference. Once the job has started, the usual communication inside MPICH2 is used and there shouldn't be any noticeable difference.<br>
<br>
Can you please check on the various nodes whether the processes of your job are allocated correctly:<br>
<br>
$ ps -e f<br>
<br>
(note: f without a leading -) and check that all processes are bound to the sge_shepherd on the intended nodes and that no node is overloaded. Did you define a special queue for this (in SGE it's not necessary, but possible depending on your personal taste), and did you limit the available cores per node across all queues to avoid oversubscription?<br>
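On the last point: the qstat -f output earlier in this thread lists compute-0-5 with 16 slots in all.q and another 16 in molsim.q, so the two queues together could place up to 32 processes on a 16-core node. One way to cap the slots per host across all queues is an SGE resource quota set. The sketch below only writes one to a file (the quota name and file path are hypothetical) using the dynamic limit $num_proc, which SGE evaluates per host as its number of processors. Setting complex_values slots=16 on each execution host (qconf -me compute-0-5.local) would be an alternative.

```shell
#!/bin/sh
# Write a resource quota set that limits every host to as many slots as it has
# processors. The quoted 'EOF' keeps $num_proc literal: SGE expands it per host,
# not the shell. Quota name and file path are hypothetical.
write_rqs() {
  cat <<'EOF'
{
   name         max_slots_per_host
   description  NONE
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
EOF
}

rqs_file=${TMPDIR:-/tmp}/max_slots_per_host.rqs
write_rqs > "$rqs_file"
echo "Review $rqs_file, then add it as an SGE manager with: qconf -Arqs $rqs_file"
```

Once the quota is active, qquota should report the slot usage against this limit.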
<br>
-- Reuti<br>
<br>
<br>
> Your help will be greatly appreciated!<br>
><br>
> Thank you<br>
><br>
> Regards<br>
> Tilak<br>
><br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
<a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br>
<br>
End of mpich-discuss Digest, Vol 34, Issue 32<br>
*********************************************<br>
</blockquote><br></div><br>