Slow speed after changing from serial to parallel (with ex2f.F)

Ben Tay zonexo at gmail.com
Sat Apr 19 10:18:49 CDT 2008


Hi Satish,

First of all, I forgot to inform you that I've changed m and n to 800. I 
would like to see if the larger problem size improves the scaling. If 
required, I can redo the test with m,n = 600.

I can install MPICH, but I don't think I can choose to run on a single 
machine using 1 to 8 processors. In order to run the code, I usually 
have to use the commands

bsub -o log -q linux64 ./a.out       (single processor)

bsub -o log -q mcore_parallel -n $ -a mvapich mpirun.lsf ./a.out       
(multiple processors, where $ = no. of processors)
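If I understand correctly, with a local MPICH2 built with the gforker 
process manager I could skip bsub entirely and run 1 to 8 processes 
directly on one node, something like this (the install path, binary name, 
and options here are just my guesses):

~/mpich2-install/bin/mpiexec -n 1 ./ex2f -m 800 -n 800 -log_summary
~/mpich2-install/bin/mpiexec -n 2 ./ex2f -m 800 -n 800 -log_summary
~/mpich2-install/bin/mpiexec -n 4 ./ex2f -m 800 -n 800 -log_summary
~/mpich2-install/bin/mpiexec -n 8 ./ex2f -m 800 -n 800 -log_summary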

After that, when the job is running, I'll be told which servers my job 
runs on, e.g. atlas3-c10 (1 proc), 2*atlas3-c10 + 2*atlas3-c12 (4 
procs), or 2*atlas3-c10 + 2*atlas3-c12 + 2*atlas3-c11 + 2*atlas3-c13 (8 
procs). I was told that 2*atlas3-c10 doesn't mean the job is running on 
a single dual-core CPU.

By the way, are you saying that I should first install the latest MPICH2 
build with the options

./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker

and then install PETSc against that MPICH2?
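In other words, something like the following? (The --prefix path is just 
a placeholder I picked, and I've left out the other PETSc configure 
options.)

# build and install MPICH2 under my home directory (no root needed)
cd ~/mpich2-<version>
./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker --prefix=$HOME/mpich2-install
make
make install

# then rebuild PETSc pointing at this MPICH2 instead of /lsftmp/g0306332/mpich2
./config/configure.py --with-mpi-dir=$HOME/mpich2-install ...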

After that, do you know how I should do what you've suggested on my 
servers? I don't really understand what you mean. Am I supposed to run 4 
jobs on 1 quad-core node, or 1 job using 4 cores on 1 quad-core node? I 
do know that atlas3-c00 to c03 are the quad-core nodes, and I can force 
the job onto them with

bsub -o log -q mcore_parallel -n $ -m quadcore -a mvapich mpirun.lsf ./a.out
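Just to make sure I understand the two interpretations (queue and 
host-group names copied from above, everything else is my guess):

# 1 job using 4 cores on one quad-core node
bsub -o log -q mcore_parallel -n 4 -m quadcore -a mvapich mpirun.lsf ./a.out

# vs. 4 independent serial jobs submitted separately
# (these may not land on the same node unless constrained)
bsub -o log1 -q linux64 ./a.out
bsub -o log2 -q linux64 ./a.out
bsub -o log3 -q linux64 ./a.out
bsub -o log4 -q linux64 ./a.out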

Lastly, I made a mistake about the different times reported for the same 
compiler. Sorry about that.

Thank you very much.



Satish Balay wrote:
> On Sat, 19 Apr 2008, Ben Tay wrote:
>
>   
>> Btw, I'm not able to try the latest mpich2 because I do not have
>> administrator rights. I was told that some special configuration is
>> required.
>>     
>
> You don't need admin rights to install/use MPICH with the options I
> mentioned. I was suggesting just running in SMP mode on a single
> machine [from 1-8 procs on a Quad-Core Intel Xeon X5355, to compare with
> my SMP runs] with:
>
> ./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker
>
>   
>> Btw, should there be any difference in speed whether I use mpiuni and
>> ifort or mpi and mpif90? I tried ex2f (below) and there's only a
>> small difference. If there is a large difference (mpi being slower),
>> does that mean there's something wrong in the code?
>>     
>
> For one - you are not using MPIUNI. You are using
> --with-mpi-dir=/lsftmp/g0306332/mpich2. However - if compilers are the
> same & compiler options are the same, I would expect the same
> performance in both the cases. Do you get such different times for
> different runs of the same binary?
>
> MatMult 384 vs 423
>
> What if you run both of the binaries on the same machine? [as a single
> job?].
>
> If you are using the PBS scheduler - suggest doing:
> - qsub -I [to get interactive access to the nodes]
> - log in to each node - to check no one else is using the scheduled nodes.
> - run multiple jobs during this single allocation for comparison.
>
> These are general tips to help you debug performance on your cluster.
>
> BTW: I get:
> ex2f-600-1p.log:MatMult             1192 1.0 9.7109e+00 1.0 3.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 11  0  0  0  14 11  0  0  0   397
>
> You get:
> log.1:MatMult             1879 1.0 2.8137e+01 1.0 3.84e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 11  0  0  0  12 11  0  0  0   384
>
>
> There is a difference in number of iterations. Are you sure you are
> using the same ex2f with -m 600 -n 600 options?
>
> Satish



