Slow speed after changing from serial to parallel (with ex2f.F)
Ben Tay
zonexo at gmail.com
Sat Apr 19 10:18:49 CDT 2008
Hi Satish,
First of all, I forgot to inform you that I've changed m and n to 800. I
wanted to see whether the larger problem size makes the scaling better.
If required, I can redo the test with m,n = 600.
I can install MPICH, but I don't think I can choose to run on a single
machine with 1 to 8 procs. In order to run the code, I usually have to
use the command

bsub -o log -q linux64 ./a.out

for a single proc, or

bsub -o log -q mcore_parallel -n $ -a mvapich mpirun.lsf ./a.out

where $ = no. of procs, for multiple procs.
After that, when the job is running, I'll be told which servers my job
runs on, e.g. atlas3-c10 (1 proc), 2*atlas3-c10 + 2*atlas3-c12 (4
procs), or 2*atlas3-c10 + 2*atlas3-c12 + 2*atlas3-c11 + 2*atlas3-c13 (8
procs). I was told that 2*atlas3-c10 doesn't mean the job is running on
a dual-core single CPU.
Btw, are you saying that I should first install the latest MPICH2 build
with the option

./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker

and then install PETSc against that MPICH2?
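If so, I guess the full sequence would be something like this (just my
sketch - the install prefix and the PETSc configure invocation are my own
guesses, not something you specified):

# build MPICH2 under my home directory (no admin rights needed)
./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker --prefix=$HOME/mpich2-nemesis
make
make install

# then rebuild PETSc against that MPICH2
./config/configure.py --with-mpi-dir=$HOME/mpich2-nemesis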
So after that, do you know how to do what you've suggested on my servers?
I don't really understand what you mean. Am I supposed to run 4 jobs on
1 quadcore node, or 1 job using 4 cores on 1 quadcore node? Well, I do
know that atlas3-c00 to c03 are the quadcore nodes. I can force the job
onto them with

bsub -o log -q mcore_parallel -n $ -m quadcore -a mvapich mpirun.lsf ./a.out
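So if you mean 1 job using 4 cores on 1 quadcore node, I suppose the
submission would be (just putting $ = 4 in the command above; I'm assuming
the scheduler then packs all 4 processes onto a single quadcore host, which
I'm not sure it does):

bsub -o log -q mcore_parallel -n 4 -m quadcore -a mvapich mpirun.lsf ./a.out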
Lastly, I made a mistake about the different times reported by the same
compiler. Sorry about that.
Thank you very much.
Satish Balay wrote:
> On Sat, 19 Apr 2008, Ben Tay wrote:
>
>
>> Btw, I'm not able to try the latest MPICH2 because I do not have
>> administrator rights. I was told that some special configuration is
>> required.
>>
>
> You don't need admin rights to install/use MPICH with the options I
> mentioned. I was suggesting just running in SMP mode on a single
> machine [from 1-8 procs on a Quad-Core Intel Xeon X5355, to compare with
> my SMP runs] with:
>
> ./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker
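>
> For example [just a sketch - this assumes the mpiexec from that gforker
> build is in your PATH and ex2f is compiled against it]:
>
> mpiexec -n 1 ./ex2f -m 600 -n 600 -log_summary
> mpiexec -n 4 ./ex2f -m 600 -n 600 -log_summary
> mpiexec -n 8 ./ex2f -m 600 -n 600 -log_summary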
>
>
>> Btw, should there be any difference in speed whether I use MPIUNI and
>> ifort or MPI and mpif90? I tried it on ex2f (below) and there's only a
>> small difference. If there is a large difference (MPI being slower),
>> does that mean there's something wrong in the code?
>>
>
> For one - you are not using MPIUNI. You are using
> --with-mpi-dir=/lsftmp/g0306332/mpich2. However - if the compilers are
> the same and the compiler options are the same, I would expect the same
> performance in both cases. Do you get such different times for
> different runs of the same binary?
>
> MatMult 384 vs 423
>
> What if you run both of the binaries on the same machine? [as a single
> job?].
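>
> [i.e. the binary from your --with-mpi-dir=/lsftmp/g0306332/mpich2 build vs
> one from a true MPIUNI build - the latter configured with something like
> the following, though the exact configure invocation may differ:]
>
> ./config/configure.py --with-mpi=0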
>
> If you are using the PBS scheduler - suggest doing:
> - qsub -I [to get interactive access to the nodes]
> - login to each node - to check no one else is using the scheduled nodes.
> - run multiple jobs during this single allocation for comparison.
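>
> For example [node name, core count and launcher below are placeholders for
> whatever your site provides]:
>
> qsub -I -l nodes=1:ppn=8       [interactive allocation of one 8-core node]
> ssh <node> uptime              [check the node is otherwise idle]
> mpiexec -n 4 ./ex2f -m 600 -n 600 -log_summary
> mpiexec -n 4 ./ex2f -m 600 -n 600 -log_summary   [repeat in the same allocation]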
>
> These are general tips to help you debug performance on your cluster.
>
> BTW: I get:
> ex2f-600-1p.log:MatMult 1192 1.0 9.7109e+00 1.0 3.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 14 11 0 0 0 397
>
> You get:
> log.1:MatMult 1879 1.0 2.8137e+01 1.0 3.84e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 11 0 0 0 12 11 0 0 0 384
>
>
> There is a difference in number of iterations. Are you sure you are
> using the same ex2f with -m 600 -n 600 options?
>
> Satish