[petsc-dev] MPIX_Iallreduce()
Jed Brown
jedbrown at mcs.anl.gov
Tue Mar 20 23:21:54 CDT 2012
On Tue, Mar 20, 2012 at 08:55, Satish Balay <balay at mcs.anl.gov> wrote:
> are you pinning the MPI jobs to specific cores for these tests? Does it
> make a difference?
>
It didn't seem to make a difference with 32 procs, but note that this
example fits everything in cache. We should have a less contrived use case
before putting more effort into it.
With 64 procs, binding seems more important.
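For context, the split-phase reduction that ex42 exercises uses PETSc's
Begin/End reduction interface; below is a minimal sketch of that usage
pattern (illustrated with VecDotBegin/VecDotEnd, not the actual ex42
source; presumably -splitreduction_async makes the queued reduction start
early with a nonblocking allreduce instead of waiting for the End phase):

/* Sketch of PETSc's split-phase reduction API: VecDotBegin() queues the
 * local part of a reduction and VecDotEnd() completes it, so independent
 * work can be overlapped in between.  All reductions queued before the
 * first End are combined into a single allreduce. */
#include <petscvec.h>

static PetscErrorCode OverlappedDots(Vec x, Vec y, Vec w, PetscScalar *xy, PetscScalar *xw)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecDotBegin(x, y, xy);CHKERRQ(ierr); /* queue first reduction */
  ierr = VecDotBegin(x, w, xw);CHKERRQ(ierr); /* queue second reduction */
  /* ... independent local work (e.g. a VecScatter) could go here ... */
  ierr = VecDotEnd(x, y, xy);CHKERRQ(ierr);   /* communication completes here */
  ierr = VecDotEnd(x, w, xw);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}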
jedbrown at cg:~/petsc/src/vec/vec/examples/tests$ ~/usr/mpich/bin/mpiexec -n 64 -binding rr ./ex42 -log_summary -splitreduction_async 0 | grep '^Vec'
Vector Object: 64 MPI processes
VecView                1 1.0 2.3429e-03 87.0 0.00e+00 0.0 6.3e+01 4.0e+01 0.0e+00  0   0  0  0  0   0   0  0  0  0     0
VecSet                 1 1.0 2.1935e-05  7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0  0  0     0
VecAssemblyBegin       1 1.0 2.7320e-03  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  5   0  0  0  3   5   0  0  0  3     0
VecAssemblyEnd         1 1.0 3.0994e-05 10.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0  0  0     0
VecScatterBegin      300 1.0 7.4935e-04  1.4 0.00e+00 0.0 1.9e+04 4.0e+01 0.0e+00  1   0 98 99  0   1   0 98 99  0     0
VecScatterEnd        300 1.0 1.2340e-02  7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13   0  0  0  0  13   0  0  0  0     0
VecReduceArith       100 1.0 1.7476e-04  2.2 9.00e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 100  0  0  0   0 100  0  0  0   330
VecReduceComm        100 1.0 2.9184e-02  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02 57   0  0  0 89  58   0  0  0 90     0
jedbrown at cg:~/petsc/src/vec/vec/examples/tests$ ~/usr/mpich/bin/mpiexec -n 64 -binding rr ./ex42 -log_summary -splitreduction_async 1 | grep '^Vec'
Vector Object: 64 MPI processes
VecView                1 1.0 1.7929e-03 94.0 0.00e+00 0.0 6.3e+01 4.0e+01 0.0e+00  1   0  0  0  0   1   0  0  0  0     0
VecSet                 1 1.0 2.6941e-05  5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0  0  0     0
VecAssemblyBegin       1 1.0 2.5909e-03  1.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 10   0  0  0 25  10   0  0  0 27     0
VecAssemblyEnd         1 1.0 2.4080e-05  4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0  0   0   0  0  0  0     0
VecScatterBegin      300 1.0 1.1673e-03  3.4 0.00e+00 0.0 1.9e+04 4.0e+01 0.0e+00  4   0 98 99  0   4   0 98 99  0     0
VecScatterEnd        300 1.0 2.0361e-03  3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7   0  0  0  0   7   0  0  0  0     0
VecReduceArith       100 1.0 2.0885e-04  2.9 9.00e+02 1.0 0.0e+00 0.0e+00 0.0e+00  1 100  0  0  0   1 100  0  0  0   276
VecReduceBegin       100 1.0 6.0058e-04  2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3   0  0  0  0   3   0  0  0  0     0
VecReduceEnd         100 1.0 2.7788e-03  1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13   0  0  0  0  13   0  0  0  0     0
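The VecReduceBegin/VecReduceEnd split in the second run presumably maps onto
the nonblocking allreduce in the subject line. At the MPI level the pattern
is roughly the sketch below (written with the MPI-3 name MPI_Iallreduce;
MPICH2 exposed the same call as MPIX_Iallreduce before MPI-3 standardized
it; the overlap region is only a placeholder):

/* Begin/end allreduce pattern: post the reduction, overlap independent
 * work, then wait for completion. */
#include <mpi.h>

int main(int argc, char **argv)
{
  double      local = 1.0, global = 0.0;
  MPI_Request req;

  MPI_Init(&argc, &argv);
  /* Begin phase: start the reduction without blocking */
  MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
  /* ... independent local work would be overlapped here ... */
  /* End phase: wait for the reduction to complete */
  MPI_Wait(&req, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}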
>
> I'm curious as this machine has asymmetric cores wrt L2/FPU.
> Presumably using p0, p2, p4, etc. should spread out the load,
> but I don't know if the kernel is doing this automatically.
>