[petsc-users] Question about ksp ex3.c

Jed Brown jedbrown at mcs.anl.gov
Thu Sep 29 08:09:52 CDT 2011


On Thu, Sep 29, 2011 at 07:44, Matthew Knepley <knepley at gmail.com> wrote:

> The way I read these numbers is that there is bandwidth for about 3 cores on
> this machine, and a non-negligible synchronization penalty:
>
>               1 proc  2 proc  4 proc  8 proc
> VecAXPBYCZ       496     857    1064    1070
> VecDot           451     724    1089     736
> MatMult          434     638     701     703
>

Matt, thanks for pulling out this summary. The synchronization in the dot
product is clearly expensive here. I'm surprised it's so significant for such a
small problem, but it is a common obstacle to scalability.
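To make the synchronization cost concrete, here is a minimal sketch (plain MPI
rather than PETSc's actual implementation, with an arbitrary local length) of
the difference between the two kernels: the AXPY-style update is purely local,
while the dot product ends in a blocking MPI_Allreduce whose latency does not
shrink as the local vectors get smaller.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    int     rank, i, n = 100000;        /* local length (arbitrary) */
    double *x, *y, local = 0.0, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    x = malloc(n * sizeof(double));
    y = malloc(n * sizeof(double));
    for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* AXPY-like update: purely local, limited only by memory bandwidth */
    for (i = 0; i < n; i++) y[i] += 3.0 * x[i];

    /* Dot product: local work plus one blocking global reduction */
    for (i = 0; i < n; i++) local += x[i] * y[i];
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (!rank) printf("dot = %g\n", sum);
    free(x); free(y);
    MPI_Finalize();
    return 0;
  }

Every VecDot (and VecNorm) in the Krylov iteration pays that reduction latency
on top of the memory traffic, which is why it falls off at 8 processes while
the purely local VecAXPBYCZ does not.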

I think MPI is placing one process per socket when you go from 1 to 2
processes. That gives you pretty good speedup even though memory traffic from
both sockets is routed through the "Blackford" chipset. Intel fixed this in
later generations by abandoning uniform memory access and giving each socket
its own memory controller (which AMD already had when these chips came out).

Barring odd hardware issues (such as needing multiple independent memory
streams to saturate the bus), one process per socket can use nearly all of the
memory bandwidth, so the speedup from 2 procs (1 per socket) to 4 procs (2 per
socket) is minimal. On this architecture you typically see the STREAM bandwidth
drop slightly once you run more than 2 procs per socket, so it's no surprise
that adding processes doesn't help here.
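For reference, a rough, self-contained triad loop in the spirit of STREAM (not
the official benchmark Barry refers to; the array size and the 32-bytes-per-
entry bookkeeping are my own assumptions) looks like this:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N 20000000L   /* ~160 MB per array, large enough to defeat caches */

  int main(void)
  {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    long    i;
    struct timespec t0, t1;
    double  sec, bytes;

    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];   /* triad */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sec   = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    bytes = 3.0 * N * sizeof(double);   /* 2 loads + 1 store per entry */
    printf("triad: %.2f GB/s (a[1] = %g)\n", bytes / sec / 1e9, a[1]);
    free(a); free(b); free(c);
    return 0;
  }

Running one copy per core (e.g. under mpiexec) and summing the reported rates
gives a more realistic ceiling than the quoted hardware peak.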

Note that your naive 1D partition is getting worse as you add processes. The
MatMult should scale out somewhat better if you use a 2D decomposition, as
is done by any of the examples using a DA (DMDA in 3.2) for grid management.
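For illustration, a minimal DMDA setup along those lines might look like the
sketch below, using the PETSc 3.2 names (the grid size, stencil choice, and
omitted error checking are my simplifications, not taken from ex3.c). With
PETSC_DECIDE for the process grid, PETSc picks a near-square layout, so for an
N x N grid each process exchanges roughly 4N/sqrt(P) ghost values instead of
the ~2N a 1D strip partition exchanges no matter how many processes you add.

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM       da;
    PetscInt xs, ys, xm, ym;

    PetscInitialize(&argc, &argv, NULL, NULL);
    DMDACreate2d(PETSC_COMM_WORLD,
                 DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, /* non-periodic */
                 DMDA_STENCIL_STAR,                      /* 5-point stencil */
                 128, 128,                               /* global grid */
                 PETSC_DECIDE, PETSC_DECIDE,             /* process grid */
                 1,                                      /* dof per node */
                 1,                                      /* stencil width */
                 NULL, NULL, &da);
    DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL);
    /* each rank reports the 2D block it owns */
    PetscPrintf(PETSC_COMM_SELF, "local block: %D x %D\n", xm, ym);
    DMDestroy(&da);
    PetscFinalize();
    return 0;
  }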


>
> The bandwidth tops out between 2 and 4 cores (the 5345 should have 10.6 GB/s,
> but you should run STREAM as Barry says to see what is achievable). There is
> obviously a penalty for VecDot relative to VecAXPBYCZ, which is the sync
> penalty, and it also seems to affect MatMult. Maybe Jed can explain that.
>
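To put a rough number on that ceiling: if VecAXPBYCZ is counted at about 5
flops per vector entry and moves about 32 bytes per entry (three reads plus one
write; both counts are my assumptions rather than figures from your log), then

  1070 MFlop/s / (5 flop/entry) * (32 byte/entry) ~= 6.8 GB/s

of sustained memory traffic at 8 processes, which is consistent with a machine
that has run out of bandwidth rather than flops.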