[petsc-users] Configuring PETSc for KNL

Jed Brown jed at jedbrown.org
Wed Apr 5 10:53:52 CDT 2017


"Zhang, Hong" <hongzhang at anl.gov> writes:

> On Apr 4, 2017, at 10:45 PM, Justin Chang <jychang48 at gmail.com> wrote:
>>
>> So I tried the following options:
>>
>> -M 40
>> -N 40
>> -P 5
>> -da_refine 1/2/3/4
>> -log_view
>> -mg_coarse_pc_type gamg
>> -mg_levels_0_pc_type gamg
>> -mg_levels_1_sub_pc_type cholesky
>> -pc_type mg
>> -thi_mat_type baij
>>
>> Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files.
>
>
> MatSOR takes only 3% of the total time. Most of the time is spent on PCSetUp (~30%) and PCApply (~11%).

I don't see any of your conclusions in the actual data, unless you only
looked at the smallest size that Justin tested.  For example, from the
largest problem size in Justin's logs:

KNL:
MatSOR              2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 36 45  0  0  0  36 45  0  0  0 11946
KSPSolve               8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 8.8e+03 5.0e+03 68 99 98 61 98  68 99 98 61 98 14409
SNESSolve              1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 1.4e+04 5.1e+03 96100100100 99  96100100100 99 10338
SNESFunctionEval       9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 3.2e+04 0.0e+00  6  0  1  1  0   6  0  1  1  0     0
SNESJacobianEval      40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 2.5e+05 1.4e+02 24  0  3 49  3  24  0  3 49  3     0
PCSetUp               16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 1.0e+04 3.8e+03  5  0 18 13 74   5  0 18 13 74   119
PCSetUpOnBlocks       60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply               60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 8.0e+03 1.1e+03 60 94 79 45 21  60 94 79 45 21 15407
MatMult             2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 8.8e+03 0.0e+00 21 49 77 48  0  21 49 77 48  0 21579

Haswell:
MatSOR              2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 48 45  0  0  0  48 45  0  0  0 10936
KSPSolve               7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 1.3e+04 4.5e+03 81 99 98 60 98  81 99 98 60 98 14828
SNESSolve              1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 2.1e+04 4.5e+03 99100100100 99  99100100100 99 12280
SNESFunctionEval       8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 4.7e+04 0.0e+00  3  0  1  1  0   3  0  1  1  0     0
SNESJacobianEval      35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 3.8e+05 1.3e+02 16  0  3 50  3  16  0  3 50  3     0
PCSetUp               14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 1.4e+04 3.4e+03  2  0 19 13 74   2  0 19 13 74   335
PCSetUpOnBlocks       50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     6
PCApply               50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 1.2e+04 9.7e+02 75 94 77 44 21  75 94 77 44 21 15017
MatMult             2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 1.3e+04 0.0e+00 27 49 75 46  0  27 49 75 46  0 21983

>> If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better?
>
> The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is about two times as fast as on KNL, but it eats only 3%-4% of the total time.

SNESJacobianEval alone accounts for about 90 seconds of the roughly
180-second difference in SNESSolve time between KNL and Haswell.
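
That arithmetic can be checked directly against the log rows above. A minimal Python sketch, assuming the times are the max-time column (in seconds) of the quoted -log_view rows; the variable names are illustrative only:

```python
# Max time in seconds, copied from the -log_view rows quoted above.
knl = {"SNESSolve": 6.1583e+02, "SNESJacobianEval": 1.5628e+02}
haswell = {"SNESSolve": 4.3749e+02, "SNESJacobianEval": 6.8994e+01}

# Gap in total nonlinear solve time, and the part of it due to Jacobian evaluation.
total_gap = knl["SNESSolve"] - haswell["SNESSolve"]
jacobian_gap = knl["SNESJacobianEval"] - haswell["SNESJacobianEval"]
share = jacobian_gap / total_gap

print(f"total gap:    {total_gap:.1f} s")
print(f"Jacobian gap: {jacobian_gap:.1f} s ({share:.0%} of the total gap)")
```

With these numbers the Jacobian gap is roughly 87 s out of a total gap of roughly 178 s.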

> According to your logs, the compute-intensive kernels such as MatMult,
> MatSOR, PCApply run faster (~2X) on Haswell. 

They run at almost the same speed.
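
One way to see this from the quoted rows is to compare the aggregate flop rates. A minimal Python sketch, assuming the last column of each -log_view row is the total Mflop/s for that event; the dictionary layout is just for illustration:

```python
# Total Mflop/s (last column) from the -log_view rows quoted above,
# stored as (KNL, Haswell) for each kernel.
mflops = {
    "MatSOR": (11946, 10936),
    "MatMult": (21579, 21983),
    "PCApply": (15407, 15017),
}

# Ratio of aggregate rates; a value near 1.0 means the two machines
# deliver about the same throughput for that kernel.
ratios = {name: k / h for name, (k, h) in mflops.items()}
for name, r in ratios.items():
    print(f"{name:8s} KNL/Haswell = {r:.2f}")
```

All three ratios land within about 10% of 1.0, nowhere near a factor of two.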

> But since the setup time dominates in this test, 

It doesn't dominate on the larger sizes.
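
As a rough check on the largest problem size, here is a small Python sketch that takes PCSetUp time as a fraction of SNESSolve time, using the max-time column of the rows quoted above (the names are illustrative):

```python
# Max time in seconds, copied from the -log_view rows quoted above.
times = {
    "KNL": {"PCSetUp": 3.4525e+01, "SNESSolve": 6.1583e+02},
    "Haswell": {"PCSetUp": 1.0860e+01, "SNESSolve": 4.3749e+02},
}

# Fraction of the nonlinear solve spent in preconditioner setup.
setup_fraction = {
    machine: t["PCSetUp"] / t["SNESSolve"] for machine, t in times.items()
}
for machine, frac in setup_fraction.items():
    print(f"{machine:8s} PCSetUp / SNESSolve = {frac:.1%}")
```

Both fractions are under 6%, consistent with the %T column (5 and 2) in the PCSetUp rows.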

> Haswell would not show much benefit. If you increase the problem size,
> it could be expected that the performance gap would also increase.

Backwards.  Haswell is great for low latency on small problem sizes
while KNL offers higher theoretical throughput (often not realized due
to lack of vectorization) for sufficiently large problem sizes
(especially if they don't fit in Haswell L3 cache but do fit in MCDRAM).