[petsc-users] Understanding matmult memory performance

Karl Rupp rupp at iue.tuwien.ac.at
Fri Sep 29 09:08:18 CDT 2017


Hi Lawrence,

According to
https://ark.intel.com/products/75283/Intel-Xeon-Processor-E5-2697-v2-30M-Cache-2_70-GHz
you get 59.7 GB/sec of peak memory bandwidth per CPU, so with two CPUs in
each of your two nodes you should get about 240 GB/sec in total.
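
As a rough cross-check of that figure, the per-CPU peak follows from the memory configuration. The sketch below assumes four DDR3-1866 channels per CPU with 8 bytes per transfer; that configuration is an assumption for illustration and is not stated anywhere in this thread. The two-CPUs-per-node and two-node counts come from the quoted message.

```python
# Theoretical peak memory bandwidth for the two-node system.
# Assumed (not stated in the thread): 4 DDR3-1866 channels per CPU,
# each moving 8 bytes per transfer.
TRANSFERS_PER_SEC = 1866.67e6   # DDR3-1866, transfers per second (assumed)
BYTES_PER_TRANSFER = 8          # 64-bit channel (assumed)
CHANNELS_PER_CPU = 4            # assumed
CPUS_PER_NODE = 2               # dual socket, from the quoted message
NODES = 2                       # from the quoted message


def peak_bandwidth_gb_per_s(n_cpus):
    """Peak bandwidth in decimal GB/s for n_cpus sockets."""
    per_cpu = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER * CHANNELS_PER_CPU
    return n_cpus * per_cpu / 1e9


per_cpu = peak_bandwidth_gb_per_s(1)
system = peak_bandwidth_gb_per_s(CPUS_PER_NODE * NODES)
print(f"per CPU:   {per_cpu:.1f} GB/s")
print(f"two nodes: {system:.1f} GB/s")
```

Under these assumptions the per-CPU value agrees with the 59.7 GB/sec on the spec page, and four CPUs give a little under 240 GB/sec.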

If you use PETSc's `make streams`, then processor placement may - 
unfortunately - not be ideal, so the benchmark may underestimate the 
achievable performance. Have a look at the new PETSc 3.8 manual [1], 
Chapter 14, where Richard and I nailed down some of these performance 
aspects.

Best regards,
Karli

[1] http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf


On 09/29/2017 06:19 AM, Lawrence Mitchell wrote:
> Dear all,
> 
> I'm attempting to understand some results I'm getting for matmult performance.  In particular, it looks like I'm obtaining timings that suggest that I'm getting more main memory bandwidth than I think is possible.
> 
> The run uses two 24-core (dual-socket) Ivy Bridge nodes (Xeon E5-2697 v2).  The specced main memory bandwidth is 85.3 GB/s per node, and I measure a STREAM triad bandwidth of 148.2 GB/s using 48 MPI processes (two nodes).  The last-level cache is 30MB (shared between 12 cores).
> 
> The matrices I'm using are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.
> 
> The matrix sizes are respectively:
> 
> P1:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=8120601, cols=8120601
>    total: nonzeros=120841801, allocated nonzeros=120841801
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> P2:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=8120601, cols=8120601
>    total: nonzeros=231382401, allocated nonzeros=231382401
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> P3:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=13997521, cols=13997521
>    total: nonzeros=674173201, allocated nonzeros=674173201
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.
> 
> Ignoring the data for the vectors and row indices, a matmult needs to move 16*nonzeros bytes (an 8-byte value plus an 8-byte column index per nonzero).
> 
> MatMults take, respectively:
> 
> P1: 0.0114362s
> P2: 0.0196032s
> P3: 0.0524525s
> 
> So the estimated achieved memory bandwidth is:
> 
> P1: 120841801 * 16 / 0.0114362 = 157.45 GiB/s
> P2: 231382401 * 16 / 0.0196032 = 175.88 GiB/s
> P3: 674173201 * 16 / 0.0524525 = 191.52 GiB/s
> 
> So all of those numbers are higher than the stream bandwidth, and the P2 and P3 numbers are higher than the spec sheet bandwidth.
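
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic, using the nonzero counts and per-MatMult times quoted in the message; like the original estimate, it ignores vector and row-index traffic. It reports the result both in decimal GB/s (10^9 bytes) and in binary GiB/s (2^30 bytes), since the two differ by about 7%.

```python
# Reproduce the bandwidth estimate from the quoted message.
BYTES_PER_NONZERO = 8 + 8  # sizeof(PetscScalar) + sizeof(PetscInt)

# (nonzeros, seconds per MatMult), copied from the quoted message
cases = {
    "P1": (120841801, 0.0114362),
    "P2": (231382401, 0.0196032),
    "P3": (674173201, 0.0524525),
}


def bandwidth(nonzeros, seconds):
    """Return (GB/s, GiB/s) for streaming values and column indices once."""
    bytes_per_sec = nonzeros * BYTES_PER_NONZERO / seconds
    return bytes_per_sec / 1e9, bytes_per_sec / 2**30


results = {name: bandwidth(nnz, t) for name, (nnz, t) in cases.items()}
for name, (gb, gib) in results.items():
    print(f"{name}: {gb:.2f} GB/s = {gib:.2f} GiB/s")
```

The quoted values of 157.45, 175.88, and 191.52 come out of the binary (2^30) division. In decimal units the same runs give roughly 169, 189, and 206 GB/s, which is the unit to use when comparing against a vendor spec sheet.
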
> 
> I don't think PETSc is doing anything magic, but hints would be appreciated; it would be nice to explain this.
> 
> Cheers,
> 
> Lawrence
> 
> Full -log_view output:
> 
> --------------------------------------------------------------------------------
> *** lmn01   Job: 4820277.sdb   started: 29/09/17 11:56:03   host: mom1 ***
> 
> --------------------------------------------------------------------------------
> Int Type has 8 bytes, Scalar Type has 8 bytes
> 
> P1:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=8120601, cols=8120601
>    total: nonzeros=120841801, allocated nonzeros=120841801
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> P2:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=8120601, cols=8120601
>    total: nonzeros=231382401, allocated nonzeros=231382401
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> P3:
> Mat Object: 48 MPI processes
>    type: mpiaij
>    rows=13997521, cols=13997521
>    total: nonzeros=674173201, allocated nonzeros=674173201
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
> Using Petsc Development GIT revision: v3.7.5-3014-g413f72f  GIT Date: 2017-02-05 17:50:57 -0600
> 
>                           Max       Max/Min        Avg      Total
> Time (sec):           1.150e+02      1.00000   1.150e+02
> Objects:              1.832e+03      1.50534   1.269e+03
> Flops:                2.652e+10      1.16244   2.486e+10  1.193e+12
> Flops/sec:            2.306e+08      1.16244   2.162e+08  1.038e+10
> MPI Messages:         1.021e+04      3.00279   5.091e+03  2.444e+05
> MPI Message Lengths:  3.314e+09      1.97310   3.697e+05  9.035e+10
> MPI Reductions:       2.630e+02      1.00000
> 
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                              e.g., VecAXPY() for real vectors of length N --> 2N flops
>                              and VecAXPY() for complex vectors of length N --> 8N flops
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                          Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>   0:      Main Stage: 1.0701e+02  93.1%  5.5715e+11  46.7%  1.942e+05  79.4%  3.644e+05       98.6%  2.560e+02  97.3%
>   1: P(1) aij matrix: 1.5561e+00   1.4%  5.5574e+10   4.7%  1.688e+04   6.9%  9.789e+02        0.3%  2.000e+00   0.8%
>   2: P(2) aij matrix: 1.9378e+00   1.7%  8.8214e+10   7.4%  1.688e+04   6.9%  1.483e+03        0.4%  2.000e+00   0.8%
>   3: P(3) aij matrix: 4.4890e+00   3.9%  4.9225e+11  41.3%  1.648e+04   6.7%  2.829e+03        0.8%  2.000e+00   0.8%
> 
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>     Count: number of times phase was executed
>     Time and Flops: Max - maximum over all processors
>                     Ratio - ratio of maximum to minimum over all processors
>     Mess: number of messages sent
>     Avg. len: average message length (bytes)
>     Reduct: number of global reductions
>     Global: entire computation
>     Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>        %T - percent time in this phase         %F - percent flops in this phase
>        %M - percent messages in this phase     %L - percent message lengths in this phase
>        %R - percent reductions in this phase
>     Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                     Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> PetscBarrier           4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01  2  0  2  0  8   3  0  2  0  8     0
> BuildTwoSided        124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00  6  0 11  0  0   7  0 14  0  0     0
> VecSet                16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin        3 1.0 2.1945e-0269.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0     0
> VecScatterEnd          3 1.0 2.2460e-0218.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSetRandom           3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMult                3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0 21311
> MatAssemblyBegin      12 1.0 2.6438e-03235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd        12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01  0  0  1  0  9   0  0  1  0  9     0
> MatView                9 1.0 5.3831e-0112.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
> Mesh Partition         6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12  0 41  7 13  13  0 52  7 13     0
> Mesh Migration         6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16  0 31 85 27  17  0 39 86 28     0
> DMPlexInterp           3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12  0  0  0  2  13  0  0  0  2     0
> DMPlexDistribute       3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01  9  0 20  3 10  10  0 25  3 11     0
> DMPlexDistCones        6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00  5  0  5 32  0   6  0  6 32  0     0
> DMPlexDistLabels       6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00  7  0 16 43  2   7  0 21 44  2     0
> DMPlexDistribOL        3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19  0 53 94 30  21  0 66 95 30     0
> DMPlexDistField        9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00  1  0  7  4  2   1  0  9  4  2     0
> DMPlexDistData         6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00  6  0 35  1  0   6  0 45  1  0     0
> DMPlexStratify        19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15  0  0  0  7  16  0  0  0  7     0
> SFSetGraph           141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> SFBcastBegin         271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00  9  0 75 98  0  10  0 95100  0     0
> SFBcastEnd           271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> SFReduceBegin         12 1.0 1.7538e-0112.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00  0  0  2  0  0   0  0  2  0  0     0
> SFReduceEnd           12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFFetchOpBegin         3 1.0 9.9087e-0415.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFFetchOpEnd           3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> CreateMesh            15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42  0 73 97 44  45  0 92 98 46     0
> CreateFunctionSpace       3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37  0 56 95 44  40  0 71 97 45     0
> Mesh: reorder          3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  2   1  0  0  0  2     0
> Mesh: numbering        3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  9  0  0  0  2  10  0  0  0  2     0
> CreateSparsity         3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> MatZeroInitial         3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01  2  0  1  0 10   3  0  1  0 11     0
> ParLoopExecute         6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  3 47  0  0  0   3100  0  0  0 175069
> ParLoopset_4           2 1.0 1.1100e-0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopHaloEnd         6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednBegin       6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednEnd         6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopCells           9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  2 47  0  0  0   3100  0  0  0 186686
> ParLoopset_10          2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopset_16          2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> 
> --- Event Stage 1: P(1) aij matrix
> 
> VecScatterBegin       40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
> VecScatterEnd         40 1.0 2.6442e-0161.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0
> VecSetRandom          40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  27  0  0  0  0     0
> MatMult               40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00  0  1  7  0  0  28 17100100  0 20423
> MatAssemblyBegin       3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> MatZeroEntries         1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                2 1.0 1.8248e-012468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   5  0  0  0100     0
> AssembleMat            1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00  1  4  0  0  1  45 83  0  0100 66009
> ParLoopExecute         1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68623
> ParLoopHaloEnd         1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednBegin       1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednEnd         1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopCells           3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68641
> 
> --- Event Stage 2: P(2) aij matrix
> 
> VecScatterBegin       40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
> VecScatterEnd         40 1.0 4.3488e-0156.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  14  0  0  0  0     0
> VecSetRandom          40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  22  0  0  0  0     0
> MatMult               40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00  1  2  7  0  0  39 21100100  0 23192
> MatAssemblyBegin       3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> MatZeroEntries         1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                2 1.0 1.4482e-01344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
> AssembleMat            1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1  6  0  0  1  39 79  0  0100 92192
> ParLoopExecute         1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97484
> ParLoopHaloEnd         1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednBegin       1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopCells           3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97505
> 
> --- Event Stage 3: P(3) aij matrix
> 
> VecScatterBegin       40 1.0 2.3520e-0210.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00  0  0  7  1  0   0  0100100  0     0
> VecScatterEnd         40 1.0 6.6521e-0138.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0
> VecSetRandom          40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  16  0  0  0  0     0
> MatMult               40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00  2  4  7  1  0  46 11100100  0 25439
> MatAssemblyBegin       3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> MatZeroEntries         1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> MatView                2 1.0 2.8071e-01958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
> AssembleMat            1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1 37  0  0  1  38 89  0  0100 257591
> ParLoopExecute         1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272582
> ParLoopHaloEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednBegin       1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ParLoopCells           3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272617
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>             Container    12             11         6776     0.
>                Viewer     4              0            0     0.
>           PetscRandom     3              3         2058     0.
>             Index Set  1095           1085      1383616     0.
>     IS L to G Mapping    15             14    204830392     0.
>               Section   222            209       158840     0.
>                Vector    31             28     41441632     0.
>        Vector Scatter     3              2         2416     0.
>                Matrix    22             18    131705576     0.
>      Distributed Mesh    40             37       182200     0.
>      GraphPartitioner    19             18        11808     0.
> Star Forest Bipartite Graph   206            200       178256     0.
>       Discrete System    40             37        34336     0.
> 
> --- Event Stage 1: P(1) aij matrix
> 
>           PetscRandom    40             40        27440     0.
> 
> --- Event Stage 2: P(2) aij matrix
> 
>           PetscRandom    40             40        27440     0.
> 
> --- Event Stage 3: P(3) aij matrix
> 
>           PetscRandom    40             40        27440     0.
> ========================================================================================================================
> Average time to get PetscTime(): 0.
> Average time for MPI_Barrier(): 1.06335e-05
> Average time for zero size MPI_Send(): 1.41561e-06
> #PETSc Option Table entries:
> --dimension 3
> --output-file poisson-matvecs.csv
> --problem poisson
> -log_view
> -mat_view ::ascii_info
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
> -----------------------------------------
> Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003
> Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
> Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
> Using PETSc arch: petsc-gnu51-ivybridge-int64
> -----------------------------------------
> 
> Using C compiler: cc  -fPIC  -march=ivybridge -O3  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: ftn  -fPIC -march=ivybridge -O3   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
> 
> Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
> -----------------------------------------
> 
> Using C linker: cc
> Using Fortran linker: ftn
> Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl
> -----------------------------------------
> 
> Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
> --------------------------------------------------------------------------------
> 
> Resources requested: ncpus=48,place=free,walltime=00:20:00
> Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20
> 
> *** lmn01   Job: 4820277.sdb   ended: 29/09/17 11:58:22   queue: S4808886 ***
> --------------------------------------------------------------------------------
> 

