[petsc-users] Understanding matmult memory performance

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Fri Sep 29 06:19:54 CDT 2017


Dear all,

I'm attempting to understand some results I'm getting for MatMult performance.  In particular, my timings suggest that I'm achieving more main memory bandwidth than I think is possible.

The run setup uses two 24-core (dual-socket) Ivy Bridge nodes (Xeon E5-2697 v2).  The spec-sheet main memory bandwidth is 85.3 GB/s per node, and with 48 MPI processes (two nodes) I measure a STREAM triad bandwidth of 148.2 GB/s.  The last-level cache is 30 MB (shared between the 12 cores of a socket).
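For later comparison, the aggregate two-node figures implied by those numbers are easy to write down; here is a minimal sketch of that arithmetic, assuming the 30 MB last-level cache figure is per socket:

# Aggregate two-node figures implied by the hardware description above.
# Assumption: the 30 MB last-level cache is per socket (the E5-2697 v2 is 12-core).
nodes = 2
sockets_per_node = 2
spec_bw_per_node = 85.3e9        # bytes/s, spec-sheet main memory bandwidth
llc_per_socket = 30 * 1024**2    # bytes

spec_bw_total = nodes * spec_bw_per_node               # 170.6 GB/s over two nodes
llc_total = nodes * sockets_per_node * llc_per_socket  # 120 MiB aggregate LLC

print(f"two-node spec bandwidth:    {spec_bw_total / 1e9:.1f} GB/s")
print(f"aggregate last-level cache: {llc_total / 1024**2:.0f} MiB")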

The matrices are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.

The matrix sizes are respectively:

P1:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=8120601, cols=8120601
  total: nonzeros=120841801, allocated nonzeros=120841801
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines


P2:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=8120601, cols=8120601
  total: nonzeros=231382401, allocated nonzeros=231382401
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines


P3:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=13997521, cols=13997521
  total: nonzeros=674173201, allocated nonzeros=674173201
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines


Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.

Ignoring the vector and row-pointer data, a MatMult therefore needs to move 16 bytes per nonzero (an 8-byte scalar value plus an 8-byte column index), i.e. 16*nnz bytes in total.
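As a sanity check on that estimate, here is a minimal sketch of the per-MatMult traffic for the three matrices, assuming AIJ (CSR) storage with one scalar and one column index per stored nonzero:

# Rough per-MatMult traffic for an AIJ (CSR) matrix: each stored nonzero
# carries an 8-byte value plus an 8-byte column index (row pointers ignored).
bytes_per_nonzero = 8 + 8  # sizeof(PetscScalar) + sizeof(PetscInt)

nonzeros = {"P1": 120841801, "P2": 231382401, "P3": 674173201}

for name, nnz in nonzeros.items():
    print(f"{name}: {nnz * bytes_per_nonzero / 1e9:.2f} GB per MatMult")
# -> P1: 1.93 GB, P2: 3.70 GB, P3: 10.79 GB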

A single MatMult takes, respectively:

P1: 0.0114362s
P2: 0.0196032s
P3: 0.0524525s

So the estimated achieved memory bandwidth is:

P1: 120841801 * 16 / 0.0114362 = 157.45 GiB/s
P2: 231382401 * 16 / 0.0196032 = 175.88 GiB/s
P3: 674173201 * 16 / 0.0524525 = 191.52 GiB/s
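These per-call times and bandwidths come straight from the log below: each P(k) stage times 40 MatMult calls, so the per-call time is the stage's max MatMult time divided by 40.  A small sketch of that arithmetic (dividing by 2^30 rather than 10^9 gives the GiB/s figures above):

# Per-call MatMult time and estimated bandwidth from the -log_view data below.
# Each "P(k) aij matrix" stage times 40 MatMult calls; the max total times are
# 0.45745 s (P1), 0.78413 s (P2) and 2.0981 s (P3).
cases = {
    "P1": (120841801, 0.45745),
    "P2": (231382401, 0.78413),
    "P3": (674173201, 2.0981),
}

for name, (nnz, stage_time) in cases.items():
    t = stage_time / 40              # per-call MatMult time
    bw = nnz * 16 / t                # bytes moved per second
    print(f"{name}: {t:.6f} s/call, {bw / 1e9:.2f} GB/s ({bw / 2**30:.2f} GiB/s)")
# -> P1: 169.07 GB/s (157.45 GiB/s), P2: 188.85 GB/s (175.88 GiB/s),
#    P3: 205.65 GB/s (191.53 GiB/s)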

So all of those numbers are higher than the measured STREAM bandwidth, and the P2 and P3 numbers are higher even than the two-node spec-sheet bandwidth (2 x 85.3 = 170.6 GB/s).

I don't think PETSc is doing anything magic, but hints would be appreciated; it would be nice to be able to explain this.

Cheers,

Lawrence

Full -log_view output:

--------------------------------------------------------------------------------
*** lmn01   Job: 4820277.sdb   started: 29/09/17 11:56:03   host: mom1 ***

--------------------------------------------------------------------------------
Int Type has 8 bytes, Scalar Type has 8 bytes

P1:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=8120601, cols=8120601
  total: nonzeros=120841801, allocated nonzeros=120841801
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines

P2:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=8120601, cols=8120601
  total: nonzeros=231382401, allocated nonzeros=231382401
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines

P3:
Mat Object: 48 MPI processes
  type: mpiaij
  rows=13997521, cols=13997521
  total: nonzeros=674173201, allocated nonzeros=674173201
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
Using Petsc Development GIT revision: v3.7.5-3014-g413f72f  GIT Date: 2017-02-05 17:50:57 -0600

                         Max       Max/Min        Avg      Total 
Time (sec):           1.150e+02      1.00000   1.150e+02
Objects:              1.832e+03      1.50534   1.269e+03
Flops:                2.652e+10      1.16244   2.486e+10  1.193e+12
Flops/sec:            2.306e+08      1.16244   2.162e+08  1.038e+10
MPI Messages:         1.021e+04      3.00279   5.091e+03  2.444e+05
MPI Message Lengths:  3.314e+09      1.97310   3.697e+05  9.035e+10
MPI Reductions:       2.630e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.0701e+02  93.1%  5.5715e+11  46.7%  1.942e+05  79.4%  3.644e+05       98.6%  2.560e+02  97.3% 
 1: P(1) aij matrix: 1.5561e+00   1.4%  5.5574e+10   4.7%  1.688e+04   6.9%  9.789e+02        0.3%  2.000e+00   0.8% 
 2: P(2) aij matrix: 1.9378e+00   1.7%  8.8214e+10   7.4%  1.688e+04   6.9%  1.483e+03        0.4%  2.000e+00   0.8% 
 3: P(3) aij matrix: 4.4890e+00   3.9%  4.9225e+11  41.3%  1.648e+04   6.7%  2.829e+03        0.8%  2.000e+00   0.8% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01  2  0  2  0  8   3  0  2  0  8     0
BuildTwoSided        124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00  6  0 11  0  0   7  0 14  0  0     0
VecSet                16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin        3 1.0 2.1945e-0269.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0     0
VecScatterEnd          3 1.0 2.2460e-0218.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult                3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0 21311
MatAssemblyBegin      12 1.0 2.6438e-03235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01  0  0  1  0  9   0  0  1  0  9     0
MatView                9 1.0 5.3831e-0112.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
Mesh Partition         6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12  0 41  7 13  13  0 52  7 13     0
Mesh Migration         6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16  0 31 85 27  17  0 39 86 28     0
DMPlexInterp           3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12  0  0  0  2  13  0  0  0  2     0
DMPlexDistribute       3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01  9  0 20  3 10  10  0 25  3 11     0
DMPlexDistCones        6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00  5  0  5 32  0   6  0  6 32  0     0
DMPlexDistLabels       6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00  7  0 16 43  2   7  0 21 44  2     0
DMPlexDistribOL        3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19  0 53 94 30  21  0 66 95 30     0
DMPlexDistField        9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00  1  0  7  4  2   1  0  9  4  2     0
DMPlexDistData         6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00  6  0 35  1  0   6  0 45  1  0     0
DMPlexStratify        19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15  0  0  0  7  16  0  0  0  7     0
SFSetGraph           141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
SFBcastBegin         271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00  9  0 75 98  0  10  0 95100  0     0
SFBcastEnd           271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
SFReduceBegin         12 1.0 1.7538e-0112.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00  0  0  2  0  0   0  0  2  0  0     0
SFReduceEnd           12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFFetchOpBegin         3 1.0 9.9087e-0415.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFFetchOpEnd           3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateMesh            15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42  0 73 97 44  45  0 92 98 46     0
CreateFunctionSpace       3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37  0 56 95 44  40  0 71 97 45     0
Mesh: reorder          3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  2   1  0  0  0  2     0
Mesh: numbering        3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  9  0  0  0  2  10  0  0  0  2     0
CreateSparsity         3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatZeroInitial         3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01  2  0  1  0 10   3  0  1  0 11     0
ParLoopExecute         6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  3 47  0  0  0   3100  0  0  0 175069
ParLoopset_4           2 1.0 1.1100e-0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopHaloEnd         6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednBegin       6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednEnd         6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopCells           9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  2 47  0  0  0   3100  0  0  0 186686
ParLoopset_10          2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopset_16          2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: P(1) aij matrix

VecScatterBegin       40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
VecScatterEnd         40 1.0 2.6442e-0161.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0
VecSetRandom          40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  27  0  0  0  0     0
MatMult               40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00  0  1  7  0  0  28 17100100  0 20423
MatAssemblyBegin       3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatZeroEntries         1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                2 1.0 1.8248e-012468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   5  0  0  0100     0
AssembleMat            1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00  1  4  0  0  1  45 83  0  0100 66009
ParLoopExecute         1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68623
ParLoopHaloEnd         1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednBegin       1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednEnd         1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopCells           3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68641

--- Event Stage 2: P(2) aij matrix

VecScatterBegin       40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
VecScatterEnd         40 1.0 4.3488e-0156.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  14  0  0  0  0     0
VecSetRandom          40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  22  0  0  0  0     0
MatMult               40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00  1  2  7  0  0  39 21100100  0 23192
MatAssemblyBegin       3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatZeroEntries         1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                2 1.0 1.4482e-01344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
AssembleMat            1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1  6  0  0  1  39 79  0  0100 92192
ParLoopExecute         1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97484
ParLoopHaloEnd         1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednBegin       1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopCells           3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97505

--- Event Stage 3: P(3) aij matrix

VecScatterBegin       40 1.0 2.3520e-0210.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00  0  0  7  1  0   0  0100100  0     0
VecScatterEnd         40 1.0 6.6521e-0138.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0
VecSetRandom          40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  16  0  0  0  0     0
MatMult               40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00  2  4  7  1  0  46 11100100  0 25439
MatAssemblyBegin       3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatZeroEntries         1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatView                2 1.0 2.8071e-01958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
AssembleMat            1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1 37  0  0  1  38 89  0  0100 257591
ParLoopExecute         1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272582
ParLoopHaloEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednBegin       1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopCells           3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272617
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    12             11         6776     0.
              Viewer     4              0            0     0.
         PetscRandom     3              3         2058     0.
           Index Set  1095           1085      1383616     0.
   IS L to G Mapping    15             14    204830392     0.
             Section   222            209       158840     0.
              Vector    31             28     41441632     0.
      Vector Scatter     3              2         2416     0.
              Matrix    22             18    131705576     0.
    Distributed Mesh    40             37       182200     0.
    GraphPartitioner    19             18        11808     0.
Star Forest Bipartite Graph   206            200       178256     0.
     Discrete System    40             37        34336     0.

--- Event Stage 1: P(1) aij matrix

         PetscRandom    40             40        27440     0.

--- Event Stage 2: P(2) aij matrix

         PetscRandom    40             40        27440     0.

--- Event Stage 3: P(3) aij matrix

         PetscRandom    40             40        27440     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 1.06335e-05
Average time for zero size MPI_Send(): 1.41561e-06
#PETSc Option Table entries:
--dimension 3
--output-file poisson-matvecs.csv
--problem poisson
-log_view
-mat_view ::ascii_info
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
-----------------------------------------
Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003 
Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
Using PETSc arch: petsc-gnu51-ivybridge-int64
-----------------------------------------

Using C compiler: cc  -fPIC  -march=ivybridge -O3  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -fPIC -march=ivybridge -O3   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl 
-----------------------------------------

Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
--------------------------------------------------------------------------------

Resources requested: ncpus=48,place=free,walltime=00:20:00
Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20

*** lmn01   Job: 4820277.sdb   ended: 29/09/17 11:58:22   queue: S4808886 ***
--------------------------------------------------------------------------------
