[petsc-users] Understanding matmult memory performance
Lawrence Mitchell
lawrence.mitchell at imperial.ac.uk
Fri Sep 29 06:19:54 CDT 2017
Dear all,
I'm attempting to understand some results I'm getting for matmult performance. In particular, I'm obtaining timings that suggest I'm getting more main memory bandwidth than I think is possible.
The runs use two dual-socket, 24-core Ivy Bridge nodes (Xeon E5-2697 v2). The spec-sheet main memory bandwidth is 85.3 GB/s per node, and I measure a STREAM triad bandwidth of 148.2 GB/s using 48 MPI processes across the two nodes. The last-level cache is 30MB (shared between the 12 cores of a socket).
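For reference, the 148.2 GB/s figure comes from the usual triad kernel; the sketch below shows only the kernel (not the full STREAM harness; the function name and signature are my own), together with the byte-counting convention I'm assuming:

#include <stddef.h>

/* STREAM "triad" kernel: a[i] = b[i] + scalar*c[i].
 * STREAM's convention counts 3*8 = 24 bytes of traffic per iteration
 * (two reads plus one write; write-allocate traffic is not counted),
 * and the 148.2 GB/s above is that byte count divided by kernel time. */
void triad(size_t n, double *restrict a, const double *restrict b,
           const double *restrict c, double scalar)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] + scalar * c[i];
}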
The matrices I'm using are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.
The matrix sizes are respectively:
P1:
Mat Object: 48 MPI processes
type: mpiaij
rows=8120601, cols=8120601
total: nonzeros=120841801, allocated nonzeros=120841801
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
P2:
Mat Object: 48 MPI processes
type: mpiaij
rows=8120601, cols=8120601
total: nonzeros=231382401, allocated nonzeros=231382401
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
P3:
Mat Object: 48 MPI processes
type: mpiaij
rows=13997521, cols=13997521
total: nonzeros=674173201, allocated nonzeros=674173201
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.
Ignoring the vector data and the row pointers, a matmult therefore needs to move 16*nonzeros bytes: an 8-byte scalar plus an 8-byte column index per nonzero (see the kernel sketch below).
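To make the 16 bytes per nonzero concrete, the inner loop of an AIJ-style (CSR) matmult touches one value and one column index per nonzero. A minimal sketch (my own simplified kernel, not PETSc's actual MatMult_SeqAIJ):

#include <stddef.h>

/* Simplified CSR (AIJ-like) matvec y = A*x with 64-bit integer indices.
 * Per nonzero the inner loop loads aa[k] (8 bytes) and aj[k] (8 bytes),
 * i.e. 16 bytes of matrix data; the x[aj[k]] gather and the row
 * pointers ai[] are the traffic the estimate above deliberately ignores. */
void csr_matvec(size_t nrows, const long long *ai, const long long *aj,
                const double *aa, const double *x, double *y)
{
  for (size_t i = 0; i < nrows; i++) {
    double sum = 0.0;
    for (long long k = ai[i]; k < ai[i + 1]; k++)
      sum += aa[k] * x[aj[k]];
    y[i] = sum;
  }
}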
Each MatMult takes, respectively (per-call averages over the 40 calls in the log below):
P1: 0.0114362s
P2: 0.0196032s
P3: 0.0524525s
So the estimated achieved memory bandwidth (bytes moved divided by per-call time, quoted here in GiB/s) is:
P1: 120841801 * 16 / 0.0114362 = 157.45 GiB/s
P2: 231382401 * 16 / 0.0196032 = 175.88 GiB/s
P3: 674173201 * 16 / 0.0524525 = 191.52 GiB/s
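For completeness, a small standalone program reproducing those numbers from the nonzero counts and timings quoted above (just bytes moved divided by time; nothing PETSc-specific):

#include <stdio.h>

/* Reproduce the bandwidth estimates: 16 bytes per nonzero divided by
 * the per-call MatMult time, printed in both GiB/s and GB/s. */
int main(void)
{
  const char  *name[] = {"P1", "P2", "P3"};
  const double nnz[]  = {120841801.0, 231382401.0, 674173201.0};
  const double t[]    = {0.0114362, 0.0196032, 0.0524525};
  const double GiB    = 1024.0 * 1024.0 * 1024.0;

  for (int i = 0; i < 3; i++)
    printf("%s: %.2f GiB/s (%.2f GB/s)\n", name[i],
           16.0 * nnz[i] / t[i] / GiB, 16.0 * nnz[i] / t[i] / 1e9);
  return 0;
}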
So all of those numbers are higher than the measured STREAM bandwidth, and the P2 and P3 numbers are higher even than the aggregate spec-sheet bandwidth of the two nodes (2 x 85.3 GB/s).
I don't think PETSc is doing anything magical, but any hints would be appreciated; it would be nice to be able to explain this.
Cheers,
Lawrence
Full -log_view output:
--------------------------------------------------------------------------------
*** lmn01 Job: 4820277.sdb started: 29/09/17 11:56:03 host: mom1 ***
--------------------------------------------------------------------------------
Int Type has 8 bytes, Scalar Type has 8 bytes
P1:
Mat Object: 48 MPI processes
type: mpiaij
rows=8120601, cols=8120601
total: nonzeros=120841801, allocated nonzeros=120841801
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
P2:
Mat Object: 48 MPI processes
type: mpiaij
rows=8120601, cols=8120601
total: nonzeros=231382401, allocated nonzeros=231382401
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
P3:
Mat Object: 48 MPI processes
type: mpiaij
rows=13997521, cols=13997521
total: nonzeros=674173201, allocated nonzeros=674173201
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
Using Petsc Development GIT revision: v3.7.5-3014-g413f72f GIT Date: 2017-02-05 17:50:57 -0600
Max Max/Min Avg Total
Time (sec): 1.150e+02 1.00000 1.150e+02
Objects: 1.832e+03 1.50534 1.269e+03
Flops: 2.652e+10 1.16244 2.486e+10 1.193e+12
Flops/sec: 2.306e+08 1.16244 2.162e+08 1.038e+10
MPI Messages: 1.021e+04 3.00279 5.091e+03 2.444e+05
MPI Message Lengths: 3.314e+09 1.97310 3.697e+05 9.035e+10
MPI Reductions: 2.630e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0701e+02 93.1% 5.5715e+11 46.7% 1.942e+05 79.4% 3.644e+05 98.6% 2.560e+02 97.3%
1: P(1) aij matrix: 1.5561e+00 1.4% 5.5574e+10 4.7% 1.688e+04 6.9% 9.789e+02 0.3% 2.000e+00 0.8%
2: P(2) aij matrix: 1.9378e+00 1.7% 8.8214e+10 7.4% 1.688e+04 6.9% 1.483e+03 0.4% 2.000e+00 0.8%
3: P(3) aij matrix: 4.4890e+00 3.9% 4.9225e+11 41.3% 1.648e+04 6.7% 2.829e+03 0.8% 2.000e+00 0.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01 2 0 2 0 8 3 0 2 0 8 0
BuildTwoSided 124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00 6 0 11 0 0 7 0 14 0 0 0
VecSet 16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3 1.0 2.1945e-0269.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
VecScatterEnd 3 1.0 2.2460e-0218.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 21311
MatAssemblyBegin 12 1.0 2.6438e-03235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01 0 0 1 0 9 0 0 1 0 9 0
MatView 9 1.0 5.3831e-0112.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 3 0 0 0 0 4 0
Mesh Partition 6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12 0 41 7 13 13 0 52 7 13 0
Mesh Migration 6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16 0 31 85 27 17 0 39 86 28 0
DMPlexInterp 3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12 0 0 0 2 13 0 0 0 2 0
DMPlexDistribute 3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01 9 0 20 3 10 10 0 25 3 11 0
DMPlexDistCones 6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00 5 0 5 32 0 6 0 6 32 0 0
DMPlexDistLabels 6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00 7 0 16 43 2 7 0 21 44 2 0
DMPlexDistribOL 3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19 0 53 94 30 21 0 66 95 30 0
DMPlexDistField 9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00 1 0 7 4 2 1 0 9 4 2 0
DMPlexDistData 6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00 6 0 35 1 0 6 0 45 1 0 0
DMPlexStratify 19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15 0 0 0 7 16 0 0 0 7 0
SFSetGraph 141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
SFBcastBegin 271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00 9 0 75 98 0 10 0 95100 0 0
SFBcastEnd 271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
SFReduceBegin 12 1.0 1.7538e-0112.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00 0 0 2 0 0 0 0 2 0 0 0
SFReduceEnd 12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFFetchOpBegin 3 1.0 9.9087e-0415.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFFetchOpEnd 3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
CreateMesh 15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42 0 73 97 44 45 0 92 98 46 0
CreateFunctionSpace 3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37 0 56 95 44 40 0 71 97 45 0
Mesh: reorder 3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 2 1 0 0 0 2 0
Mesh: numbering 3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 9 0 0 0 2 10 0 0 0 2 0
CreateSparsity 3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatZeroInitial 3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01 2 0 1 0 10 3 0 1 0 11 0
ParLoopExecute 6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 3 47 0 0 0 3100 0 0 0 175069
ParLoopset_4 2 1.0 1.1100e-0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopHaloEnd 6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednBegin 6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednEnd 6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopCells 9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 2 47 0 0 0 3100 0 0 0 186686
ParLoopset_10 2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopset_16 2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: P(1) aij matrix
VecScatterBegin 40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00 0 0 7 0 0 0 0100100 0 0
VecScatterEnd 40 1.0 2.6442e-0161.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0
VecSetRandom 40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 27 0 0 0 0 0
MatMult 40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00 0 1 7 0 0 28 17100100 0 20423
MatAssemblyBegin 3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatZeroEntries 1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 2 1.0 1.8248e-012468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 5 0 0 0100 0
AssembleMat 1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00 1 4 0 0 1 45 83 0 0100 66009
ParLoopExecute 1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68623
ParLoopHaloEnd 1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednBegin 1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednEnd 1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopCells 3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68641
--- Event Stage 2: P(2) aij matrix
VecScatterBegin 40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00 0 0 7 0 0 0 0100100 0 0
VecScatterEnd 40 1.0 4.3488e-0156.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 14 0 0 0 0 0
VecSetRandom 40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 22 0 0 0 0 0
MatMult 40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00 1 2 7 0 0 39 21100100 0 23192
MatAssemblyBegin 3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatZeroEntries 1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 2 1.0 1.4482e-01344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0100 0
AssembleMat 1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 6 0 0 1 39 79 0 0100 92192
ParLoopExecute 1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97484
ParLoopHaloEnd 1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednBegin 1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopCells 3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97505
--- Event Stage 3: P(3) aij matrix
VecScatterBegin 40 1.0 2.3520e-0210.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00 0 0 7 1 0 0 0100100 0 0
VecScatterEnd 40 1.0 6.6521e-0138.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0
VecSetRandom 40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 16 0 0 0 0 0
MatMult 40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00 2 4 7 1 0 46 11100100 0 25439
MatAssemblyBegin 3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatZeroEntries 1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatView 2 1.0 2.8071e-01958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0100 0
AssembleMat 1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 37 0 0 1 38 89 0 0100 257591
ParLoopExecute 1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272582
ParLoopHaloEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednBegin 1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopCells 3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272617
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 12 11 6776 0.
Viewer 4 0 0 0.
PetscRandom 3 3 2058 0.
Index Set 1095 1085 1383616 0.
IS L to G Mapping 15 14 204830392 0.
Section 222 209 158840 0.
Vector 31 28 41441632 0.
Vector Scatter 3 2 2416 0.
Matrix 22 18 131705576 0.
Distributed Mesh 40 37 182200 0.
GraphPartitioner 19 18 11808 0.
Star Forest Bipartite Graph 206 200 178256 0.
Discrete System 40 37 34336 0.
--- Event Stage 1: P(1) aij matrix
PetscRandom 40 40 27440 0.
--- Event Stage 2: P(2) aij matrix
PetscRandom 40 40 27440 0.
--- Event Stage 3: P(3) aij matrix
PetscRandom 40 40 27440 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 1.06335e-05
Average time for zero size MPI_Send(): 1.41561e-06
#PETSc Option Table entries:
--dimension 3
--output-file poisson-matvecs.csv
--problem poisson
-log_view
-mat_view ::ascii_info
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
-----------------------------------------
Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003
Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
Using PETSc arch: petsc-gnu51-ivybridge-int64
-----------------------------------------
Using C compiler: cc -fPIC -march=ivybridge -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -fPIC -march=ivybridge -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl
-----------------------------------------
Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
--------------------------------------------------------------------------------
Resources requested: ncpus=48,place=free,walltime=00:20:00
Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20
*** lmn01 Job: 4820277.sdb ended: 29/09/17 11:58:22 queue: S4808886 ***
--------------------------------------------------------------------------------