[petsc-users] Understanding matmult memory performance
Karl Rupp
rupp at iue.tuwien.ac.at
Fri Sep 29 09:08:18 CDT 2017
Hi Lawrence,
according to
https://ark.intel.com/products/75283/Intel-Xeon-Processor-E5-2697-v2-30M-Cache-2_70-GHz
you get 59.7 GB/sec of peak memory bandwidth per CPU (i.e. per socket), so across the four sockets of your two dual-socket nodes you should get about 240 GB/sec.
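For reference, a minimal sketch of that back-of-the-envelope arithmetic in Python (assuming two sockets per node and two nodes, as in the setup quoted below):

    # Aggregate peak DRAM bandwidth estimate for the two-node system.
    per_socket_gb_s = 59.7      # peak per E5-2697 v2 socket, from the Intel ARK page
    sockets_per_node = 2
    nodes = 2
    peak_gb_s = per_socket_gb_s * sockets_per_node * nodes
    print(f"aggregate peak: {peak_gb_s:.1f} GB/s")   # ~238.8 GB/s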
If you use PETSc's `make streams`, then processor placement may -
unfortunately - not be ideal, and the measurement may hence underestimate
the achievable performance. Have a look at the new PETSc 3.8 manual [1], Chapter 14,
where Richard and I nailed down some of these performance aspects.
Best regards,
Karli
[1] http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf
On 09/29/2017 06:19 AM, Lawrence Mitchell wrote:
> Dear all,
>
> I'm attempting to understand some results I'm getting for matmult performance. In particular, it looks like I'm obtaining timings that suggest that I'm getting more main memory bandwidth than I think is possible.
>
> The run setup uses two 24-core (dual-socket) Ivy Bridge nodes (Xeon E5-2697 v2). The specced main memory bandwidth is 85.3 GB/s per node, and I measure a STREAM triad bandwidth of 148.2 GB/s using 48 MPI processes (two nodes). The last-level cache is 30 MB (shared between 12 cores).
>
> The matrices I'm using are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.
>
> The matrix sizes are respectively:
>
> P1:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=8120601, cols=8120601
> total: nonzeros=120841801, allocated nonzeros=120841801
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
>
> P2:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=8120601, cols=8120601
> total: nonzeros=231382401, allocated nonzeros=231382401
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
>
> P3:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=13997521, cols=13997521
> total: nonzeros=674173201, allocated nonzeros=674173201
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
>
> Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.
>
> Ignoring the data for the vectors and the row pointers, a MatMult therefore needs to move 16*nonzeros bytes (an 8-byte scalar value plus an 8-byte column index per nonzero).
>
> MatMults take, respectively:
>
> P1: 0.0114362s
> P2: 0.0196032s
> P3: 0.0524525s
>
> So the estimated achieved memory bandwidth is:
>
> P1: 120841801 * 16 / 0.0114362 = 157.45 GiB/s
> P2: 231382401 * 16 / 0.0196032 = 175.88 GiB/s
> P3: 674173201 * 16 / 0.0524525 = 191.52 GiB/s
>
> So all of those numbers are higher than the stream bandwidth, and the P2 and P3 numbers are higher than the spec sheet bandwidth.
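A minimal Python sketch of the arithmetic above, assuming the per-MatMult times are the stage MatMult maxima from the log below divided by the 40 calls per stage:

    # Bandwidth estimate: 16 bytes per nonzero (8-byte PetscScalar value
    # plus 8-byte PetscInt column index); vector and row-pointer traffic
    # is ignored.  Times are seconds per MatMult call.
    cases = {
        "P1": (120841801, 0.0114362),
        "P2": (231382401, 0.0196032),
        "P3": (674173201, 0.0524525),
    }
    for name, (nnz, t) in cases.items():
        bw = nnz * 16 / t / 2**30    # dividing by 2**30 matches the figures
        print(f"{name}: {bw:.2f} GiB/s")   # quoted above, i.e. they are GiB/s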
>
> I don't think PETSc is doing anything magic, but hints would be appreciated; it would be nice to be able to explain this.
>
> Cheers,
>
> Lawrence
>
> Full -log_view output:
>
> --------------------------------------------------------------------------------
> *** lmn01 Job: 4820277.sdb started: 29/09/17 11:56:03 host: mom1 ***
>
> --------------------------------------------------------------------------------
> Int Type has 8 bytes, Scalar Type has 8 bytes
>
> P1:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=8120601, cols=8120601
> total: nonzeros=120841801, allocated nonzeros=120841801
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> P2:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=8120601, cols=8120601
> total: nonzeros=231382401, allocated nonzeros=231382401
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> P3:
> Mat Object: 48 MPI processes
> type: mpiaij
> rows=13997521, cols=13997521
> total: nonzeros=674173201, allocated nonzeros=674173201
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
> Using Petsc Development GIT revision: v3.7.5-3014-g413f72f GIT Date: 2017-02-05 17:50:57 -0600
>
> Max Max/Min Avg Total
> Time (sec): 1.150e+02 1.00000 1.150e+02
> Objects: 1.832e+03 1.50534 1.269e+03
> Flops: 2.652e+10 1.16244 2.486e+10 1.193e+12
> Flops/sec: 2.306e+08 1.16244 2.162e+08 1.038e+10
> MPI Messages: 1.021e+04 3.00279 5.091e+03 2.444e+05
> MPI Message Lengths: 3.314e+09 1.97310 3.697e+05 9.035e+10
> MPI Reductions: 2.630e+02 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 1.0701e+02 93.1% 5.5715e+11 46.7% 1.942e+05 79.4% 3.644e+05 98.6% 2.560e+02 97.3%
> 1: P(1) aij matrix: 1.5561e+00 1.4% 5.5574e+10 4.7% 1.688e+04 6.9% 9.789e+02 0.3% 2.000e+00 0.8%
> 2: P(2) aij matrix: 1.9378e+00 1.7% 8.8214e+10 7.4% 1.688e+04 6.9% 1.483e+03 0.4% 2.000e+00 0.8%
> 3: P(3) aij matrix: 4.4890e+00 3.9% 4.9225e+11 41.3% 1.648e+04 6.7% 2.829e+03 0.8% 2.000e+00 0.8%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier 4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01 2 0 2 0 8 3 0 2 0 8 0
> BuildTwoSided 124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00 6 0 11 0 0 7 0 14 0 0 0
> VecSet 16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 3 1.0 2.1945e-0269.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
> VecScatterEnd 3 1.0 2.2460e-0218.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSetRandom 3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMult 3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 21311
> MatAssemblyBegin 12 1.0 2.6438e-03235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01 0 0 1 0 9 0 0 1 0 9 0
> MatView 9 1.0 5.3831e-0112.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 3 0 0 0 0 4 0
> Mesh Partition 6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12 0 41 7 13 13 0 52 7 13 0
> Mesh Migration 6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16 0 31 85 27 17 0 39 86 28 0
> DMPlexInterp 3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12 0 0 0 2 13 0 0 0 2 0
> DMPlexDistribute 3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01 9 0 20 3 10 10 0 25 3 11 0
> DMPlexDistCones 6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00 5 0 5 32 0 6 0 6 32 0 0
> DMPlexDistLabels 6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00 7 0 16 43 2 7 0 21 44 2 0
> DMPlexDistribOL 3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19 0 53 94 30 21 0 66 95 30 0
> DMPlexDistField 9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00 1 0 7 4 2 1 0 9 4 2 0
> DMPlexDistData 6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00 6 0 35 1 0 6 0 45 1 0 0
> DMPlexStratify 19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15 0 0 0 7 16 0 0 0 7 0
> SFSetGraph 141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> SFBcastBegin 271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00 9 0 75 98 0 10 0 95100 0 0
> SFBcastEnd 271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
> SFReduceBegin 12 1.0 1.7538e-0112.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00 0 0 2 0 0 0 0 2 0 0 0
> SFReduceEnd 12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> SFFetchOpBegin 3 1.0 9.9087e-0415.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> SFFetchOpEnd 3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> CreateMesh 15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42 0 73 97 44 45 0 92 98 46 0
> CreateFunctionSpace 3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37 0 56 95 44 40 0 71 97 45 0
> Mesh: reorder 3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 2 1 0 0 0 2 0
> Mesh: numbering 3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 9 0 0 0 2 10 0 0 0 2 0
> CreateSparsity 3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> MatZeroInitial 3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01 2 0 1 0 10 3 0 1 0 11 0
> ParLoopExecute 6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 3 47 0 0 0 3100 0 0 0 175069
> ParLoopset_4 2 1.0 1.1100e-0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopHaloEnd 6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednBegin 6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednEnd 6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopCells 9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 2 47 0 0 0 3100 0 0 0 186686
> ParLoopset_10 2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopset_16 2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> --- Event Stage 1: P(1) aij matrix
>
> VecScatterBegin 40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00 0 0 7 0 0 0 0100100 0 0
> VecScatterEnd 40 1.0 2.6442e-0161.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0
> VecSetRandom 40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 27 0 0 0 0 0
> MatMult 40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00 0 1 7 0 0 28 17100100 0 20423
> MatAssemblyBegin 3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> MatZeroEntries 1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatView 2 1.0 1.8248e-012468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 5 0 0 0100 0
> AssembleMat 1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00 1 4 0 0 1 45 83 0 0100 66009
> ParLoopExecute 1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68623
> ParLoopHaloEnd 1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednBegin 1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednEnd 1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopCells 3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68641
>
> --- Event Stage 2: P(2) aij matrix
>
> VecScatterBegin 40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00 0 0 7 0 0 0 0100100 0 0
> VecScatterEnd 40 1.0 4.3488e-0156.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 14 0 0 0 0 0
> VecSetRandom 40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 22 0 0 0 0 0
> MatMult 40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00 1 2 7 0 0 39 21100100 0 23192
> MatAssemblyBegin 3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> MatZeroEntries 1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatView 2 1.0 1.4482e-01344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0100 0
> AssembleMat 1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 6 0 0 1 39 79 0 0100 92192
> ParLoopExecute 1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97484
> ParLoopHaloEnd 1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednBegin 1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopCells 3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97505
>
> --- Event Stage 3: P(3) aij matrix
>
> VecScatterBegin 40 1.0 2.3520e-0210.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00 0 0 7 1 0 0 0100100 0 0
> VecScatterEnd 40 1.0 6.6521e-0138.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0
> VecSetRandom 40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 16 0 0 0 0 0
> MatMult 40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00 2 4 7 1 0 46 11100100 0 25439
> MatAssemblyBegin 3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> MatZeroEntries 1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> MatView 2 1.0 2.8071e-01958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0100 0
> AssembleMat 1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 37 0 0 1 38 89 0 0100 257591
> ParLoopExecute 1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272582
> ParLoopHaloEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednBegin 1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ParLoopCells 3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272617
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Container 12 11 6776 0.
> Viewer 4 0 0 0.
> PetscRandom 3 3 2058 0.
> Index Set 1095 1085 1383616 0.
> IS L to G Mapping 15 14 204830392 0.
> Section 222 209 158840 0.
> Vector 31 28 41441632 0.
> Vector Scatter 3 2 2416 0.
> Matrix 22 18 131705576 0.
> Distributed Mesh 40 37 182200 0.
> GraphPartitioner 19 18 11808 0.
> Star Forest Bipartite Graph 206 200 178256 0.
> Discrete System 40 37 34336 0.
>
> --- Event Stage 1: P(1) aij matrix
>
> PetscRandom 40 40 27440 0.
>
> --- Event Stage 2: P(2) aij matrix
>
> PetscRandom 40 40 27440 0.
>
> --- Event Stage 3: P(3) aij matrix
>
> PetscRandom 40 40 27440 0.
> ========================================================================================================================
> Average time to get PetscTime(): 0.
> Average time for MPI_Barrier(): 1.06335e-05
> Average time for zero size MPI_Send(): 1.41561e-06
> #PETSc Option Table entries:
> --dimension 3
> --output-file poisson-matvecs.csv
> --problem poisson
> -log_view
> -mat_view ::ascii_info
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
> -----------------------------------------
> Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003
> Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
> Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
> Using PETSc arch: petsc-gnu51-ivybridge-int64
> -----------------------------------------
>
> Using C compiler: cc -fPIC -march=ivybridge -O3 ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: ftn -fPIC -march=ivybridge -O3 ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
> -----------------------------------------
>
> Using C linker: cc
> Using Fortran linker: ftn
> Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl
> -----------------------------------------
>
> Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
> --------------------------------------------------------------------------------
>
> Resources requested: ncpus=48,place=free,walltime=00:20:00
> Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20
>
> *** lmn01 Job: 4820277.sdb ended: 29/09/17 11:58:22 queue: S4808886 ***
> --------------------------------------------------------------------------------
>