[petsc-users] Understanding matmult memory performance
Tobin Isaac
tisaac at cc.gatech.edu
Fri Sep 29 09:05:01 CDT 2017
On Fri, Sep 29, 2017 at 09:04:47AM -0400, Tobin Isaac wrote:
> On Fri, Sep 29, 2017 at 12:19:54PM +0100, Lawrence Mitchell wrote:
> > Dear all,
> >
> > I'm attempting to understand some results I'm getting for matmult performance. In particular, it looks like I'm obtaining timings that suggest that I'm getting more main memory bandwidth than I think is possible.
> >
> > The run uses two 24-core (dual-socket) Ivy Bridge nodes (Xeon E5-2697 v2). The specced main memory bandwidth is 85.3 GB/s per node, and I measure a STREAM triad bandwidth of 148.2 GB/s using 48 MPI processes (two nodes). The last-level cache is 30 MB (shared between 12 cores).
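A quick way to rule out cache effects is to compare the matrix footprint with the total last-level cache. This is a minimal Python sketch. It assumes the "30MB" figure means 30 MiB per socket, four sockets across the two nodes, and the 16-bytes-per-nonzero estimate and P1 nonzero count given further down. `fits_in_llc` is a hypothetical helper, not part of PETSc.

```python
# Minimal check: could the P1 matrix be resident in the combined last-level
# cache? Assumes 30 MiB of LLC per socket and 4 sockets (2 dual-socket nodes).
SOCKETS = 4
LLC_BYTES_PER_SOCKET = 30 * 2**20


def fits_in_llc(nnz, bytes_per_nonzero=16, sockets=SOCKETS):
    """Return (fits, ratio) where ratio = matrix bytes / total LLC bytes."""
    matrix_bytes = nnz * bytes_per_nonzero
    total_llc = sockets * LLC_BYTES_PER_SOCKET
    return matrix_bytes <= total_llc, matrix_bytes / total_llc


fits, ratio = fits_in_llc(120841801)  # P1, the smallest of the three matrices
print(f"fits={fits}, matrix is {ratio:.1f}x the total LLC")
```

Under these assumptions, even the P1 matrix is roughly 15 times larger than all four caches combined. Repeated MatMults should therefore mostly stream from DRAM rather than from cache.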
>
> One thought: triad has a 1:2 write:read ratio, but with your MatMult()
> for P3 you would have about 1:50. Unless triad used nontemporal
> stores, every store first reads its destination cache line
> (write-allocate), so the hardware moves four arrays' worth of data
> while STREAM credits only three. The reported triad bandwidth will
> then be about 3/4 of the bandwidth available to pure streaming reads,
> so maybe you actually have ~197 GB/s of read bandwidth available.
> MatMult() would still be doing suspiciously well, but it would be
> within the measurements. How confident are you in the specced
> bandwidth?
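The 3/4 factor comes from byte counting. This is a minimal Python sketch assuming 8-byte doubles and ordinary (write-allocate) stores. The helper names are illustrative only.

```python
# Byte counting for the STREAM triad a[i] = b[i] + q*c[i] with 8-byte doubles.
BYTES_PER_DOUBLE = 8


def triad_counted_bytes(n):
    """Bytes STREAM credits per run: read b, read c, write a."""
    return 3 * BYTES_PER_DOUBLE * n


def triad_actual_bytes(n, nontemporal_stores=False):
    """Bytes actually moved. An ordinary store first reads the destination
    cache line (write-allocate), adding one extra read of a."""
    extra = 0 if nontemporal_stores else BYTES_PER_DOUBLE * n
    return triad_counted_bytes(n) + extra


def implied_read_bandwidth(reported_gbs, nontemporal_stores=False, n=1_000_000):
    """Scale a reported triad rate by the ratio actual/counted traffic."""
    return reported_gbs * triad_actual_bytes(n, nontemporal_stores) / triad_counted_bytes(n)


print(f"{implied_read_bandwidth(148.2):.1f} GB/s")  # 148.2 * 4/3
```

With nontemporal stores the correction disappears, and the reported triad rate is already the raw traffic rate.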
Are you running on ARCHER? I found one site [1] that lists the
bandwidth you gave, which corresponds to DDR3-1333, but other sites
[2] all say the nodes have DDR3-1866, in which case you would be
getting about 80% of spec bandwidth (for the P3 figure).
[1]: https://www.archer.ac.uk/documentation/best-practice-guide/arch.php
[2]: https://www.epcc.ed.ac.uk/blog/2013/11/20/archer-next-national-hpc-service-academic-research
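The spec figures follow from sockets × channels × bus width × transfer rate. This is a minimal Python sketch. It assumes 2 sockets per node, 4 DDR3 channels per E5-2697 v2 socket, and a 64-bit (8-byte) channel. The faster memory is taken as DDR3-1866, the nearest standard DDR3 speed grade.

```python
# Theoretical peak DRAM bandwidth per node, in decimal GB/s.
# Assumes 2 sockets/node, 4 DDR3 channels/socket, 8 bytes per transfer.
SOCKETS_PER_NODE = 2
CHANNELS_PER_SOCKET = 4
BYTES_PER_TRANSFER = 8


def peak_node_bandwidth_gbs(mega_transfers_per_s):
    """Peak bandwidth of one node for DDR3-<mega_transfers_per_s> memory."""
    bytes_per_s = (SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
                   * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6)
    return bytes_per_s / 1e9


for rate in (1333, 1866):
    print(f"DDR3-{rate}: {peak_node_bandwidth_gbs(rate):.1f} GB/s per node")

# Fraction of the two-node DDR3-1866 peak, taking the P3 figure at face value.
print(f"{191.52 / (2 * peak_node_bandwidth_gbs(1866)):.0%}")
```

DDR3-1333 reproduces the 85.3 GB/s per-node figure, and DDR3-1866 gives about 119.4 GB/s per node.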
>
> Cheers,
> Toby
>
> >
> > The matrices I'm using are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.
> >
> > The matrix sizes are respectively:
> >
> > P1:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=8120601, cols=8120601
> > total: nonzeros=120841801, allocated nonzeros=120841801
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > P2:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=8120601, cols=8120601
> > total: nonzeros=231382401, allocated nonzeros=231382401
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > P3:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=13997521, cols=13997521
> > total: nonzeros=674173201, allocated nonzeros=674173201
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.
> >
> > Ignoring the vector entries and the row pointers, a MatMult then has to move 16*nonzeros bytes (one 8-byte value and one 8-byte column index per nonzero).
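The 16*nonzeros estimate counts only the values and column indices read by a CSR product. This is a minimal Python sketch of that kernel plus a slightly fuller lower-bound traffic model. It assumes 8-byte scalars and indices, and that each entry of x is read from memory exactly once. It ignores the MPIAIJ split into diagonal and off-diagonal blocks and any write-allocate traffic on y. `csr_matvec` and `matmult_bytes` are illustrative names, not PETSc functions.

```python
# Rough minimum-traffic model for one AIJ (CSR) MatMult y = A x.


def csr_matvec(rowptr, colidx, vals, x):
    """Plain CSR product: each nonzero costs one value load and one index load."""
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            acc += vals[k] * x[colidx[k]]
        y[i] = acc
    return y


def matmult_bytes(nrows, nnz, scalar=8, index=8, include_vectors=True):
    """Lower bound on bytes moved: values + column indices, plus row pointers,
    one read of x and one write of y when include_vectors is True."""
    total = nnz * (scalar + index)
    if include_vectors:
        total += (nrows + 1) * index   # row pointers
        total += 2 * nrows * scalar    # read x once, write y once
    return total


# P1 sizes from this message (square matrix, so len(x) == nrows).
nrows, nnz = 8120601, 120841801
print(matmult_bytes(nrows, nnz, include_vectors=False))  # the 16*nnz estimate
print(matmult_bytes(nrows, nnz))
```

For P1 the row-pointer and vector traffic adds about 10% to the 16*nonzeros figure. Including it would make the implied bandwidth higher still.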
> >
> > MatMults take, respectively:
> >
> > P1: 0.0114362s
> > P2: 0.0196032s
> > P3: 0.0524525s
> >
> > So the estimated achieved memory bandwidth is:
> >
> > P1: 120841801 * 16 / 0.0114362 = 157.45GB/s
> > P2: 231382401 * 16 / 0.0196032 = 175.88GB/s
> > P3: 674173201 * 16 / 0.0524525 = 191.52GB/s
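The per-call times above equal the MatMult "Time (sec) Max" entries in the -log_view output divided by the 40 calls per stage (e.g. 4.5745e-01/40 for P1). The bandwidth figures are reproduced when the byte count is divided by 2^30 rather than 10^9, so they are GiB/s. This is a minimal Python sketch of that arithmetic, reporting both units.

```python
# Reproduce the achieved-bandwidth figures from the -log_view MatMult rows,
# in decimal GB/s and binary GiB/s.
MATMULT_CALLS = 40
BYTES_PER_NONZERO = 16  # 8-byte value + 8-byte column index

stages = {
    # name: (nonzeros, total MatMult time in seconds, max over ranks)
    "P1": (120841801, 4.5745e-01),
    "P2": (231382401, 7.8413e-01),
    "P3": (674173201, 2.0981e+00),
}


def bandwidth(nnz, seconds_per_call):
    """Return (GB/s, GiB/s) for moving 16 bytes per nonzero in one call."""
    rate = BYTES_PER_NONZERO * nnz / seconds_per_call
    return rate / 1e9, rate / 2**30


for name, (nnz, total_time) in stages.items():
    per_call = total_time / MATMULT_CALLS
    gbs, gibs = bandwidth(nnz, per_call)
    print(f"{name}: {per_call:.4e} s/call, {gbs:.2f} GB/s, {gibs:.2f} GiB/s")
```

STREAM reports decimal megabytes, and DRAM ratings such as 85.3 GB/s are decimal. If the 148.2 GB/s and 85.3 GB/s figures are decimal, the like-for-like MatMult figures are about 169, 189 and 206 GB/s.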
> >
> > So all of those numbers are higher than the STREAM bandwidth (148.2 GB/s), and the P2 and P3 numbers are higher than the two-node spec sheet bandwidth (2 * 85.3 = 170.6 GB/s).
> >
> > I don't think PETSc is doing anything magic, but any hints would be appreciated; it would be nice to explain this.
> >
> > Cheers,
> >
> > Lawrence
> >
> > Full -log_view output:
> >
> > --------------------------------------------------------------------------------
> > *** lmn01 Job: 4820277.sdb started: 29/09/17 11:56:03 host: mom1 ***
> >
> > --------------------------------------------------------------------------------
> > Int Type has 8 bytes, Scalar Type has 8 bytes
> >
> > P1:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=8120601, cols=8120601
> > total: nonzeros=120841801, allocated nonzeros=120841801
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> > P2:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=8120601, cols=8120601
> > total: nonzeros=231382401, allocated nonzeros=231382401
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> > P3:
> > Mat Object: 48 MPI processes
> > type: mpiaij
> > rows=13997521, cols=13997521
> > total: nonzeros=674173201, allocated nonzeros=674173201
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> > ************************************************************************************************************************
> > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> > ************************************************************************************************************************
> >
> > ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> >
> > profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
> > Using Petsc Development GIT revision: v3.7.5-3014-g413f72f GIT Date: 2017-02-05 17:50:57 -0600
> >
> > Max Max/Min Avg Total
> > Time (sec): 1.150e+02 1.00000 1.150e+02
> > Objects: 1.832e+03 1.50534 1.269e+03
> > Flops: 2.652e+10 1.16244 2.486e+10 1.193e+12
> > Flops/sec: 2.306e+08 1.16244 2.162e+08 1.038e+10
> > MPI Messages: 1.021e+04 3.00279 5.091e+03 2.444e+05
> > MPI Message Lengths: 3.314e+09 1.97310 3.697e+05 9.035e+10
> > MPI Reductions: 2.630e+02 1.00000
> >
> > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> > e.g., VecAXPY() for real vectors of length N --> 2N flops
> > and VecAXPY() for complex vectors of length N --> 8N flops
> >
> > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> > Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> > 0: Main Stage: 1.0701e+02 93.1% 5.5715e+11 46.7% 1.942e+05 79.4% 3.644e+05 98.6% 2.560e+02 97.3%
> > 1: P(1) aij matrix: 1.5561e+00 1.4% 5.5574e+10 4.7% 1.688e+04 6.9% 9.789e+02 0.3% 2.000e+00 0.8%
> > 2: P(2) aij matrix: 1.9378e+00 1.7% 8.8214e+10 7.4% 1.688e+04 6.9% 1.483e+03 0.4% 2.000e+00 0.8%
> > 3: P(3) aij matrix: 4.4890e+00 3.9% 4.9225e+11 41.3% 1.648e+04 6.7% 2.829e+03 0.8% 2.000e+00 0.8%
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > See the 'Profiling' chapter of the users' manual for details on interpreting output.
> > Phase summary info:
> > Count: number of times phase was executed
> > Time and Flops: Max - maximum over all processors
> > Ratio - ratio of maximum to minimum over all processors
> > Mess: number of messages sent
> > Avg. len: average message length (bytes)
> > Reduct: number of global reductions
> > Global: entire computation
> > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> > %T - percent time in this phase %F - percent flops in this phase
> > %M - percent messages in this phase %L - percent message lengths in this phase
> > %R - percent reductions in this phase
> > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> > ------------------------------------------------------------------------------------------------------------------------
> > Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > --- Event Stage 0: Main Stage
> >
> > PetscBarrier 4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01 2 0 2 0 8 3 0 2 0 8 0
> > BuildTwoSided 124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00 6 0 11 0 0 7 0 14 0 0 0
> > VecSet 16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > VecScatterBegin 3 1.0 2.1945e-02 69.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
> > VecScatterEnd 3 1.0 2.2460e-02 18.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > VecSetRandom 3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatMult 3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 21311
> > MatAssemblyBegin 12 1.0 2.6438e-03 235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatAssemblyEnd 12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01 0 0 1 0 9 0 0 1 0 9 0
> > MatView 9 1.0 5.3831e-01 12.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 3 0 0 0 0 4 0
> > Mesh Partition 6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12 0 41 7 13 13 0 52 7 13 0
> > Mesh Migration 6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16 0 31 85 27 17 0 39 86 28 0
> > DMPlexInterp 3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12 0 0 0 2 13 0 0 0 2 0
> > DMPlexDistribute 3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01 9 0 20 3 10 10 0 25 3 11 0
> > DMPlexDistCones 6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00 5 0 5 32 0 6 0 6 32 0 0
> > DMPlexDistLabels 6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00 7 0 16 43 2 7 0 21 44 2 0
> > DMPlexDistribOL 3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19 0 53 94 30 21 0 66 95 30 0
> > DMPlexDistField 9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00 1 0 7 4 2 1 0 9 4 2 0
> > DMPlexDistData 6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00 6 0 35 1 0 6 0 45 1 0 0
> > DMPlexStratify 19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15 0 0 0 7 16 0 0 0 7 0
> > SFSetGraph 141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> > SFBcastBegin 271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00 9 0 75 98 0 10 0 95 100 0 0
> > SFBcastEnd 271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
> > SFReduceBegin 12 1.0 1.7538e-01 12.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00 0 0 2 0 0 0 0 2 0 0 0
> > SFReduceEnd 12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > SFFetchOpBegin 3 1.0 9.9087e-04 15.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > SFFetchOpEnd 3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > CreateMesh 15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42 0 73 97 44 45 0 92 98 46 0
> > CreateFunctionSpace 3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37 0 56 95 44 40 0 71 97 45 0
> > Mesh: reorder 3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 2 1 0 0 0 2 0
> > Mesh: numbering 3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 9 0 0 0 2 10 0 0 0 2 0
> > CreateSparsity 3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> > MatZeroInitial 3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01 2 0 1 0 10 3 0 1 0 11 0
> > ParLoopExecute 6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 3 47 0 0 0 3 100 0 0 0 175069
> > ParLoopset_4 2 1.0 1.1100e-02 22.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopHaloEnd 6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednBegin 6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednEnd 6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopCells 9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00 2 47 0 0 0 3 100 0 0 0 186686
> > ParLoopset_10 2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopset_16 2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> >
> > --- Event Stage 1: P(1) aij matrix
> >
> > VecScatterBegin 40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00 0 0 7 0 0 0 0 100 100 0 0
> > VecScatterEnd 40 1.0 2.6442e-01 61.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0
> > VecSetRandom 40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 27 0 0 0 0 0
> > MatMult 40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00 0 1 7 0 0 28 17 100 100 0 20423
> > MatAssemblyBegin 3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatAssemblyEnd 3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> > MatZeroEntries 1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatView 2 1.0 1.8248e-01 2468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 5 0 0 0 100 0
> > AssembleMat 1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00 1 4 0 0 1 45 83 0 0 100 66009
> > ParLoopExecute 1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68623
> > ParLoopHaloEnd 1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednBegin 1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednEnd 1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopCells 3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 38 83 0 0 0 68641
> >
> > --- Event Stage 2: P(2) aij matrix
> >
> > VecScatterBegin 40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00 0 0 7 0 0 0 0 100 100 0 0
> > VecScatterEnd 40 1.0 4.3488e-01 56.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 14 0 0 0 0 0
> > VecSetRandom 40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 22 0 0 0 0 0
> > MatMult 40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00 1 2 7 0 0 39 21 100 100 0 23192
> > MatAssemblyBegin 3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatAssemblyEnd 3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> > MatZeroEntries 1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatView 2 1.0 1.4482e-01 344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0 100 0
> > AssembleMat 1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 6 0 0 1 39 79 0 0 100 92192
> > ParLoopExecute 1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97484
> > ParLoopHaloEnd 1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednBegin 1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopCells 3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 34 79 0 0 0 97505
> >
> > --- Event Stage 3: P(3) aij matrix
> >
> > VecScatterBegin 40 1.0 2.3520e-02 10.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00 0 0 7 1 0 0 0 100 100 0 0
> > VecScatterEnd 40 1.0 6.6521e-01 38.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0
> > VecSetRandom 40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 16 0 0 0 0 0
> > MatMult 40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00 2 4 7 1 0 46 11 100 100 0 25439
> > MatAssemblyBegin 3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > MatAssemblyEnd 3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> > MatZeroEntries 1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
> > MatView 2 1.0 2.8071e-01 958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 3 0 0 0 100 0
> > AssembleMat 1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00 1 37 0 0 1 38 89 0 0 100 257591
> > ParLoopExecute 1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272582
> > ParLoopHaloEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednBegin 1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopRednEnd 1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> > ParLoopCells 3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 32 89 0 0 0 272617
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > Memory usage is given in bytes:
> >
> > Object Type Creations Destructions Memory Descendants' Mem.
> > Reports information only for process 0.
> >
> > --- Event Stage 0: Main Stage
> >
> > Container 12 11 6776 0.
> > Viewer 4 0 0 0.
> > PetscRandom 3 3 2058 0.
> > Index Set 1095 1085 1383616 0.
> > IS L to G Mapping 15 14 204830392 0.
> > Section 222 209 158840 0.
> > Vector 31 28 41441632 0.
> > Vector Scatter 3 2 2416 0.
> > Matrix 22 18 131705576 0.
> > Distributed Mesh 40 37 182200 0.
> > GraphPartitioner 19 18 11808 0.
> > Star Forest Bipartite Graph 206 200 178256 0.
> > Discrete System 40 37 34336 0.
> >
> > --- Event Stage 1: P(1) aij matrix
> >
> > PetscRandom 40 40 27440 0.
> >
> > --- Event Stage 2: P(2) aij matrix
> >
> > PetscRandom 40 40 27440 0.
> >
> > --- Event Stage 3: P(3) aij matrix
> >
> > PetscRandom 40 40 27440 0.
> > ========================================================================================================================
> > Average time to get PetscTime(): 0.
> > Average time for MPI_Barrier(): 1.06335e-05
> > Average time for zero size MPI_Send(): 1.41561e-06
> > #PETSc Option Table entries:
> > --dimension 3
> > --output-file poisson-matvecs.csv
> > --problem poisson
> > -log_view
> > -mat_view ::ascii_info
> > #End of PETSc Option Table entries
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> > Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
> > -----------------------------------------
> > Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003
> > Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
> > Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
> > Using PETSc arch: petsc-gnu51-ivybridge-int64
> > -----------------------------------------
> >
> > Using C compiler: cc -fPIC -march=ivybridge -O3 ${COPTFLAGS} ${CFLAGS}
> > Using Fortran compiler: ftn -fPIC -march=ivybridge -O3 ${FOPTFLAGS} ${FFLAGS}
> > -----------------------------------------
> >
> > Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
> > -----------------------------------------
> >
> > Using C linker: cc
> > Using Fortran linker: ftn
> > Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl
> > -----------------------------------------
> >
> > Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
> > --------------------------------------------------------------------------------
> >
> > Resources requested: ncpus=48,place=free,walltime=00:20:00
> > Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20
> >
> > *** lmn01 Job: 4820277.sdb ended: 29/09/17 11:58:22 queue: S4808886 ***
> > --------------------------------------------------------------------------------
> >
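As a consistency check on the timings, the MatMult "Total Mflop/s" entries in the -log_view output above can be reproduced from the matrix sizes. This is a minimal Python sketch. It assumes the AIJ kernels count 2*nnz - nrows flops per product: one multiply and one add per nonzero, less one add per row. It uses the log header's definition of the rate as the sum of flops over all processors divided by the maximum time over all processors.

```python
# Cross-check the MatMult "Total Mflop/s" column of -log_view.
MATMULT_CALLS = 40

runs = {
    # name: (nrows, nonzeros, max MatMult time in s, reported Mflop/s)
    "P1": (8120601, 120841801, 4.5745e-01, 20423),
    "P2": (8120601, 231382401, 7.8413e-01, 23192),
    "P3": (13997521, 674173201, 2.0981e+00, 25439),
}


def predicted_mflops(nrows, nnz, max_time, calls=MATMULT_CALLS):
    """Total flops over all calls divided by the slowest rank's time."""
    flops = calls * (2 * nnz - nrows)
    return flops / max_time / 1e6


for name, (nrows, nnz, t, reported) in runs.items():
    print(f"{name}: predicted {predicted_mflops(nrows, nnz, t):.0f} Mflop/s, "
          f"reported {reported}")
```

All three predictions land within one Mflop/s of the logged values. This is consistent with the per-call times used in the bandwidth estimates being the slowest rank's time rather than an average.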