[petsc-users] Understanding matmult memory performance

Fri Sep 29 09:05:01 CDT 2017

On Fri, Sep 29, 2017 at 09:04:47AM -0400, Tobin Isaac wrote:
> On Fri, Sep 29, 2017 at 12:19:54PM +0100, Lawrence Mitchell wrote:
> > Dear all,
> > 
> > I'm trying to understand some results I'm getting for matmult performance.  In particular, my timings suggest that I'm getting more main memory bandwidth than I think is possible.
> > 
> > The runs use two 24-core (dual-socket) Ivy Bridge nodes (Xeon E5-2697 v2).  The specced main memory bandwidth is 85.3 GB/s per node (170.6 GB/s for the two nodes), and I measure a STREAM triad bandwidth of 148.2 GB/s using 48 MPI processes (two nodes).  The last-level cache is 30MB (shared between 12 cores).
> 
> One thought: triad has a 1:2 write:read ratio, but with your MatMult()
> for P3 you would have about 1:50.  Unless triad used nontemporal
> stores, the reported bandwidth from triad will be about 3/4 of the
> bandwidth available to pure streaming reads, so maybe you actually
> have ~197 GB/s of read bandwidth available.  MatMult() would still be
> doing suspiciously well, but it would be within the measurements.  How
> confident are you in the specced bandwidth?
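
The 3/4 factor above can be sketched as follows.  This is only an
illustration of the arithmetic: the function name and the assumption of a
plain write-allocate cache (one extra read for every stored line) are
mine, not something measured on this machine.

```python
def triad_actual_traffic(reported_gb_s, nontemporal_stores=False):
    """Estimate the memory traffic behind a reported STREAM triad figure.

    Triad (a[i] = b[i] + s*c[i]) is credited with 2 reads + 1 write per
    element.  Assumption: with ordinary stores the destination line is
    read before it is written (write-allocate), so the hardware really
    moves 3 reads + 1 write, i.e. 4 transfers for every 3 counted.
    """
    counted = 3
    moved = 3 if nontemporal_stores else 4
    return reported_gb_s * moved / counted


stream_triad = 148.2  # GB/s, the two-node figure quoted above
estimate = triad_actual_traffic(stream_triad)
print(f"estimated streaming bandwidth: {estimate:.1f} GB/s")
```

If triad was built with nontemporal stores, the correction does not apply
and the reported figure stands as it is.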

Are you running on ARCHER?  I found one site [1] that lists the
bandwidth you gave, which corresponds to DDR3-1333, but other sites
[2] all say the nodes have DDR3-1866, in which case you would be
getting about 80% of spec bandwidth.

[1]: https://www.archer.ac.uk/documentation/best-practice-guide/arch.php
[2]: https://www.epcc.ed.ac.uk/blog/2013/11/20/archer-next-national-hpc-service-academic-research
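
A rough check of those spec figures, as a sketch only.  It assumes 4
memory channels per socket and 8 bytes per transfer (neither is stated
in this thread), 2 sockets per node, and nominal transfer rates of 1333
and 1866 MT/s; node_peak_gb_s is a name made up for this example.

```python
def node_peak_gb_s(mega_transfers_per_s, channels_per_socket=4,
                   sockets=2, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth of one node in GB/s (10^9 bytes/s).

    Assumptions (not from the thread): 4 channels per socket and a
    64-bit (8-byte) data path per channel.
    """
    bytes_per_s = (mega_transfers_per_s * 1e6 * bytes_per_transfer
                   * channels_per_socket * sockets)
    return bytes_per_s / 1e9


NODES = 2
QUOTED_P3 = 191.52  # P3 MatMult estimate, as quoted in the original message

for name, rate in (("DDR3-1333", 1333), ("DDR3-1866", 1866)):
    peak = NODES * node_peak_gb_s(rate)
    print(f"{name}: {peak / NODES:.1f} GB/s per node, "
          f"{peak:.1f} GB/s on two nodes, "
          f"P3 at {QUOTED_P3 / peak:.0%} of peak")
```

Under these assumptions DDR3-1333 reproduces the 85.3 GB/s per-node
figure, and the P3 estimate comes out near 80% of the two-node peak for
DDR3-1866.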

> 
> Cheers,
>   Toby
> 
> > 
> > The matrices I'm using are, respectively, P1, P2, and P3 discretisations of the Laplacian on a regular tetrahedral grid.
> > 
> > The matrix sizes are respectively:
> > 
> > P1:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=8120601, cols=8120601
> >   total: nonzeros=120841801, allocated nonzeros=120841801
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > 
> > P2:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=8120601, cols=8120601
> >   total: nonzeros=231382401, allocated nonzeros=231382401
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > 
> > P3:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=13997521, cols=13997521
> >   total: nonzeros=674173201, allocated nonzeros=674173201
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > 
> > Both sizeof(PetscScalar) and sizeof(PetscInt) are 8 bytes.
> > 
> > Ignoring data for the vectors and row indices, a matmult then needs to move 16*nonzeros bytes (an 8-byte value plus an 8-byte column index per nonzero).
> > 
> > MatMults take, respectively:
> > 
> > P1: 0.0114362s
> > P2: 0.0196032s
> > P3: 0.0524525s
> > 
> > So the estimated achieved memory bandwidth is:
> > 
> > P1: 120841801 * 16 / 0.0114362 = 157.45GB/s
> > P2: 231382401 * 16 / 0.0196032 = 175.88GB/s
> > P3: 674173201 * 16 / 0.0524525 = 191.52GB/s
> > 
> > So all of those numbers are higher than the STREAM bandwidth, and the P2 and P3 numbers are higher than the spec sheet bandwidth.
> > 
> > I don't think PETSc is doing anything magic, but hints would be appreciated; it would be nice to explain this.
> > 
> > Cheers,
> > 
> > Lawrence
> > 
> > Full -log_view output:
> > 
> > --------------------------------------------------------------------------------
> > *** lmn01   Job: 4820277.sdb   started: 29/09/17 11:56:03   host: mom1 ***
> > 
> > --------------------------------------------------------------------------------
> > Int Type has 8 bytes, Scalar Type has 8 bytes
> > 
> > P1:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=8120601, cols=8120601
> >   total: nonzeros=120841801, allocated nonzeros=120841801
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > P2:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=8120601, cols=8120601
> >   total: nonzeros=231382401, allocated nonzeros=231382401
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > P3:
> > Mat Object: 48 MPI processes
> >   type: mpiaij
> >   rows=13997521, cols=13997521
> >   total: nonzeros=674173201, allocated nonzeros=674173201
> >   total number of mallocs used during MatSetValues calls =0
> >     not using I-node (on process 0) routines
> > 
> > ************************************************************************************************************************
> > ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> > ************************************************************************************************************************
> > 
> > ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> > 
> > profile-matvec.py on a petsc-gnu51-ivybridge-int64 named nid00013 with 48 processors, by lmn01 Fri Sep 29 11:58:21 2017
> > Using Petsc Development GIT revision: v3.7.5-3014-g413f72f  GIT Date: 2017-02-05 17:50:57 -0600
> > 
> >                          Max       Max/Min        Avg      Total 
> > Time (sec):           1.150e+02      1.00000   1.150e+02
> > Objects:              1.832e+03      1.50534   1.269e+03
> > Flops:                2.652e+10      1.16244   2.486e+10  1.193e+12
> > Flops/sec:            2.306e+08      1.16244   2.162e+08  1.038e+10
> > MPI Messages:         1.021e+04      3.00279   5.091e+03  2.444e+05
> > MPI Message Lengths:  3.314e+09      1.97310   3.697e+05  9.035e+10
> > MPI Reductions:       2.630e+02      1.00000
> > 
> > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> >                             e.g., VecAXPY() for real vectors of length N --> 2N flops
> >                             and VecAXPY() for complex vectors of length N --> 8N flops
> > 
> > Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
> >                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
> >  0:      Main Stage: 1.0701e+02  93.1%  5.5715e+11  46.7%  1.942e+05  79.4%  3.644e+05       98.6%  2.560e+02  97.3% 
> >  1: P(1) aij matrix: 1.5561e+00   1.4%  5.5574e+10   4.7%  1.688e+04   6.9%  9.789e+02        0.3%  2.000e+00   0.8% 
> >  2: P(2) aij matrix: 1.9378e+00   1.7%  8.8214e+10   7.4%  1.688e+04   6.9%  1.483e+03        0.4%  2.000e+00   0.8% 
> >  3: P(3) aij matrix: 4.4890e+00   3.9%  4.9225e+11  41.3%  1.648e+04   6.7%  2.829e+03        0.8%  2.000e+00   0.8% 
> > 
> > ------------------------------------------------------------------------------------------------------------------------
> > See the 'Profiling' chapter of the users' manual for details on interpreting output.
> > Phase summary info:
> >    Count: number of times phase was executed
> >    Time and Flops: Max - maximum over all processors
> >                    Ratio - ratio of maximum to minimum over all processors
> >    Mess: number of messages sent
> >    Avg. len: average message length (bytes)
> >    Reduct: number of global reductions
> >    Global: entire computation
> >    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> >       %T - percent time in this phase         %F - percent flops in this phase
> >       %M - percent messages in this phase     %L - percent message lengths in this phase
> >       %R - percent reductions in this phase
> >    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> > ------------------------------------------------------------------------------------------------------------------------
> > Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
> >                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> > 
> > --- Event Stage 0: Main Stage
> > 
> > PetscBarrier           4 1.0 2.7271e+00 1.0 0.00e+00 0.0 3.8e+03 2.4e+01 2.0e+01  2  0  2  0  8   3  0  2  0  8     0
> > BuildTwoSided        124 1.0 9.0858e+00 7.2 0.00e+00 0.0 2.7e+04 8.0e+00 0.0e+00  6  0 11  0  0   7  0 14  0  0     0
> > VecSet                16 1.0 5.8370e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecScatterBegin        3 1.0 2.1945e-0269.7 0.00e+00 0.0 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0     0
> > VecScatterEnd          3 1.0 2.2460e-0218.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecSetRandom           3 1.0 4.0847e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatMult                3 1.0 9.4907e-02 1.2 4.50e+07 1.1 1.3e+03 2.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0 21311
> > MatAssemblyBegin      12 1.0 2.6438e-03235.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyEnd        12 1.0 6.6632e-01 2.5 0.00e+00 0.0 2.5e+03 1.3e+04 2.4e+01  0  0  1  0  9   0  0  1  0  9     0
> > MatView                9 1.0 5.3831e-0112.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
> > Mesh Partition         6 1.0 1.3552e+01 1.0 0.00e+00 0.0 1.0e+05 5.9e+04 3.3e+01 12  0 41  7 13  13  0 52  7 13     0
> > Mesh Migration         6 1.0 1.8341e+01 1.0 0.00e+00 0.0 7.5e+04 1.0e+06 7.2e+01 16  0 31 85 27  17  0 39 86 28     0
> > DMPlexInterp           3 1.0 1.3771e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 12  0  0  0  2  13  0  0  0  2     0
> > DMPlexDistribute       3 1.0 1.0266e+01 1.0 0.00e+00 0.0 4.9e+04 5.9e+04 2.7e+01  9  0 20  3 10  10  0 25  3 11     0
> > DMPlexDistCones        6 1.0 6.9775e+00 1.5 0.00e+00 0.0 1.2e+04 2.3e+06 0.0e+00  5  0  5 32  0   6  0  6 32  0     0
> > DMPlexDistLabels       6 1.0 7.9111e+00 1.0 0.00e+00 0.0 4.0e+04 9.8e+05 6.0e+00  7  0 16 43  2   7  0 21 44  2     0
> > DMPlexDistribOL        3 1.0 2.2335e+01 1.0 0.00e+00 0.0 1.3e+05 6.6e+05 7.8e+01 19  0 53 94 30  21  0 66 95 30     0
> > DMPlexDistField        9 1.0 7.2773e-01 1.0 0.00e+00 0.0 1.7e+04 2.0e+05 6.0e+00  1  0  7  4  2   1  0  9  4  2     0
> > DMPlexDistData         6 1.0 8.0047e+00 9.4 0.00e+00 0.0 8.6e+04 1.2e+04 0.0e+00  6  0 35  1  0   6  0 45  1  0     0
> > DMPlexStratify        19 1.0 1.8531e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 15  0  0  0  7  16  0  0  0  7     0
> > SFSetGraph           141 1.0 2.2412e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> > SFBcastBegin         271 1.0 1.1975e+01 2.0 0.00e+00 0.0 1.8e+05 4.8e+05 0.0e+00  9  0 75 98  0  10  0 95100  0     0
> > SFBcastEnd           271 1.0 6.4306e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> > SFReduceBegin         12 1.0 1.7538e-0112.8 0.00e+00 0.0 4.8e+03 5.9e+04 0.0e+00  0  0  2  0  0   0  0  2  0  0     0
> > SFReduceEnd           12 1.0 2.2638e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > SFFetchOpBegin         3 1.0 9.9087e-0415.6 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > SFFetchOpEnd           3 1.0 3.6049e-02 6.4 0.00e+00 0.0 6.3e+02 3.9e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > CreateMesh            15 1.0 4.9047e+01 1.0 0.00e+00 0.0 1.8e+05 4.9e+05 1.2e+02 42  0 73 97 44  45  0 92 98 46     0
> > CreateFunctionSpace       3 1.0 4.2819e+01 1.0 0.00e+00 0.0 1.4e+05 6.3e+05 1.2e+02 37  0 56 95 44  40  0 71 97 45     0
> > Mesh: reorder          3 1.0 1.5455e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  2   1  0  0  0  2     0
> > Mesh: numbering        3 1.0 1.0627e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  9  0  0  0  2  10  0  0  0  2     0
> > CreateSparsity         3 1.0 2.0243e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> > MatZeroInitial         3 1.0 2.7938e+00 1.0 0.00e+00 0.0 2.5e+03 1.3e+04 2.7e+01  2  0  1  0 10   3  0  1  0 11     0
> > ParLoopExecute         6 1.0 3.1709e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  3 47  0  0  0   3100  0  0  0 175069
> > ParLoopset_4           2 1.0 1.1100e-0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopHaloEnd         6 1.0 2.9564e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednBegin       6 1.0 7.0810e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednEnd         6 1.0 6.5088e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopCells           9 1.0 2.9736e+00 1.2 1.24e+10 1.2 0.0e+00 0.0e+00 0.0e+00  2 47  0  0  0   3100  0  0  0 186686
> > ParLoopset_10          2 1.0 1.1411e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopset_16          2 1.0 1.1880e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > 
> > --- Event Stage 1: P(1) aij matrix
> > 
> > VecScatterBegin       40 1.0 1.1312e-02 8.5 0.00e+00 0.0 1.7e+04 1.4e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
> > VecScatterEnd         40 1.0 2.6442e-0161.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0
> > VecSetRandom          40 1.0 4.4251e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  27  0  0  0  0     0
> > MatMult               40 1.0 4.5745e-01 1.1 2.06e+08 1.1 1.7e+04 1.4e+04 0.0e+00  0  1  7  0  0  28 17100100  0 20423
> > MatAssemblyBegin       3 1.0 2.3842e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyEnd         3 1.0 1.8371e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> > MatZeroEntries         1 1.0 5.2531e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatView                2 1.0 1.8248e-012468.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   5  0  0  0100     0
> > AssembleMat            1 1.0 7.0037e-01 1.0 1.01e+09 1.1 0.0e+00 0.0e+00 2.0e+00  1  4  0  0  1  45 83  0  0100 66009
> > ParLoopExecute         1 1.0 6.7369e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68623
> > ParLoopHaloEnd         1 1.0 1.3113e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednBegin       1 1.0 1.3113e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednEnd         1 1.0 1.0967e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopCells           3 1.0 6.7352e-01 1.4 1.01e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0  38 83  0  0  0 68641
> > 
> > --- Event Stage 2: P(2) aij matrix
> > 
> > VecScatterBegin       40 1.0 1.2448e-02 6.3 0.00e+00 0.0 1.7e+04 2.1e+04 0.0e+00  0  0  7  0  0   0  0100100  0     0
> > VecScatterEnd         40 1.0 4.3488e-0156.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  14  0  0  0  0     0
> > VecSetRandom          40 1.0 4.4287e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  22  0  0  0  0     0
> > MatMult               40 1.0 7.8413e-01 1.1 4.04e+08 1.1 1.7e+04 2.1e+04 0.0e+00  1  2  7  0  0  39 21100100  0 23192
> > MatAssemblyBegin       3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyEnd         3 1.0 2.4675e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> > MatZeroEntries         1 1.0 9.4781e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatView                2 1.0 1.4482e-01344.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
> > AssembleMat            1 1.0 7.5959e-01 1.0 1.57e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1  6  0  0  1  39 79  0  0100 92192
> > ParLoopExecute         1 1.0 7.1835e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97484
> > ParLoopHaloEnd         1 1.0 1.1921e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednBegin       1 1.0 1.7881e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopCells           3 1.0 7.1820e-01 1.2 1.57e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0  34 79  0  0  0 97505
> > 
> > --- Event Stage 3: P(3) aij matrix
> > 
> > VecScatterBegin       40 1.0 2.3520e-0210.9 0.00e+00 0.0 1.6e+04 4.2e+04 0.0e+00  0  0  7  1  0   0  0100100  0     0
> > VecScatterEnd         40 1.0 6.6521e-0138.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0
> > VecSetRandom          40 1.0 7.5565e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  16  0  0  0  0     0
> > MatMult               40 1.0 2.0981e+00 1.0 1.19e+09 1.1 1.6e+04 4.2e+04 0.0e+00  2  4  7  1  0  46 11100100  0 25439
> > MatAssemblyBegin       3 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyEnd         3 1.0 5.6094e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> > MatZeroEntries         1 1.0 2.9610e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> > MatView                2 1.0 2.8071e-01958.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   3  0  0  0100     0
> > AssembleMat            1 1.0 1.7038e+00 1.0 9.94e+09 1.2 0.0e+00 0.0e+00 2.0e+00  1 37  0  0  1  38 89  0  0100 257591
> > ParLoopExecute         1 1.0 1.6101e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272582
> > ParLoopHaloEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednBegin       1 1.0 1.7166e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopRednEnd         1 1.0 1.4067e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ParLoopCells           3 1.0 1.6099e+00 1.2 9.94e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 37  0  0  0  32 89  0  0  0 272617
> > ------------------------------------------------------------------------------------------------------------------------
> > 
> > Memory usage is given in bytes:
> > 
> > Object Type          Creations   Destructions     Memory  Descendants' Mem.
> > Reports information only for process 0.
> > 
> > --- Event Stage 0: Main Stage
> > 
> >            Container    12             11         6776     0.
> >               Viewer     4              0            0     0.
> >          PetscRandom     3              3         2058     0.
> >            Index Set  1095           1085      1383616     0.
> >    IS L to G Mapping    15             14    204830392     0.
> >              Section   222            209       158840     0.
> >               Vector    31             28     41441632     0.
> >       Vector Scatter     3              2         2416     0.
> >               Matrix    22             18    131705576     0.
> >     Distributed Mesh    40             37       182200     0.
> >     GraphPartitioner    19             18        11808     0.
> > Star Forest Bipartite Graph   206            200       178256     0.
> >      Discrete System    40             37        34336     0.
> > 
> > --- Event Stage 1: P(1) aij matrix
> > 
> >          PetscRandom    40             40        27440     0.
> > 
> > --- Event Stage 2: P(2) aij matrix
> > 
> >          PetscRandom    40             40        27440     0.
> > 
> > --- Event Stage 3: P(3) aij matrix
> > 
> >          PetscRandom    40             40        27440     0.
> > ========================================================================================================================
> > Average time to get PetscTime(): 0.
> > Average time for MPI_Barrier(): 1.06335e-05
> > Average time for zero size MPI_Send(): 1.41561e-06
> > #PETSc Option Table entries:
> > --dimension 3
> > --output-file poisson-matvecs.csv
> > --problem poisson
> > -log_view
> > -mat_view ::ascii_info
> > #End of PETSc Option Table entries
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> > Configure options: --COPTFLAGS="-march=ivybridge -O3" --CXXOPTFLAGS="-march=ivybridge -O3" --FOPTFLAGS="-march=ivybridge -O3" --PETSC_ARCH=petsc-gnu51-ivybridge-int64 --download-exodusii --download-hypre --download-metis --download-netcdf --download-parmetis --download-sowing=1 --known-bits-per-byte=8 --known-has-attribute-aligned=1 --known-level1-dcache-assoc=8 --known-level1-dcache-linesize=64 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=1 --known-sdot-returns-double=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --known-snrm2-returns-double=0 --prefix=/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64 --with-64-bit-indices=1 --with-batch=1 --with-blas-lapack-lib="-L/opt/cray/libsci/16.03.1/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.14/GNU/5.1 --with-hdf5=1 --with-make-np=4 --with-pic=1 --with-shared-libraries=1 --with-x=0 --download-eigen
> > -----------------------------------------
> > Libraries compiled on Tue Feb 14 12:07:09 2017 on eslogin003 
> > Machine characteristics: Linux-3.0.101-0.47.86.1.11753.0.PTF-default-x86_64-with-SuSE-11-x86_64
> > Using PETSc directory: /home2/n01/n01/lmn01/src/petsc
> > Using PETSc arch: petsc-gnu51-ivybridge-int64
> > -----------------------------------------
> > 
> > Using C compiler: cc  -fPIC  -march=ivybridge -O3  ${COPTFLAGS} ${CFLAGS}
> > Using Fortran compiler: ftn  -fPIC -march=ivybridge -O3   ${FOPTFLAGS} ${FFLAGS} 
> > -----------------------------------------
> > 
> > Using include paths: -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/include -I/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include -I/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/include/eigen3 -I/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/include
> > -----------------------------------------
> > 
> > Using C linker: cc
> > Using Fortran linker: ftn
> > Using libraries: -Wl,-rpath,/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -L/home2/n01/n01/lmn01/src/petsc/petsc-gnu51-ivybridge-int64/lib -lpetsc -Wl,-rpath,/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -L/work/n01/n01/lmn01/petsc-gnu51-ivybridge-int64/lib -lHYPRE -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -L/opt/cray/hdf5-parallel/1.8.14/GNU/5.1/lib -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl -lcrypto -ldl 
> > -----------------------------------------
> > 
> > Application 28632506 resources: utime ~4100s, stime ~428s, Rss ~2685552, inblocks ~2935062, outblocks ~42464
> > --------------------------------------------------------------------------------
> > 
> > Resources requested: ncpus=48,place=free,walltime=00:20:00
> > Resources allocated: cpupercent=0,cput=00:00:02,mem=8980kb,ncpus=48,vmem=172968kb,walltime=00:02:20
> > 
> > *** lmn01   Job: 4820277.sdb   ended: 29/09/17 11:58:22   queue: S4808886 ***
> > --------------------------------------------------------------------------------
> > 
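
The bandwidth estimate in the quoted message can be reproduced from the
log above.  The sketch below takes the nonzero counts from the MatView
output and the MatMult totals (40 calls per stage) from the stage
tables, and uses the same 16 bytes per nonzero.  The quoted figures are
matched when bytes are divided by 2^30, which suggests (my inference,
the message does not say so) that they are in binary GiB/s, so the
decimal GB/s values are printed alongside.  The names runs and
achieved_bandwidth are made up for this example.

```python
BYTES_PER_NONZERO = 8 + 8  # one PetscScalar value + one PetscInt column index

# stage: (nonzeros, total MatMult time in seconds, number of MatMult calls)
runs = {
    "P1": (120841801, 4.5745e-01, 40),
    "P2": (231382401, 7.8413e-01, 40),
    "P3": (674173201, 2.0981e+00, 40),
}


def achieved_bandwidth(nonzeros, total_time, calls):
    """Bytes per second moved by one MatMult, ignoring vectors and row indices."""
    time_per_call = total_time / calls
    return nonzeros * BYTES_PER_NONZERO / time_per_call


for stage, (nnz, total_time, calls) in runs.items():
    bw = achieved_bandwidth(nnz, total_time, calls)
    print(f"{stage}: {bw / 1e9:.1f} GB/s (10^9 bytes/s), "
          f"{bw / 2**30:.1f} GiB/s (2^30 bytes/s)")
```

Comparisons against STREAM or spec-sheet numbers should use the same
unit convention on both sides; the two conventions differ by about 7%.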
