[petsc-users] Memory problem

Matthew Knepley knepley at gmail.com
Wed Oct 5 18:00:44 CDT 2011


On Wed, Oct 5, 2011 at 5:56 PM, Rongliang Chen <rongliang.chan at gmail.com> wrote:

> Hello Everyone,
>
> I am testing a nonlinear problem using SNESSolve(). My test case has about
> 1 million degrees of freedom, which means the Jacobian matrix in
> SNESSolve() is a 1 million by 1 million matrix, and it should be a
> sparse matrix.
> My question is this: in the "-log_summary" output file I find a strange
> message: "Matrix    39             39  18446744074642894848     0". Does
> this message mean that the matrix's memory usage is 1.8x10^19 bytes? I have
> no idea why a one million by one million matrix would use so much memory. Is
> this possible? The output of "-log_summary" follows. Thanks.
>

That is an overflow somewhere. You can probably get the right answer by
using -snes_view. I will try to track down this overflow.
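
A quick arithmetic check makes the overflow reading concrete. The sketch
below is plain Python and assumes nothing about PETSc internals; the helper
names are hypothetical, and the 8 bytes per entry is an assumed
double-precision build. It shows that the reported figure sits just above
2^64, which is consistent with a wrapped 64-bit quantity folded into a
larger total, and that even dense storage for a matrix of this size would
be far smaller than the printed number.

```python
# Value printed in the "Memory" column of the Matrix row of -log_summary.
REPORTED = 18446744074642894848
TWO_64 = 2 ** 64  # one past the largest unsigned 64-bit integer

# Number of unknowns reported by the run ("Number of unknowns = 1096658").
N = 1096658


def excess_over_two_64(value: int) -> int:
    """Distance of value above 2**64 (hypothetical helper)."""
    return value - TWO_64


def wrap_u64(n: int) -> int:
    """Reinterpret an integer as an unsigned 64-bit value (hypothetical helper)."""
    return n % TWO_64


def dense_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Bytes for a dense n-by-n matrix; 8 bytes per entry is an assumption."""
    return n * n * bytes_per_entry


if __name__ == "__main__":
    print(excess_over_two_64(REPORTED))  # 933343232, roughly 0.9e9 above 2**64
    print(wrap_u64(-1))                  # 18446744073709551615
    print(float(REPORTED) == REPORTED)   # True: exactly representable as a double
    print(dense_bytes(N) < REPORTED)     # True: dense storage is far smaller
```

Whether the 933343232-byte excess reflects the true matrix storage cannot
be told from the log alone.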

   Matt


> Best,
> Rongliang
>
>
> -------------------------------------------------------------------------------------------------------------------------
> Starting to load grid...
>  Nodes on moving boundary: coarse 199, fine 799, Gridratio 0.250000.
> Setupping Interpolation matrix......
> Interpolation matrix done......Time spent: 0.405431
> finished.
> Grid has 32000 elements, 1096658 degrees of freedom.
> Coarse grid has 2000 elements, 70170 degrees of freedom.
>   [0] has 35380 degrees of freedom (matrix), 35380 degrees of freedom
> (including shared points).
>   [0] coarse grid has 2194 degrees of freedom (matrix), 2194 degrees of
> freedom (including shared points).
>   [31] has 32466 degrees of freedom (matrix), 34428 degrees of freedom
> (including shared points).
>   [31] coarse grid has 2250 degrees of freedom (matrix), 2826 degrees of
> freedom (including shared points).
> Time spend on the load grid and create matrix etc.: 3.577862.
> Solving fixed mesh (steady-state problem)
> Solving coarse problem......
>   0 SNES norm 3.1224989992e+01, 0 KSP its last norm 0.0000000000e+00.
>   1 SNES norm 1.3987219837e+00, 25 KSP its last norm 2.4915963656e-01.
>   2 SNES norm 5.1898321541e-01, 59 KSP its last norm 1.3451744761e-02.
>   3 SNES norm 4.0024228221e-02, 56 KSP its last norm 4.9036146089e-03.
>   4 SNES norm 6.7641787439e-04, 59 KSP its last norm 3.6925683196e-04.
> Coarse solver done......
> Initial value of object function (Energy dissipation) (Coarse):
> 38.9341108701
>   0 SNES norm 7.4575110699e+00, 0 KSP its last norm 0.0000000000e+00.
>   1 SNES norm 6.4497565921e-02, 51 KSP its last norm 7.4277453141e-03.
>   2 SNES norm 9.2093642958e-04, 90 KSP its last norm 5.4331380112e-05.
>   3 SNES norm 8.1283574549e-07, 103 KSP its last norm 7.5974191049e-07.
> Initial value of object function (Energy dissipation) (Fine): 42.5134271399
> Solution time of 17.180358 sec.
> Fixed mesh (Steady-state) solver done.
> Total number of nonlinear iterations = 3
> Total number of linear iterations = 244
> Average number of linear iterations = 81.333336
> Time computing: 17.180358 sec, Time outputting: 0.000000 sec.
> Time spent in coarse nonlinear solve: 0.793436 sec, 0.046183 fraction of
> total compute time.
> Solving Shape Optimization problem (steady-state problem)
> Solving coarse problem......
>   0 SNES norm 4.1963166116e+01, 0 KSP its last norm 0.0000000000e+00.
>   1 SNES norm 3.2749386875e+01, 132 KSP its last norm 4.0966334477e-01.
>   2 SNES norm 2.2874504408e+01, 130 KSP its last norm 3.2526355310e-01.
>   3 SNES norm 1.4327187891e+01, 132 KSP its last norm 2.1213029400e-01.
>   4 SNES norm 1.7283643754e+00, 81 KSP its last norm 1.4233338128e-01.
>   5 SNES norm 3.6703566918e-01, 133 KSP its last norm 1.6069896349e-02.
>   6 SNES norm 3.6554528686e-03, 77 KSP its last norm 3.5379167356e-03.
> Coarse solver done......
> Optimized value of object function (Energy dissipation) (Coarse):
> 29.9743062939
> The reduction of the energy dissipation (Coarse): 23.012737%
> The optimized curve (Coarse):
>  a = (4.500000, -0.042893, -0.002030, 0.043721, -0.018798, 0.001824)
> Solving  moving mesh equation......
>  KSP norm 2.3040219081e-07, KSP its. 741. Time spent 8.481956
> Moving mesh solver done.
>   0 SNES norm 4.7843968670e+02, 0 KSP its last norm 0.0000000000e+00.
>   1 SNES norm 1.0148854085e+02, 49 KSP its last norm 4.7373180511e-01.
>   2 SNES norm 1.8312214030e+00, 46 KSP its last norm 1.0133332840e-01.
>   3 SNES norm 3.3101970861e-03, 212 KSP its last norm 1.7753271069e-03.
>   4 SNES norm 4.9552614008e-06, 249 KSP its last norm 3.2293284103e-06.
> Optimized value of object function (Energy dissipation) (Fine):
> 33.2754372645
> Solution time of 4053.227456 sec.
> Number of unknowns = 1096658
> Parameters: kinematic viscosity = 0.01
>             inlet velocity: u = 5,  v = 0
> Total number of nonlinear iterations = 4
> Total number of linear iterations = 556
> Average number of linear iterations = 139.000000
> Time computing: 4053.227456 sec, Time outputting: 0.000001 sec.
> Time spent in coarse nonlinear solve: 24.239526 sec, 0.005980 fraction of
> total compute time.
> The optimized curve (fine):
>  a = (4.500000, -0.046468, -0.001963, 0.045736, -0.019141, 0.001789)
> The reduction of the energy dissipation (Fine): 21.729582%
> Time spend on fixed mesh solving: 17.296872
> Time spend on shape opt. solving: 4053.250126
> Latex command line:
>   np    Newton   GMRES   Time(Total)    Time(Coarse)   Ratio
>  32 &   4   &   139.00   &   4053.23  &    24.24   &  0.6\%
>
> Running finished on: Wed Oct  5 11:32:04 2011
> Total running time: 4070.644329
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./joab on a Janus-nod named node1751 with 32 processors, by ronglian Wed
> Oct  5 11:32:04 2011
> Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           4.074e+03      1.00000   4.074e+03
> Objects:              1.011e+03      1.00000   1.011e+03
> Flops:                2.255e+11      2.27275   1.471e+11  4.706e+12
> Flops/sec:            5.535e+07      2.27275   3.609e+07  1.155e+09
> MPI Messages:         1.103e+05      5.41392   3.665e+04  1.173e+06
> MPI Message Lengths:  1.326e+09      2.60531   2.416e+04  2.833e+10
> MPI Reductions:       5.969e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 4.0743e+03 100.0%  4.7058e+12 100.0%  1.173e+06
> 100.0%  2.416e+04      100.0%  5.968e+03 100.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)
> Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             2493 1.0 1.2225e+0218.4 4.37e+09 1.1 3.9e+05 2.2e+03
> 0.0e+00  2  3 33  3  0   2  3 33  3  0  1084
> MatMultTranspose       6 1.0 3.3590e-02 2.2 7.38e+06 1.1 8.0e+02 1.5e+03
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  6727
> MatSolve            2467 1.0 1.1270e+02 1.7 5.95e+10 1.7 0.0e+00 0.0e+00
> 0.0e+00  2 33  0  0  0   2 33  0  0  0 13775
> MatLUFactorSym         4 1.0 3.4774e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        18 1.0 2.0832e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00
> 0.0e+00  2 56  0  0  0   2 56  0  0  0 12746
> MatILUFactorSym        1 1.0 8.3280e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     103 1.0 7.6879e+0215.4 0.00e+00 0.0 1.6e+04 6.2e+04
> 1.7e+02  7  0  1  4  3   7  0  1  4  3     0
> MatAssemblyEnd       103 1.0 3.7818e+01 1.0 0.00e+00 0.0 3.0e+03 5.3e+02
> 1.6e+02  1  0  0  0  3   1  0  0  0  3     0
> MatGetRowIJ            5 1.0 4.8716e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetSubMatrice      18 1.0 4.3095e+00 2.5 0.00e+00 0.0 1.6e+04 3.5e+05
> 7.4e+01  0  0  1 20  1   0  0  1 20  1     0
> MatGetOrdering         5 1.0 1.4656e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.4e+01  0  0  0  0  0   0  0  0  0  0     0
> MatPartitioning        1 1.0 1.4356e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries        42 1.0 2.0939e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecDot                17 1.0 1.2719e-02 6.8 5.47e+05 1.1 0.0e+00 0.0e+00
> 1.7e+01  0  0  0  0  0   0  0  0  0  0  1317
> VecMDot             2425 1.0 1.7196e+01 2.2 5.82e+09 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  4  0  0 41   0  4  0  0 41 10353
> VecNorm             2503 1.0 2.7923e+00 3.4 1.18e+08 1.1 0.0e+00 0.0e+00
> 2.5e+03  0  0  0  0 42   0  0  0  0 42  1293
> VecScale            2467 1.0 7.3112e-02 1.7 5.84e+07 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 24453
> VecCopy              153 1.0 1.1636e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              5031 1.0 6.0423e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              137 1.0 1.1462e-02 1.5 6.33e+06 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 16902
> VecWAXPY              19 1.0 1.7784e-03 1.4 2.83e+05 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  4869
> VecMAXPY            2467 1.0 8.5820e+00 1.3 5.93e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  4  0  0  0   0  4  0  0  0 21153
> VecAssemblyBegin      69 1.0 1.0341e+0018.2 0.00e+00 0.0 4.9e+03 5.4e+02
> 2.1e+02  0  0  0  0  3   0  0  0  0  3     0
> VecAssemblyEnd        69 1.0 2.4939e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     7491 1.0 1.3734e+00 1.7 0.00e+00 0.0 1.1e+06 1.9e+04
> 0.0e+00  0  0 96 76  0   0  0 96 76  0     0
> VecScatterEnd       7491 1.0 2.0055e+02 8.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
> VecReduceArith         8 1.0 1.4977e-03 2.0 3.05e+05 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  6232
> VecReduceComm          4 1.0 8.9908e-0412.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize        2467 1.0 2.8067e+00 3.4 1.75e+08 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  0  0  0 41   0  0  0  0 41  1905
> SNESSolve              4 1.0 4.0619e+03 1.0 2.23e+11 2.3 9.4e+05 2.3e+04
> 4.1e+03100 98 80 77 68 100 98 80 77 68  1136
> SNESLineSearch        17 1.0 1.1423e+01 1.0 5.23e+07 1.1 1.8e+04 1.7e+04
> 3.3e+02  0  0  2  1  6   0  0  2  1  6   140
> SNESFunctionEval      23 1.0 2.9742e+01 1.0 2.60e+07 1.1 1.9e+04 1.9e+04
> 3.5e+02  1  0  2  1  6   1  0  2  1  6    27
> SNESJacobianEval      17 1.0 3.6786e+03 1.0 0.00e+00 0.0 9.8e+03 6.4e+04
> 1.4e+02 90  0  1  2  2  90  0  1  2  2     0
> KSPGMRESOrthog      2425 1.0 2.5150e+01 1.6 1.16e+10 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  8  0  0 41   0  8  0  0 41 14157
> KSPSetup              36 1.0 2.5388e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              18 1.0 3.6141e+02 1.0 2.25e+11 2.3 1.1e+06 2.4e+04
> 5.0e+03  9100 97 96 84   9100 97 96 84 13015
> PCSetUp               36 1.0 2.1635e+02 3.6 1.55e+11 3.2 1.8e+04 3.2e+05
> 1.5e+02  3 56  2 20  3   3 56  2 20  3 12274
> PCSetUpOnBlocks       18 1.0 2.1293e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00
> 2.7e+01  2 56  0  0  0   2 56  0  0  0 12471
> PCApply             2467 1.0 2.5616e+02 2.5 5.95e+10 1.7 7.3e+05 2.8e+04
> 0.0e+00  4 33 62 73  0   4 33 62 73  0  6060
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix    39             39  18446744074642894848     0
>  Matrix Partitioning     1              1          640     0
>            Index Set   184            184      2589512     0
>    IS L to G Mapping     2              2       301720     0
>               Vector   729            729    133662888     0
>       Vector Scatter    29             29        30508     0
>    Application Order     2              2      9335968     0
>                 SNES     4              4         5088     0
>        Krylov Solver    10             10     32264320     0
>       Preconditioner    10             10         9088     0
>               Viewer     1              0            0     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 1.19209e-07
> Average time for MPI_Barrier(): 1.20163e-05
> Average time for zero size MPI_Send(): 2.49594e-06
> ......................................
> -----------------------------------------
>



-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener