[petsc-users] [petsc-maint #89695] Re: Memory problem

Rongliang Chen rongliang.chan at gmail.com
Fri Oct 7 11:58:08 CDT 2011


Hi Barry,

Thank you for your reply.
I don't think this problem comes from the matrix assembly, because the result
I showed you in the last email is from a two-level Newton method, in which I
first solve a coarse problem and use the coarse solution as the initial guess
for the fine-level problem. If I use the one-level method instead, there is no
such problem: the memory usage in the -log_summary output is correct, and the
time spent in SNESJacobianEval also looks normal to me (see attached). The
strange memory usage appears only in the two-level method. The reason I claim
the two-level method's compute time is wrong is that, for the same problem on
the same number of processors, the two-level method needs far fewer SNES and
GMRES iterations than the one-level method, yet its compute time is much
larger (the time spent on the coarse problem is only about 25 s). From the
-log_summary outputs of the two methods I found that the matrices' memory
usage is totally different, so I think there must be a bug in my two-level
code, but I have no idea how to debug it.
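
In case it is useful, here is a minimal sketch of how I could query each
matrix in the two-level code directly with MatGetInfo() to see where the
memory goes (the helper name CheckMatMemory and the label argument are only
for illustration); a large number of mallocs during assembly would point at
missing preallocation:

    #include <petscmat.h>

    /* Sketch: report a matrix's memory and the mallocs incurred during
       assembly, summed over all processes that share the matrix. */
    PetscErrorCode CheckMatMemory(Mat A, const char *label)
    {
      MatInfo        info;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatGetInfo(A, MAT_GLOBAL_SUM, &info);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,
               "%s: memory %g bytes, nz allocated %g, nz used %g, mallocs %g\n",
               label, info.memory, info.nz_allocated, info.nz_used,
               info.mallocs);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }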

Best,
Rongliang

On Fri, Oct 7, 2011 at 10:24 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#efficient-assembly
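
The FAQ entry above concerns efficient assembly, i.e. preallocating the
matrix so that MatSetValues() never has to allocate during assembly. For
reference, a minimal preallocation sketch for a parallel MPIAIJ matrix, where
nlocal, d_nnz, and o_nnz are placeholders I would compute from my mesh:

    #include <petscmat.h>

    /* Sketch: create a parallel AIJ matrix with exact per-row preallocation.
       d_nnz[i]/o_nnz[i] are the diagonal-/off-diagonal-block nonzeros of
       local row i; with exact counts MatSetValues() performs no mallocs. */
    PetscErrorCode CreatePreallocatedMatrix(MPI_Comm comm, PetscInt nlocal,
                                            const PetscInt d_nnz[],
                                            const PetscInt o_nnz[], Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATMPIAIJ);CHKERRQ(ierr);
      ierr = MatMPIAIJSetPreallocation(*A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }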
>
>
> On Oct 7, 2011, at 11:22 AM, Rongliang Chen wrote:
>
> > -------------------------------------------------
> > Joab
> >
> > Shape Optimization solver
> >  by Rongliang Chen
> >  compiled on 15:54:32, Oct  3 2011
> >  Running on: Wed Oct  5 10:24:10 2011
> >
> >  revision $Rev: 157 $
> > -------------------------------------------------
> > Command-line options: -coarse_ksp_rtol 1.0e-1 -coarsegrid
> /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi
> -computeinitialguess -f
> /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi
> -geometric_asm -geometric_asm_overlap 8 -inletu 5.0 -ksp_atol 1e-8
> -ksp_gmres_restart 600 -ksp_max_it 3000 -ksp_pc_side right -ksp_rtol 1.e-3
> -ksp_type gmres -log_summary -mat_partitioning_type parmetis
> -nest_geometric_asm_overlap 4 -nest_ksp_atol 1e-8 -nest_ksp_gmres_restart
> 800 -nest_ksp_max_it 1000 -nest_ksp_pc_side right -nest_ksp_rtol 1.e-2
> -nest_ksp_type gmres -nest_pc_asm_type basic -nest_pc_type asm
> -nest_snes_atol 1.e-10 -nest_snes_max_it 20 -nest_snes_rtol 1.e-4
> -nest_sub_pc_factor_mat_ordering_type qmd -nest_sub_pc_factor_shift_amount
> 1e-8 -nest_sub_pc_factor_shift_type nonzero -nest_sub_pc_type lu -nested
> -noboundaryreduce -pc_asm_type basic -pc_type asm -shapebeta 10.0 -snes_atol
> 1.e-10 -snes_max_it 20 -snes_rtol 1.e-6 -sub_pc_f
> > actor_mat_ordering_type qmd -sub_pc_factor_shift_amount 1e-8
> -sub_pc_factor_shift_type nonzero -sub_pc_type lu -viscosity 0.01
> > -------------------------------------------------
> >
> > Starting to load grid...
> > Nodes on moving boundary: coarse 199, fine 799, Gridratio 0.250000.
> > Setupping Interpolation matrix......
> > Interpolation matrix done......Time spent: 0.405431
> > finished.
> > Grid has 32000 elements, 1096658 degrees of freedom.
> > Coarse grid has 2000 elements, 70170 degrees of freedom.
> >  [0] has 35380 degrees of freedom (matrix), 35380 degrees of freedom
> (including shared points).
> >  [0] coarse grid has 2194 degrees of freedom (matrix), 2194 degrees of
> freedom (including shared points).
> >  [31] has 32466 degrees of freedom (matrix), 34428 degrees of freedom
> (including shared points).
> >  [31] coarse grid has 2250 degrees of freedom (matrix), 2826 degrees of
> freedom (including shared points).
> > Time spend on the load grid and create matrix etc.: 3.577862.
> > Solving fixed mesh (steady-state problem)
> > Solving coarse problem......
> >  0 SNES norm 3.1224989992e+01, 0 KSP its last norm 0.0000000000e+00.
> >  1 SNES norm 1.3987219837e+00, 25 KSP its last norm 2.4915963656e-01.
> >  2 SNES norm 5.1898321541e-01, 59 KSP its last norm 1.3451744761e-02.
> >  3 SNES norm 4.0024228221e-02, 56 KSP its last norm 4.9036146089e-03.
> >  4 SNES norm 6.7641787439e-04, 59 KSP its last norm 3.6925683196e-04.
> > Coarse solver done......
> > Initial value of object function (Energy dissipation) (Coarse):
> 38.9341108701
> >  0 SNES norm 7.4575110699e+00, 0 KSP its last norm 0.0000000000e+00.
> >  1 SNES norm 6.4497565921e-02, 51 KSP its last norm 7.4277453141e-03.
> >  2 SNES norm 9.2093642958e-04, 90 KSP its last norm 5.4331380112e-05.
> >  3 SNES norm 8.1283574549e-07, 103 KSP its last norm 7.5974191049e-07.
> > Initial value of object function (Energy dissipation) (Fine):
> 42.5134271399
> > Solution time of 17.180358 sec.
> > Fixed mesh (Steady-state) solver done.
> > Total number of nonlinear iterations = 3
> > Total number of linear iterations = 244
> > Average number of linear iterations = 81.333336
> > Time computing: 17.180358 sec, Time outputting: 0.000000 sec.
> > Time spent in coarse nonlinear solve: 0.793436 sec, 0.046183 fraction of
> total compute time.
> > Solving Shape Optimization problem (steady-state problem)
> > Solving coarse problem......
> >  0 SNES norm 4.1963166116e+01, 0 KSP its last norm 0.0000000000e+00.
> >  1 SNES norm 3.2749386875e+01, 132 KSP its last norm 4.0966334477e-01.
> >  2 SNES norm 2.2874504408e+01, 130 KSP its last norm 3.2526355310e-01.
> >  3 SNES norm 1.4327187891e+01, 132 KSP its last norm 2.1213029400e-01.
> >  4 SNES norm 1.7283643754e+00, 81 KSP its last norm 1.4233338128e-01.
> >  5 SNES norm 3.6703566918e-01, 133 KSP its last norm 1.6069896349e-02.
> >  6 SNES norm 3.6554528686e-03, 77 KSP its last norm 3.5379167356e-03.
> > Coarse solver done......
> > Optimized value of object function (Energy dissipation) (Coarse):
> 29.9743062939
> > The reduction of the energy dissipation (Coarse): 23.012737%
> > The optimized curve (Coarse):
> > a = (4.500000, -0.042893, -0.002030, 0.043721, -0.018798, 0.001824)
> > Solving  moving mesh equation......
> > KSP norm 2.3040219081e-07, KSP its. 741. Time spent 8.481956
> > Moving mesh solver done.
> >  0 SNES norm 4.7843968670e+02, 0 KSP its last norm 0.0000000000e+00.
> >  1 SNES norm 1.0148854085e+02, 49 KSP its last norm 4.7373180511e-01.
> >  2 SNES norm 1.8312214030e+00, 46 KSP its last norm 1.0133332840e-01.
> >  3 SNES norm 3.3101970861e-03, 212 KSP its last norm 1.7753271069e-03.
> >  4 SNES norm 4.9552614008e-06, 249 KSP its last norm 3.2293284103e-06.
> > Optimized value of object function (Energy dissipation) (Fine):
> 33.2754372645
> > Solution time of 4053.227456 sec.
> > Number of unknowns = 1096658
> > Parameters: kinematic viscosity = 0.01
> >            inlet velocity: u = 5,  v = 0
> > Total number of nonlinear iterations = 4
> > Total number of linear iterations = 556
> > Average number of linear iterations = 139.000000
> > Time computing: 4053.227456 sec, Time outputting: 0.000001 sec.
> > Time spent in coarse nonlinear solve: 24.239526 sec, 0.005980 fraction of
> total compute time.
> > The optimized curve (fine):
> > a = (4.500000, -0.046468, -0.001963, 0.045736, -0.019141, 0.001789)
> > The reduction of the energy dissipation (Fine): 21.729582%
> > Time spend on fixed mesh solving: 17.296872
> > Time spend on shape opt. solving: 4053.250126
> > Latex command line:
> >  np    Newton   GMRES   Time(Total)    Time(Coarse)   Ratio
> > 32 &   4   &   139.00   &   4053.23  &    24.24   &  0.6\%
> >
> > Running finished on: Wed Oct  5 11:32:04 2011
> > Total running time: 4070.644329
> >
> ************************************************************************************************************************
> > ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
> >
> ************************************************************************************************************************
> >
> > ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
> >
> > ./joab on a Janus-nod named node1751 with 32 processors, by ronglian Wed
> Oct  5 11:32:04 2011
> > Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011
> >
> >                         Max       Max/Min        Avg      Total
> > Time (sec):           4.074e+03      1.00000   4.074e+03
> > Objects:              1.011e+03      1.00000   1.011e+03
> > Flops:                2.255e+11      2.27275   1.471e+11  4.706e+12
> > Flops/sec:            5.535e+07      2.27275   3.609e+07  1.155e+09
> > MPI Messages:         1.103e+05      5.41392   3.665e+04  1.173e+06
> > MPI Message Lengths:  1.326e+09      2.60531   2.416e+04  2.833e+10
> > MPI Reductions:       5.969e+03      1.00000
> >
> > Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> >                            e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> >                            and VecAXPY() for complex vectors of length N
> --> 8N flops
> >
> > Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
> >                        Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
> > 0:      Main Stage: 4.0743e+03 100.0%  4.7058e+12 100.0%  1.173e+06
> 100.0%  2.416e+04      100.0%  5.968e+03 100.0%
> >
> >
> ------------------------------------------------------------------------------------------------------------------------
> > See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> > Phase summary info:
> >   Count: number of times phase was executed
> >   Time and Flops: Max - maximum over all processors
> >                   Ratio - ratio of maximum to minimum over all processors
> >   Mess: number of messages sent
> >   Avg. len: average message length
> >   Reduct: number of global reductions
> >   Global: entire computation
> >   Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> >      %T - percent time in this phase         %F - percent flops in this
> phase
> >      %M - percent messages in this phase     %L - percent message lengths
> in this phase
> >      %R - percent reductions in this phase
> >   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
> >
> ------------------------------------------------------------------------------------------------------------------------
> > Event                Count      Time (sec)     Flops
>         --- Global ---  --- Stage ---   Total
> >                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> >
> ------------------------------------------------------------------------------------------------------------------------
> >
> > --- Event Stage 0: Main Stage
> >
> > MatMult             2493 1.0 1.2225e+0218.4 4.37e+09 1.1 3.9e+05 2.2e+03
> 0.0e+00  2  3 33  3  0   2  3 33  3  0  1084
> > MatMultTranspose       6 1.0 3.3590e-02 2.2 7.38e+06 1.1 8.0e+02 1.5e+03
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  6727
> > MatSolve            2467 1.0 1.1270e+02 1.7 5.95e+10 1.7 0.0e+00 0.0e+00
> 0.0e+00  2 33  0  0  0   2 33  0  0  0 13775
> > MatLUFactorSym         4 1.0 3.4774e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> > MatLUFactorNum        18 1.0 2.0832e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00
> 0.0e+00  2 56  0  0  0   2 56  0  0  0 12746
> > MatILUFactorSym        1 1.0 8.3280e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyBegin     103 1.0 7.6879e+0215.4 0.00e+00 0.0 1.6e+04 6.2e+04
> 1.7e+02  7  0  1  4  3   7  0  1  4  3     0
> > MatAssemblyEnd       103 1.0 3.7818e+01 1.0 0.00e+00 0.0 3.0e+03 5.3e+02
> 1.6e+02  1  0  0  0  3   1  0  0  0  3     0
> > MatGetRowIJ            5 1.0 4.8716e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatGetSubMatrice      18 1.0 4.3095e+00 2.5 0.00e+00 0.0 1.6e+04 3.5e+05
> 7.4e+01  0  0  1 20  1   0  0  1 20  1     0
> > MatGetOrdering         5 1.0 1.4656e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.4e+01  0  0  0  0  0   0  0  0  0  0     0
> > MatPartitioning        1 1.0 1.4356e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatZeroEntries        42 1.0 2.0939e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecDot                17 1.0 1.2719e-02 6.8 5.47e+05 1.1 0.0e+00 0.0e+00
> 1.7e+01  0  0  0  0  0   0  0  0  0  0  1317
> > VecMDot             2425 1.0 1.7196e+01 2.2 5.82e+09 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  4  0  0 41   0  4  0  0 41 10353
> > VecNorm             2503 1.0 2.7923e+00 3.4 1.18e+08 1.1 0.0e+00 0.0e+00
> 2.5e+03  0  0  0  0 42   0  0  0  0 42  1293
> > VecScale            2467 1.0 7.3112e-02 1.7 5.84e+07 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 24453
> > VecCopy              153 1.0 1.1636e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecSet              5031 1.0 6.0423e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecAXPY              137 1.0 1.1462e-02 1.5 6.33e+06 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 16902
> > VecWAXPY              19 1.0 1.7784e-03 1.4 2.83e+05 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  4869
> > VecMAXPY            2467 1.0 8.5820e+00 1.3 5.93e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  4  0  0  0   0  4  0  0  0 21153
> > VecAssemblyBegin      69 1.0 1.0341e+0018.2 0.00e+00 0.0 4.9e+03 5.4e+02
> 2.1e+02  0  0  0  0  3   0  0  0  0  3     0
> > VecAssemblyEnd        69 1.0 2.4939e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecScatterBegin     7491 1.0 1.3734e+00 1.7 0.00e+00 0.0 1.1e+06 1.9e+04
> 0.0e+00  0  0 96 76  0   0  0 96 76  0     0
> > VecScatterEnd       7491 1.0 2.0055e+02 8.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
> > VecReduceArith         8 1.0 1.4977e-03 2.0 3.05e+05 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  6232
> > VecReduceComm          4 1.0 8.9908e-0412.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecNormalize        2467 1.0 2.8067e+00 3.4 1.75e+08 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  0  0  0 41   0  0  0  0 41  1905
> > SNESSolve              4 1.0 4.0619e+03 1.0 2.23e+11 2.3 9.4e+05 2.3e+04
> 4.1e+03100 98 80 77 68 100 98 80 77 68  1136
> > SNESLineSearch        17 1.0 1.1423e+01 1.0 5.23e+07 1.1 1.8e+04 1.7e+04
> 3.3e+02  0  0  2  1  6   0  0  2  1  6   140
> > SNESFunctionEval      23 1.0 2.9742e+01 1.0 2.60e+07 1.1 1.9e+04 1.9e+04
> 3.5e+02  1  0  2  1  6   1  0  2  1  6    27
> > SNESJacobianEval      17 1.0 3.6786e+03 1.0 0.00e+00 0.0 9.8e+03 6.4e+04
> 1.4e+02 90  0  1  2  2  90  0  1  2  2     0
> > KSPGMRESOrthog      2425 1.0 2.5150e+01 1.6 1.16e+10 1.1 0.0e+00 0.0e+00
> 2.4e+03  0  8  0  0 41   0  8  0  0 41 14157
> > KSPSetup              36 1.0 2.5388e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > KSPSolve              18 1.0 3.6141e+02 1.0 2.25e+11 2.3 1.1e+06 2.4e+04
> 5.0e+03  9100 97 96 84   9100 97 96 84 13015
> > PCSetUp               36 1.0 2.1635e+02 3.6 1.55e+11 3.2 1.8e+04 3.2e+05
> 1.5e+02  3 56  2 20  3   3 56  2 20  3 12274
> > PCSetUpOnBlocks       18 1.0 2.1293e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00
> 2.7e+01  2 56  0  0  0   2 56  0  0  0 12471
> > PCApply             2467 1.0 2.5616e+02 2.5 5.95e+10 1.7 7.3e+05 2.8e+04
> 0.0e+00  4 33 62 73  0   4 33 62 73  0  6060
> >
> ------------------------------------------------------------------------------------------------------------------------
> >
> > Memory usage is given in bytes:
> >
> > Object Type          Creations   Destructions     Memory  Descendants'
> Mem.
> > Reports information only for process 0.
> >
> > --- Event Stage 0: Main Stage
> >
> >              Matrix    39             39  18446744074642894848     0
> > Matrix Partitioning     1              1          640     0
> >           Index Set   184            184      2589512     0
> >   IS L to G Mapping     2              2       301720     0
> >              Vector   729            729    133662888     0
> >      Vector Scatter    29             29        30508     0
> >   Application Order     2              2      9335968     0
> >                SNES     4              4         5088     0
> >       Krylov Solver    10             10     32264320     0
> >      Preconditioner    10             10         9088     0
> >              Viewer     1              0            0     0
> >
> ========================================================================================================================
> > Average time to get PetscTime(): 1.19209e-07
> > Average time for MPI_Barrier(): 1.20163e-05
> > Average time for zero size MPI_Send(): 2.49594e-06
> > #PETSc Option Table entries:
> > -coarse_ksp_rtol 1.0e-1
> > -coarsegrid
> /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi
> > -computeinitialguess
> > -f
> /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi
> > -geometric_asm
> > -geometric_asm_overlap 8
> > -inletu 5.0
> > -ksp_atol 1e-8
> > -ksp_gmres_restart 600
> > -ksp_max_it 3000
> > -ksp_pc_side right
> > -ksp_rtol 1.e-3
> > -ksp_type gmres
> > -log_summary
> > -mat_partitioning_type parmetis
> > -nest_geometric_asm_overlap 4
> > -nest_ksp_atol 1e-8
> > -nest_ksp_gmres_restart 800
> > -nest_ksp_max_it 1000
> > -nest_ksp_pc_side right
> > -nest_ksp_rtol 1.e-2
> > -nest_ksp_type gmres
> > -nest_pc_asm_type basic
> > -nest_pc_type asm
> > -nest_snes_atol 1.e-10
> > -nest_snes_max_it 20
> > -nest_snes_rtol 1.e-4
> > -nest_sub_pc_factor_mat_ordering_type qmd
> > -nest_sub_pc_factor_shift_amount 1e-8
> > -nest_sub_pc_factor_shift_type nonzero
> > -nest_sub_pc_type lu
> > -nested
> > -noboundaryreduce
> > -pc_asm_type basic
> > -pc_type asm
> > -shapebeta 10.0
> > -snes_atol 1.e-10
> > -snes_max_it 20
> > -snes_rtol 1.e-6
> > -sub_pc_factor_mat_ordering_type qmd
> > -sub_pc_factor_shift_amount 1e-8
> > -sub_pc_factor_shift_type nonzero
> > -sub_pc_type lu
> > -viscosity 0.01
> > #End of PETSc Option Table entries
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> > Configure run at: Tue Sep 13 13:28:48 2011
> > Configure options: --known-level1-dcache-size=32768
> --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=8
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-batch=1
> --with-mpi-shared-libraries=1 --known-mpi-shared-libraries=0
> --download-f-blas-lapack=1 --download-hypre=1 --download-superlu=1
> --download-parmetis=1 --download-superlu_dist=1 --download-blacs=1
> --download-scalapack=1 --download-mumps=1 --with-debugging=0
> > -----------------------------------------
> > Libraries compiled on Tue Sep 13 13:28:48 2011 on node1367
> > Machine characteristics:
> Linux-2.6.18-238.12.1.el5-x86_64-with-redhat-5.6-Tikanga
> > Using PETSc directory: /home/ronglian/soft/petsc-3.2-p1
> > Using PETSc arch: Janus-nodebug
> > -----------------------------------------
> >
> > Using C compiler: mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing
> -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
> > Using Fortran compiler: mpif90  -Wall -Wno-unused-variable -O
> ${FOPTFLAGS} ${FFLAGS}
> > -----------------------------------------
> >
> > Using include paths:
> -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include
> -I/home/ronglian/soft/petsc-3.2-p1/include
> -I/home/ronglian/soft/petsc-3.2-p1/include
> -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include
> -I/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/include
> > -----------------------------------------
> >
> > Using C linker: mpicc
> > Using Fortran linker: mpif90
> > Using libraries:
> -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib
> -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lpetsc -lX11
> -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib
> -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lsuperlu_dist_2.5
> -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis
> -lHYPRE -lmpi_cxx -lstdc++ -lscalapack -lblacs -lsuperlu_4.2 -lflapack
> -lfblas -L/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/lib
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal
> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm
> -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal
> -lnsl -lutil -lgcc_s -lpthread -ldl
> > -----------------------------------------
> >
> >>
> >> Yes, it has no influence on performance. If you think it does, send
> >> -log_summary output to petsc-maint at mcs.anl.gov
> >>
> >>  Matt
> >>
> >>
> > Hi Matt,
> >
> > The -log_summary output is attached. I found that SNESJacobianEval()
> > takes 90% of the total time. I think this is abnormal because I use a
> > hand-coded Jacobian. The reason for the 90%, I think, is that the
> > matrices take too much memory (over 1.8x10^19 bytes), which may have
> > pushed the job into swap. But I do not know why 23 one-million-by-one-million
> > matrices would use so much memory. Can you tell me how to debug this
> > problem? Thank you.
> >
> > Best,
> > Rongliang
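
One way I could test the swap hypothesis is to bracket the hand-coded
Jacobian routine with PetscMemoryGetCurrentUsage(). A minimal sketch, where
FormJacobian stands in for the routine actually registered with
SNESSetJacobian():

    #include <petscsnes.h>

    /* Placeholder for the user's hand-coded Jacobian routine. */
    extern PetscErrorCode FormJacobian(SNES, Vec, Mat *, Mat *,
                                       MatStructure *, void *);

    /* Sketch: measure resident memory before and after one Jacobian
       evaluation to see whether the process really grows toward swap. */
    PetscErrorCode FormJacobianWithMemCheck(SNES snes, Vec x, Mat *J, Mat *P,
                                            MatStructure *flag, void *ctx)
    {
      PetscLogDouble before, after;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscMemoryGetCurrentUsage(&before);CHKERRQ(ierr);
      ierr = FormJacobian(snes, x, J, P, flag, ctx);CHKERRQ(ierr);
      ierr = PetscMemoryGetCurrentUsage(&after);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,
               "Jacobian eval: resident memory %g -> %g bytes\n",
               before, after);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }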
> >
>
>
-------------- next part --------------
-------------------------------------------------
Joab

Shape Optimization solver
  by Rongliang Chen
  compiled on 15:54:32, Oct  3 2011
  Running on: Wed Oct  5 11:23:25 2011

  revision $Rev: 157 $ 
-------------------------------------------------
Command-line options: -computeinitialguess -f /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi -geometric_asm -geometric_asm_overlap 8 -inletu 5.0 -ksp_atol 1e-8 -ksp_gmres_restart 400 -ksp_max_it 3000 -ksp_pc_side right -ksp_rtol 1.e-2 -ksp_type gmres -log_summary -mat_partitioning_type parmetis -pc_asm_type basic -pc_type asm -shapebeta 10.0 -snes_atol 1.e-10 -snes_max_it 20 -snes_rtol 1.e-6 -sub_pc_factor_mat_ordering_type qmd -sub_pc_factor_shift_amount 1e-8 -sub_pc_factor_shift_type nonzero -sub_pc_type lu -viscosity 0.01  
-------------------------------------------------

Starting to load grid...
finished.
Grid has 32000 elements, 1096658 degrees of freedom.
  [0] has 35380 degrees of freedom (matrix), 35380 degrees of freedom (including shared points).
  [31] has 32466 degrees of freedom (matrix), 34428 degrees of freedom (including shared points).
Time spend on the load grid and create matrix etc.: 2.524287.
Solving fixed mesh (steady-state problem)
  0 SNES norm 6.3047601065e+01, 0 KSP its last norm 0.0000000000e+00.
  1 SNES norm 6.8497829311e-01, 34 KSP its last norm 5.0668501287e-01.
  2 SNES norm 1.8277997453e-01, 104 KSP its last norm 5.3649449690e-03.
  3 SNES norm 1.2494936037e-02, 93 KSP its last norm 1.4890024153e-03.
  4 SNES norm 2.4161921075e-04, 98 KSP its last norm 1.0705394154e-04.
  5 SNES norm 2.3507660310e-06, 85 KSP its last norm 2.3033769563e-06.
Initial value of object function (Energy dissipation) (Fine): 42.5134315176
Solution time of 22.377218 sec.
Fixed mesh (Steady-state) solver done.
Total number of nonlinear iterations = 5
Total number of linear iterations = 414
Average number of linear iterations = 82.800003
Time computing: 22.377218 sec, Time outputting: 0.000000 sec.
Solving Shape Optimization problem (steady-state problem) 
  0 SNES norm 1.7510864453e+02, 0 KSP its last norm 0.0000000000e+00.
  1 SNES norm 2.3546363669e+01, 188 KSP its last norm 1.6824374180e+00.
  2 SNES norm 1.4710489481e+01, 227 KSP its last norm 2.1654000566e-01.
  3 SNES norm 5.1619747492e+00, 216 KSP its last norm 1.4024550925e-01.
  4 SNES norm 1.6085957859e+00, 228 KSP its last norm 5.0298508735e-02.
  5 SNES norm 3.2115315727e-02, 200 KSP its last norm 1.5654715717e-02.
  6 SNES norm 2.3318868200e-03, 224 KSP its last norm 3.0282269758e-04.
  7 SNES norm 2.2884883607e-05, 139 KSP its last norm 2.2886088166e-05.
Optimized value of object function (Energy dissipation) (Fine): 33.2754517498
Solution time of 1769.098900 sec.
Number of unknowns = 1096658
Parameters: kinematic viscosity = 0.01
            inlet velocity: u = 5,  v = 0 
Total number of nonlinear iterations = 7
Total number of linear iterations = 1422
Average number of linear iterations = 203.142853
Time computing: 1769.098900 sec, Time outputting: 0.000000 sec.
The optimized curve (fine):
 a = (4.500000, -0.046461, -0.001963, 0.045736, -0.019145, 0.001790) 
The reduction of the energy dissipation (Fine): 21.729556% 
Time spend on fixed mesh solving: 22.471772 
Time spend on shape opt. solving: 1769.109364 
Latex command line: 
  np    Newton   GMRES   Time   Reduction 
 32 &   7   &   203.14   &   1769.10   &   21.7\%   

Running finished on: Wed Oct  5 11:53:19 2011
Total running time: 1791.677926 
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./joab on a Janus-nod named node1777 with 32 processors, by ronglian Wed Oct  5 11:53:19 2011
Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.794e+03      1.00000   1.794e+03
Objects:              5.370e+02      1.00000   5.370e+02
Flops:                4.316e+11      2.18377   2.876e+11  9.203e+12
Flops/sec:            2.405e+08      2.18376   1.603e+08  5.129e+09
MPI Messages:         1.024e+05      6.85190   2.949e+04  9.436e+05
MPI Message Lengths:  1.867e+09      2.60703   4.210e+04  3.973e+10
MPI Reductions:       4.270e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.7943e+03 100.0%  9.2028e+12 100.0%  9.436e+05 100.0%  4.210e+04      100.0%  4.269e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1862 1.0 1.0334e+02 8.2 9.20e+09 1.1 3.2e+05 4.3e+03 0.0e+00  3  3 33  3  0   3  3 33  3  0  2693
MatSolve            1848 1.0 2.6237e+02 1.7 1.41e+11 1.7 0.0e+00 0.0e+00 0.0e+00 12 40  0  0  0  12 40  0  0  0 13897
MatLUFactorSym         2 1.0 1.9804e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum        12 1.0 2.4884e+02 2.6 2.69e+11 3.2 0.0e+00 0.0e+00 0.0e+00  9 49  0  0  0   9 49  0  0  0 18224
MatAssemblyBegin      65 1.0 4.1394e+0248.0 0.00e+00 0.0 1.0e+04 1.2e+05 1.0e+02  7  0  1  3  2   7  0  1  3  2     0
MatAssemblyEnd        65 1.0 5.3107e+01 1.0 0.00e+00 0.0 1.2e+03 9.1e+02 8.4e+01  3  0  0  0  2   3  0  0  0  2     0
MatGetRowIJ            2 1.0 2.9041e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice      12 1.0 3.6227e+00 1.4 0.00e+00 0.0 9.7e+03 8.2e+05 4.4e+01  0  0  1 20  1   0  0  1 20  1     0
MatGetOrdering         2 1.0 8.1808e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPartitioning        1 1.0 8.6379e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        26 1.0 2.6638e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                12 1.0 2.6891e-0220.3 8.60e+05 1.1 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0   979
VecMDot             1836 1.0 3.4383e+01 2.3 1.20e+10 1.1 0.0e+00 0.0e+00 1.8e+03  1  4  0  0 43   1  4  0  0 43 10662
VecNorm             1872 1.0 5.5265e+00 8.7 1.34e+08 1.1 0.0e+00 0.0e+00 1.9e+03  0  0  0  0 44   0  0  0  0 44   743
VecScale            1848 1.0 6.1965e-02 1.5 6.62e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 32706
VecCopy               86 1.0 1.0849e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              3734 1.0 6.4959e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               54 1.0 5.5003e-03 1.6 3.87e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 21533
VecWAXPY              12 1.0 2.8160e-03 1.5 4.30e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4673
VecMAXPY            1848 1.0 1.8090e+01 1.4 1.21e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 20488
VecAssemblyBegin      38 1.0 1.4457e-01 3.2 0.00e+00 0.0 2.6e+03 8.4e+02 1.1e+02  0  0  0  0  3   0  0  0  0  3     0
VecAssemblyEnd        38 1.0 1.3518e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     5592 1.0 1.5577e+00 1.8 0.00e+00 0.0 9.2e+05 3.3e+04 0.0e+00  0  0 97 77  0   0  0 97 77  0     0
VecScatterEnd       5592 1.0 2.6118e+0212.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
VecReduceArith         4 1.0 4.2009e-04 1.1 2.87e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 20884
VecReduceComm          2 1.0 3.0112e-04 9.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize        1848 1.0 5.2645e+00 9.2 1.98e+08 1.1 0.0e+00 0.0e+00 1.8e+03  0  0  0  0 43   0  0  0  0 43  1150
SNESSolve              2 1.0 1.7915e+03 1.0 4.32e+11 2.2 9.4e+05 4.2e+04 4.1e+03100100100100 96 100100100100 96  5137
SNESLineSearch        12 1.0 1.1234e+01 1.0 8.30e+07 1.1 1.1e+04 3.2e+04 2.1e+02  1  0  1  1  5   1  0  1  1  5   224
SNESFunctionEval      14 1.0 2.7475e+01 1.0 3.73e+07 1.1 1.2e+04 3.6e+04 2.1e+02  2  0  1  1  5   2  0  1  1  5    41
SNESJacobianEval      12 1.0 1.1870e+03 1.0 0.00e+00 0.0 6.6e+03 1.2e+05 8.8e+01 66  0  1  2  2  66  0  1  2  2     0
KSPGMRESOrthog      1836 1.0 5.0183e+01 1.7 2.40e+10 1.1 0.0e+00 0.0e+00 1.8e+03  2  8  0  0 43   2  8  0  0 43 14611
KSPSetup              24 1.0 4.9849e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              12 1.0 5.7689e+02 1.0 4.31e+11 2.2 9.2e+05 4.2e+04 3.7e+03 32100 98 97 88  32100 98 97 88 15948
PCSetUp               24 1.0 2.5429e+02 2.5 2.69e+11 3.2 1.0e+04 7.7e+05 7.6e+01  9 49  1 20  2   9 49  1 20  2 17833
PCSetUpOnBlocks       12 1.0 2.5128e+02 2.6 2.69e+11 3.2 0.0e+00 0.0e+00 1.2e+01  9 49  0  0  0   9 49  0  0  0 18047
PCApply             1848 1.0 4.1893e+02 1.9 1.41e+11 1.7 6.0e+05 4.9e+04 0.0e+00 18 40 63 73  0  18 40 63 73  0  8703
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix    17             17    857526496     0
 Matrix Partitioning     1              1          640     0
           Index Set    91             91      2079184     0
   IS L to G Mapping     1              1       283604     0
              Vector   402            402    111977984     0
      Vector Scatter    13             13        13676     0
   Application Order     1              1      8773904     0
                SNES     2              2         2544     0
       Krylov Solver     4              4      5192080     0
      Preconditioner     4              4         3632     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 9.58443e-06
Average time for zero size MPI_Send(): 2.94298e-06
#PETSc Option Table entries:
-computeinitialguess
-f /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi
-geometric_asm
-geometric_asm_overlap 8
-inletu 5.0
-ksp_atol 1e-8
-ksp_gmres_restart 400
-ksp_max_it 3000
-ksp_pc_side right
-ksp_rtol 1.e-2
-ksp_type gmres
-log_summary
-mat_partitioning_type parmetis
-pc_asm_type basic
-pc_type asm
-shapebeta 10.0
-snes_atol 1.e-10
-snes_max_it 20
-snes_rtol 1.e-6
-sub_pc_factor_mat_ordering_type qmd
-sub_pc_factor_shift_amount 1e-8
-sub_pc_factor_shift_type nonzero
-sub_pc_type lu
-viscosity 0.01
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Sep 13 13:28:48 2011
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=8 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-batch=1 --with-mpi-shared-libraries=1 --known-mpi-shared-libraries=0 --download-f-blas-lapack=1 --download-hypre=1 --download-superlu=1 --download-parmetis=1 --download-superlu_dist=1 --download-blacs=1 --download-scalapack=1 --download-mumps=1 --with-debugging=0
-----------------------------------------
Libraries compiled on Tue Sep 13 13:28:48 2011 on node1367 
Machine characteristics: Linux-2.6.18-238.12.1.el5-x86_64-with-redhat-5.6-Tikanga
Using PETSc directory: /home/ronglian/soft/petsc-3.2-p1
Using PETSc arch: Janus-nodebug
-----------------------------------------

Using C compiler: mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -Wall -Wno-unused-variable -O   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lpetsc -lX11 -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -lmpi_cxx -lstdc++ -lscalapack -lblacs -lsuperlu_4.2 -lflapack -lfblas -L/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl 
-----------------------------------------

