[petsc-users] Memory problem
Rongliang Chen
rongliang.chan at gmail.com
Thu Oct 6 10:27:59 CDT 2011
> That is an overflow somewhere. You can probably get the right answer by
> using -snes_view. I will try and track down this overflow.
>
>    Matt
>
Hi Matt,

Thank you for your reply.

The -snes_view and -log_summary output follows, but I did not find anything
unusual in the -snes_view output.

I have another question: what does the number 23 mean in the following
-log_summary output?

"  Object Type      Creations   Destructions     Memory                 Descendants' Mem.
   Reports information only for process 0.
   --- Event Stage 0: Main Stage
   Matrix                  23             23     18446744073922240512           0  "

Does it mean that I created 23 matrices in my code? I do not think I have
created that many matrices. Thanks.
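(A side note on the huge Memory figure quoted above: 2^64 = 18446744073709551616,
and the reported 18446744073922240512 is just slightly above that, which fits the
idea of an unsigned 64-bit byte counter wrapping around, as Matt suggested. Below
is a minimal sketch of how such a wraparound could produce a number of this
magnitude; the ~203 MB "real" memory figure in it is made up purely for
illustration, and the sketch is not meant to describe what PETSc actually does
internally.)

#include <stdio.h>

int main(void)
{
  /* Hypothetical sketch: if an object's size is subtracted from an unsigned
     64-bit byte counter before it was ever added, the counter wraps to about
     2^64 instead of going negative. */
  unsigned long long wrapped = 0ULL - 1ULL;   /* 18446744073709551615 = 2^64 - 1 */

  /* Accumulating the wrapped value in a double together with an arbitrary
     "real" memory figure (~203 MB, chosen only for illustration) gives a
     total just above 2^64, the same magnitude as the Memory entry above. */
  double logged = (double)wrapped + 212688897.0;

  printf("wrapped counter: %llu bytes\n", wrapped);
  printf("logged total   : %.0f bytes\n", logged);
  return 0;
}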
Best,
Rongliang
------------------------------------------------------------------------------------------------------------
Starting to load grid...
Nodes on moving boundary: coarse 199, fine 799, Gridratio 0.250000.
Setting up Interpolation matrix......
Interpolation matrix done......Time spent: 0.207468
finished.
Grid has 32000 elements, 1096658 degrees of freedom.
Coarse grid has 2000 elements, 70170 degrees of freedom.
[0] has 9234 degrees of freedom (matrix), 9234 degrees of freedom
(including shared points).
[0] coarse grid has 484 degrees of freedom (matrix), 484 degrees of
freedom (including shared points).
[127] has 7876 degrees of freedom (matrix), 9100 degrees of freedom
(including shared points).
[127] coarse grid has 588 degrees of freedom (matrix), 912 degrees of
freedom (including shared points).
Time spent on loading the grid and creating matrices etc.: 3.719866.
Solving Shape Optimization problem (steady-state problem)
Solving coarse problem......
0 SNES norm 3.4998054301e+01, 0 KSP its last norm 0.0000000000e+00.
1 SNES norm 3.1501179205e+01, 34 KSP its last norm 3.3927102450e-01.
2 SNES norm 2.1246874435e+01, 57 KSP its last norm 3.1177722630e-01.
3 SNES norm 1.7390263296e+01, 141 KSP its last norm 1.9452289323e-01.
4 SNES norm 1.1644760718e+01, 160 KSP its last norm 1.6835316856e-01.
5 SNES norm 1.0601030093e+01, 181 KSP its last norm 1.1003156828e-01.
6 SNES norm 1.0145938759e+00, 126 KSP its last norm 1.0556059459e-01.
7 SNES norm 1.9267547420e-01, 203 KSP its last norm 9.9489004947e-03.
8 SNES norm 1.8901340973e-03, 195 KSP its last norm 1.8359299380e-03.
Coarse solver done......
Optimized value of objective function (Energy dissipation) (Coarse): 29.9743671231
The reduction of the energy dissipation (Coarse): -inf%
The optimized curve (Coarse):
a = (4.500000, -0.042961, -0.002068, 0.043750, -0.018783, 0.001816)
Solving moving mesh equation......
KSP norm 2.2906632201e-07, KSP its. 741. Time spent 2.772948
Moving mesh solver done.
0 SNES norm 4.7914118974e+02, 0 KSP its last norm 0.0000000000e+00.
1 SNES norm 1.0150289152e+02, 63 KSP its last norm 4.6576374323e-01.
2 SNES norm 1.8326417396e+00, 90 KSP its last norm 9.9707541310e-02.
3 SNES norm 3.7711809663e-03, 348 KSP its last norm 1.8059473115e-03.
4 SNES norm 9.7342448527e-06, 484 KSP its last norm 3.6343704479e-06.
SNES Object: 128 MPI processes
type: ls
line search variant: SNESLineSearchCubic
alpha=1.000000000000e-04, maxstep=1.000000000000e+08,
minlambda=1.000000000000e-12
maximum iterations=20, maximum function evaluations=10000
tolerances: relative=1e-06, absolute=1e-10, solution=1e-08
total number of linear solver iterations=985
total number of function evaluations=5
KSP Object: 128 MPI processes
type: gmres
GMRES: restart=600, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=3000, initial guess is zero
tolerances: relative=0.001, absolute=1e-08, divergence=10000
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 128 MPI processes
type: asm
Additive Schwarz: total subdomain blocks = 128, user-defined overlap
Additive Schwarz: restriction/interpolation type - BASIC
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 1e-12
using diagonal shift to prevent zero pivot
matrix ordering: qmd
factor fill ratio given 5, needed 5.26731
Factored matrix follows:
Matrix Object: 1 MPI processes
type: seqaij
rows=25170, cols=25170
package used to perform factorization: petsc
total: nonzeros=11090981, allocated nonzeros=11090981
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 12872 nodes, limit used is 5
linear system matrix = precond matrix:
Matrix Object: 1 MPI processes
type: seqaij
rows=25170, cols=25170
total: nonzeros=2105626, allocated nonzeros=2105626
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 13453 nodes, limit used is 5
linear system matrix = precond matrix:
Matrix Object: 128 MPI processes
type: mpiaij
rows=1096658, cols=1096658
total: nonzeros=94170314, allocated nonzeros=223806957
total number of mallocs used during MatSetValues calls =6185057
not using I-node (on process 0) routines
Optimized value of objective function (Energy dissipation) (Fine): 33.2754475059
Solution time of 395.289169 sec.
Number of unknowns = 1096658
Parameters: kinematic viscosity = 0.01
inlet velocity: u = 5, v = 0
Total number of nonlinear iterations = 4
Total number of linear iterations = 985
Average number of linear iterations = 246.250000
Time computing: 395.289169 sec, Time outputting: 0.000000 sec.
Time spent in coarse nonlinear solve: 13.134366 sec, 0.033227 fraction of
total compute time.
The optimized curve (fine):
a = (4.500000, -0.046466, -0.001962, 0.045734, -0.019141, 0.001789)
The reduction of the energy dissipation (Fine): -inf%
Time spent on fixed mesh solving: 0.013564
Time spent on shape opt. solving: 395.324390
Latex command line:
np Newton GMRES Time(Total) Time(Coarse) Ratio
128 & 4 & 246.25 & 395.29 & 13.13 & 3.3\%
Running finished on: Wed Oct 5 19:02:01 2011
Total running time: 395.376442
************************************************************************************************************************
***         WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./joab on a Janus-nod named node0844 with 128 processors, by ronglian Wed Oct 5 19:02:01 2011
Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011
Max Max/Min Avg Total
Time (sec): 3.991e+02 1.00013 3.991e+02
Objects: 1.066e+03 1.00000 1.066e+03
Flops: 7.938e+10 2.52133 5.615e+10 7.187e+12
Flops/sec: 1.989e+08 2.52100 1.407e+08 1.801e+10
MPI Messages: 2.724e+05 8.91400 6.158e+04 7.883e+06
MPI Message Lengths: 8.340e+08 2.63753 1.025e+04 8.083e+10
MPI Reductions: 6.537e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.9910e+02 100.0%  7.1875e+12 100.0%  7.883e+06 100.0%  1.025e+04      100.0%  6.536e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec)
Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 2879 1.0 4.3698e+0116.7 1.86e+09 1.3 2.1e+06 1.4e+03
0.0e+00 4 3 27 4 0 4 3 27 4 0 4839
MatMultTranspose 3 1.0 3.0989e-0226.2 9.81e+05 1.2 2.0e+03 7.3e+02
0.0e+00 0 0 0 0 0 0 0 0 0 0 3646
MatSolve 2860 1.0 7.2956e+01 2.3 3.95e+10 2.5 0.0e+00 0.0e+00
0.0e+00 14 52 0 0 0 14 52 0 0 0 50895
MatLUFactorSym 2 1.0 1.3847e+00 4.7 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 13 1.0 4.9187e+01 5.8 3.56e+10 4.8 0.0e+00 0.0e+00
0.0e+00 5 33 0 0 0 5 33 0 0 0 48174
MatILUFactorSym 1 1.0 3.9380e-03 3.3 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 78 1.0 8.7529e+0129.0 0.00e+00 0.0 4.3e+04 5.2e+04
1.3e+02 11 0 1 3 2 11 0 1 3 2 0
MatAssemblyEnd 78 1.0 7.2215e+00 1.0 0.00e+00 0.0 8.5e+03 3.6e+02
1.1e+02 2 0 0 0 2 2 0 0 0 2 0
MatGetRowIJ 3 1.0 4.3902e-02 8.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 13 1.0 1.9239e+00 2.9 0.00e+00 0.0 7.9e+04 1.8e+05
5.1e+01 0 0 1 17 1 0 0 1 17 1 0
MatGetOrdering 3 1.0 4.1121e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatPartitioning 1 1.0 2.2540e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 32 1.0 3.8607e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 3.0 1.6980e-0323.0 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecDot 12 1.0 6.8195e-0317.5 8.36e+04 1.2 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 0 0 0 0 0 0 1451
VecMDot 2823 1.0 4.1334e+01 7.2 3.82e+09 1.2 0.0e+00 0.0e+00
2.8e+03 4 6 0 0 43 4 6 0 0 43 10682
VecNorm 2888 1.0 2.5551e+00 3.1 3.47e+07 1.2 0.0e+00 0.0e+00
2.9e+03 0 0 0 0 44 0 0 0 0 44 1575
VecScale 2860 1.0 2.0850e-02 1.9 1.73e+07 1.2 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 96028
VecCopy 117 1.0 2.1448e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5795 1.0 2.2957e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 116 1.0 1.9181e-03 1.6 1.36e+06 1.2 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 82341
VecWAXPY 16 1.0 2.6107e-04 1.5 4.61e+04 1.2 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 21104
VecMAXPY 2860 1.0 5.2077e+00 1.5 3.85e+09 1.2 0.0e+00 0.0e+00
0.0e+00 1 6 0 0 0 1 6 0 0 0 85546
VecAssemblyBegin 60 1.0 3.4554e-0110.6 0.00e+00 0.0 1.8e+04 3.4e+02
1.8e+02 0 0 0 0 3 0 0 0 0 3 0
VecAssemblyEnd 60 1.0 1.9860e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 8648 1.0 1.1008e+00 3.0 0.00e+00 0.0 7.7e+06 8.4e+03
0.0e+00 0 0 98 80 0 0 0 98 80 0 0
VecScatterEnd 8648 1.0 8.4154e+0135.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 10 0 0 0 0 10 0 0 0 0 0
VecReduceArith 4 1.0 2.6989e-04 2.3 4.00e+04 1.2 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 17292
VecReduceComm 2 1.0 2.7108e-04 6.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 2860 1.0 2.5307e+00 3.1 5.17e+07 1.2 0.0e+00 0.0e+00
2.8e+03 0 0 0 0 44 0 0 0 0 44 2370
SNESSolve 2 1.0 3.9251e+02 1.0 7.85e+10 2.6 6.7e+06 9.7e+03
4.7e+03 98 98 84 80 72 98 98 84 80 72 18034
SNESLineSearch 12 1.0 3.0610e+00 1.0 1.26e+07 1.2 6.1e+04 1.1e+04
2.9e+02 1 0 1 1 4 1 0 1 1 4 473
SNESFunctionEval 18 1.0 6.5305e+00 1.0 6.01e+06 1.2 6.2e+04 1.3e+04
2.9e+02 2 0 1 1 4 2 0 1 1 4 106
SNESJacobianEval 12 1.0 2.4492e+02 1.0 0.00e+00 0.0 2.5e+04 5.9e+04
9.0e+01 61 0 0 2 1 61 0 0 2 1 0
KSPGMRESOrthog 2823 1.0 4.6476e+01 4.3 7.64e+09 1.2 0.0e+00 0.0e+00
2.8e+03 5 12 0 0 43 5 12 0 0 43 19001
KSPSetup 26 1.0 1.3622e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 13 1.0 1.4371e+02 1.0 7.94e+10 2.5 7.8e+06 1.0e+04
5.8e+03 36100 98 97 88 36100 98 97 88 50001
PCSetUp 26 1.0 5.2544e+01 5.3 3.56e+10 4.8 8.5e+04 1.7e+05
9.8e+01 6 33 1 17 1 6 33 1 17 1 45097
PCSetUpOnBlocks 13 1.0 5.0695e+01 5.7 3.56e+10 4.8 0.0e+00 0.0e+00
1.7e+01 6 33 0 0 0 6 33 0 0 0 46742
PCApply 2860 1.0 1.1268e+02 1.8 3.95e+10 2.5 5.6e+06 1.1e+04
0.0e+00 20 52 71 76 0 20 52 71 76 0 32953
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 23 23 18446744073922240512 0
Matrix Partitioning 1 1 640 0
Index Set 168 168 922496 0
IS L to G Mapping 2 2 78872 0
Vector 828 828 44121632 0
Vector Scatter 23 23 24196 0
Application Order 2 2 9335968 0
SNES 3 2 2544 0
Krylov Solver 7 6 16141840 0
Preconditioner 7 6 5456 0
Viewer 2 1 712 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.36034e-05
Average time for zero size MPI_Send(): 2.78279e-06
#PETSc Option Table entries:
-coarse_ksp_rtol 1.0e-1
-coarsegrid
/scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi
-f
/scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi
-geometric_asm
-geometric_asm_overlap 8
-inletu 5.0
-ksp_atol 1e-8
-ksp_gmres_restart 600
-ksp_max_it 3000
-ksp_pc_side right
-ksp_rtol 1.e-3
-ksp_type gmres
-log_summary
-mat_partitioning_type parmetis
-nest_geometric_asm_overlap 4
-nest_ksp_atol 1e-8
-nest_ksp_gmres_restart 800
-nest_ksp_max_it 1000
-nest_ksp_pc_side right
-nest_ksp_rtol 1.e-2
-nest_ksp_type gmres
-nest_pc_asm_type basic
-nest_pc_type asm
-nest_snes_atol 1.e-10
-nest_snes_max_it 20
-nest_snes_rtol 1.e-4
-nest_sub_pc_factor_mat_ordering_type qmd
-nest_sub_pc_factor_shift_amount 1e-8
-nest_sub_pc_factor_shift_type nonzero
-nest_sub_pc_type lu
-nested
-noboundaryreduce
-pc_asm_type basic
-pc_type asm
-shapebeta 10.0
-snes_atol 1.e-10
-snes_max_it 20
-snes_rtol 1.e-6
-snes_view
-sub_pc_factor_mat_ordering_type qmd
-sub_pc_factor_shift_amount 1e-8
-sub_pc_factor_shift_type nonzero
-sub_pc_type lu
-viscosity 0.01
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Sep 13 13:28:48 2011
Configure options: --known-level1-dcache-size=32768
--known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
--known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=8
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-batch=1
--with-mpi-shared-libraries=1 --known-mpi-shared-libraries=0
--download-f-blas-lapack=1 --download-hypre=1 --download-superlu=1
--download-parmetis=1 --download-superlu_dist=1 --download-blacs=1
--download-scalapack=1 --download-mumps=1 --with-debugging=0
-----------------------------------------
Libraries compiled on Tue Sep 13 13:28:48 2011 on node1367
Machine characteristics:
Linux-2.6.18-238.12.1.el5-x86_64-with-redhat-5.6-Tikanga
Using PETSc directory: /home/ronglian/soft/petsc-3.2-p1
Using PETSc arch: Janus-nodebug
-----------------------------------------
Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -Wall -Wno-unused-variable -O ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include
-I/home/ronglian/soft/petsc-3.2-p1/include
-I/home/ronglian/soft/petsc-3.2-p1/include
-I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include
-I/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries:
-Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib
-L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lpetsc -lX11
-Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib
-L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lsuperlu_dist_2.5
-lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis
-lHYPRE -lmpi_cxx -lstdc++ -lscalapack -lblacs -lsuperlu_4.2 -lflapack
-lfblas -L/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal
-lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm
-lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal
-lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------