[petsc-users] Memory per core - parallel eigenvalue problem
venkatesh g
venkateshgk.j at gmail.com
Mon May 25 08:18:33 CDT 2015
Hi,
I am solving the generalized eigenvalue problem Ax = lambda Bx, as in ex7 from the SLEPc manual.
Now I have restructured my matrix and I get reasonable answers.
However, when I run the problem on multiple cores, it uses a much larger amount of memory than the serial job:
Serial job (1 core): memory used = 18 GB
Parallel job (240 cores): memory used = 456 GB
I have attached the performance log for the run.
I suspect the full matrix is being loaded on each core in the parallel job, but I am not sure whether that is correct, or how to resolve it if so.
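As a check, something along the lines of the following minimal sketch (placeholder names; A stands for the matrix loaded from -f1) would print each rank's local row range and resident memory right after MatLoad:

#include <petscmat.h>

/* Diagnostic sketch (not part of ex7): report how the rows of A are
   distributed and how much memory each rank is currently using. */
PetscErrorCode ReportDistribution(Mat A)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       rstart,rend;
  PetscLogDouble mem;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);   /* rows owned by this rank */
  ierr = PetscMemoryGetCurrentUsage(&mem);CHKERRQ(ierr);        /* resident set size of this process */
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
           "[%d] local rows %D..%D (%D rows), memory %.1f MB\n",
           rank,rstart,rend,rend-rstart,(double)(mem/1048576.0));CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD,PETSC_STDOUT);CHKERRQ(ierr);
  return 0;
}

If each rank reports only its share of the rows, the matrices themselves are distributed and the extra memory must be coming from somewhere else.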
Kindly help.
cheers,
Venkatesh
-------------- next part --------------
Generalized eigenproblem stored in file.
Reading COMPLEX matrices from binary files...
Number of iterations of the method: 10
Number of linear iterations of the method: 99
Solution method: krylovschur
Number of requested eigenvalues: 3
Stopping condition: tol=1e-08, maxit=1600
Number of converged approximate eigenpairs: 9
k ||Ax-kBx||/||kx||
----------------- ------------------
-0.009606+0.000000 i 0.00350922
-0.009633+0.000000 i 0.0107094
-0.010251-0.000000 i 0.0104596
-0.010478+0.000000 i 0.00103213
-0.011009-0.000000 i 0.000354202
-0.013199-0.000000 i 0.00022781
-0.017263+0.000000 i 5.39603e-05
-0.023652-0.000000 i 2.79115e-05
-0.032886+0.000000 i 3.21621e-05
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex7 on a linux-intel named nid01080 with 240 processors, by esdveng Mon May 25 07:39:49 2015
Using Petsc Release Version 3.5.3, Jan, 31, 2015
                      Max         Max/Min     Avg         Total
Time (sec):           1.837e+03   1.00001     1.837e+03
Objects:              1.070e+02   1.00000     1.070e+02
Flops:                6.655e+11   1.00014     6.654e+11   1.597e+14
Flops/sec:            3.622e+08   1.00014     3.621e+08   8.691e+10
Memory:               1.927e+09   1.00475                 4.613e+11
MPI Messages:         3.200e+04   1.30610     2.617e+04   6.281e+06
MPI Message Lengths:  2.558e+08   11.04369    1.012e+03   6.357e+09
MPI Reductions:       2.876e+03   1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.8374e+03 100.0% 1.5969e+14 100.0% 6.281e+06 100.0% 1.012e+03 100.0% 2.875e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
ThreadCommRunKer 1 1.0 1.0970e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ThreadCommBarrie 1 1.0 4.0531e-06 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 117 1.0 2.2459e-01 4.5 9.08e+07519.3 4.3e+05 9.2e+02 0.0e+00 0 0 7 6 0 0 0 7 6 0 25951
MatSolve 99 1.0 1.0297e+02 1.1 3.91e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 6 0 0 0 5 6 0 0 0 91120
MatLUFactorSym 1 1.0 1.0519e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatLUFactorNum 1 1.0 1.7081e+03 1.0 6.26e+11 1.0 0.0e+00 0.0e+00 0.0e+00 91 94 0 0 0 91 94 0 0 0 87994
MatConvert 2 1.0 6.9909e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 4 1.0 3.7676e+0031.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 4 1.0 7.5902e-01 1.0 0.00e+00 0.0 5.9e+04 1.2e+02 4.2e+01 0 0 1 0 1 0 0 1 0 1 0
MatGetRow 120 1.0 4.4448e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 6.6685e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 1.6402e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.6562e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 2 1.0 6.0908e+00 1.0 0.00e+00 0.0 3.2e+04 1.5e+04 4.6e+01 0 0 1 7 2 0 0 1 7 2 0
MatAXPY 1 1.0 2.2365e-01 1.0 0.00e+00 0.0 2.9e+04 1.2e+02 2.2e+01 0 0 0 0 1 0 0 0 0 1 0
MatGetRedundant 1 1.0 2.3025e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMPIConcateSeq 1 1.0 6.6215e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 18 1.0 2.5502e-02 1.1 8.64e+03 1.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 81
VecCopy 20 1.0 1.6141e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 213 1.0 2.8761e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 9 1.0 3.1037e-02 1.1 4.32e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33
VecScatterBegin 315 1.0 1.4699e-01 1.3 0.00e+00 0.0 6.1e+06 9.6e+02 0.0e+00 0 0 97 92 0 0 0 97 92 0 0
VecScatterEnd 315 1.0 7.7702e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
EPSSetUp 1 1.0 1.7233e+03 1.0 6.26e+11 1.0 1.4e+05 3.1e+02 7.9e+01 92 94 2 1 3 92 94 2 1 3 87218
EPSSolve 1 1.0 1.8280e+03 1.0 6.65e+11 1.0 6.1e+06 9.4e+02 2.7e+03 99100 97 91 93 99100 97 91 93 87358
STSetUp 1 1.0 1.7233e+03 1.0 6.26e+11 1.0 1.4e+05 3.1e+02 5.8e+01 92 94 2 1 2 92 94 2 1 2 87218
STApply 99 1.0 1.0423e+02 1.1 3.92e+10 1.0 6.0e+06 9.6e+02 2.0e+02 6 6 95 90 7 6 6 95 90 7 90059
STMatSolve 99 1.0 1.0411e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 2.0e+02 6 6 90 86 7 6 6 90 86 7 90121
BVCopy 19 1.0 2.6266e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.6e+01 0 0 0 0 2 0 0 0 0 2 0
BVMult 198 1.0 3.0988e-02 1.0 2.01e+06 1.0 0.0e+00 0.0e+00 3.7e+02 0 0 0 0 13 0 0 0 0 13 15584
BVDot 187 1.0 5.9229e-02 1.0 1.29e+06 1.0 0.0e+00 0.0e+00 1.9e+02 0 0 0 0 7 0 0 0 0 7 5209
BVOrthogonalize 100 1.0 1.3423e-01 1.0 2.48e+06 1.0 0.0e+00 0.0e+00 1.5e+03 0 0 0 0 52 0 0 0 0 52 4436
BVScale 100 1.0 6.8860e-03 1.1 2.40e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 836
BVSetRandom 1 1.0 4.3875e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSSolve 10 1.0 2.4658e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSVectors 20 1.0 3.8042e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSOther 10 1.0 1.1059e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 2 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 99 1.0 1.0409e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 2.0e+02 6 6 90 86 7 6 6 90 86 7 90140
PCSetUp 1 1.0 1.7230e+03 1.0 6.26e+11 1.0 1.2e+05 3.6e+02 2.8e+01 92 94 2 1 1 92 94 2 1 1 87232
PCApply 99 1.0 1.0400e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 0.0e+00 6 6 90 86 0 6 6 90 86 0 90218
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 3 2 1504 0
Matrix 17 17 1915794800 0
Vector 56 56 913984 0
Vector Scatter 6 6 7320 0
Index Set 15 15 254960 0
Eigenvalue Problem Solver 1 1 2156 0
PetscRandom 1 1 648 0
Spectral Transform 1 1 840 0
Basis Vectors 1 1 10744 0
Region 1 1 648 0
Direct solver 1 1 25272 0
Krylov Solver 2 2 2320 0
Preconditioner 2 2 1928 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.54018e-05
Average time for zero size MPI_Send(): 4.47432e-06
#PETSc Option Table entries:
-eps_nev 3
-eps_target 0.5
-f1 a40t
-f2 b40t
-log_summary
-st_type sinvert
#End of PETSc Option Table entries
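For reference, the options above correspond roughly to the following SLEPc setup (an illustrative sketch only; ex7 reads these settings from the options database rather than hard-coding them):

#include <slepceps.h>

/* Rough code equivalent of: -eps_nev 3 -eps_target 0.5 -st_type sinvert */
PetscErrorCode SetupShiftInvert(EPS eps)
{
  PetscErrorCode ierr;
  ST             st;

  ierr = EPSSetProblemType(eps,EPS_GNHEP);CHKERRQ(ierr);                    /* generalized non-Hermitian Ax = lambda Bx */
  ierr = EPSSetDimensions(eps,3,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); /* -eps_nev 3 */
  ierr = EPSSetTarget(eps,0.5);CHKERRQ(ierr);                               /* -eps_target 0.5 */
  ierr = EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);                             /* -st_type sinvert */
  return 0;
}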
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 PETSC_ARCH=linux-intel -with-blas-lapack-dir=/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/ --with-mpi-dir=/opt/cray/mpt/7.0.5/gni/mpich2-intel/140 --with-scalar-type=complex --with-fortran-kernels=1 -known-mpi-shared-libraries=0 --with-large-file-io=1 --with-64-bit-indices=0 --with-batch FC=ifort --with-valgrind-dir=/home/proj/14/esdveng/apps/valgrind-3.10.1 --download-mumps=/home/proj/14/esdveng/apps/MUMPS_4.10.0-p3.tar.gz --download-scalapack=/home/proj/14/esdveng/apps/scalapack-2.0.2.tgz --download-blacs=/home/proj/14/esdveng/apps/blacs-dev.tar.gz --download-parmetis=/home/proj/14/esdveng/apps/parmetis-4.0.2-p5.tar.gz --download-metis=/home/proj/14/esdveng/apps/metis-5.0.2-p3.tar.gz --download-cmake=/home/proj/14/esdveng/apps/cmake-2.8.12.2.tar.gz
-----------------------------------------
Libraries compiled on Tue May 19 06:39:06 2015 on login4
Machine characteristics: Linux-3.0.101-0.31.1_1.0502.8394-cray_ari_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/lustre/esd2/esdveng/petsc-3.5.3
Using PETSc arch: linux-intel
-----------------------------------------
Using C compiler: gcc -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ifort -fPIC -g ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/include -I/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/include
-----------------------------------------
Using C linker: gcc
Using Fortran linker: ifort
Using libraries: -Wl,-rpath,/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -L/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -lpetsc -Wl,-rpath,/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -L/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis -lmetis -lX11 -lpthread -lssl -lcrypto -Wl,-rpath,/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/lib -L/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/lib -lmpich -lssl -luuid -lpthread -lrt -ldl -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lpthread -lirc_s -lm -lstdc++ -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -lstdc++ -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -ldl -lgcc_s -ldl
-----------------------------------------
Application 404794 resources: utime ~438984s, stime ~1036s, Rss ~1905660, inblocks ~1047499, outblocks ~33037