[petsc-users] Memory per core - parallel eigenvalue problem

venkatesh g venkateshgk.j at gmail.com
Mon May 25 08:18:33 CDT 2015


Hi,

I am solving the generalized eigenvalue problem Ax = lambda Bx from EX7 in the
SLEPc manual.
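
For reference, ex7 essentially loads A and B from the two binary files and
hands them to an EPS solver. A minimal sketch of that flow (assuming the
PETSc/SLEPc 3.5 API shown in the log below, with error checking abbreviated;
this is not the verbatim example source) is:

#include <slepceps.h>

int main(int argc,char **argv)
{
  Mat            A,B;
  EPS            eps;
  PetscViewer    viewer;
  char           f1[PETSC_MAX_PATH_LEN],f2[PETSC_MAX_PATH_LEN];
  PetscBool      flg;
  PetscErrorCode ierr;

  SlepcInitialize(&argc,&argv,NULL,NULL);

  /* Names of the binary matrix files, e.g. -f1 a40t -f2 b40t */
  ierr = PetscOptionsGetString(NULL,"-f1",f1,PETSC_MAX_PATH_LEN,&flg);CHKERRQ(ierr);
  ierr = PetscOptionsGetString(NULL,"-f2",f2,PETSC_MAX_PATH_LEN,&flg);CHKERRQ(ierr);

  /* MatLoad on PETSC_COMM_WORLD distributes the rows across the ranks,
     so each process should end up holding only its own block of A and B */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,f1,FILE_MODE_READ,&viewer);CHKERRQ(ierr);
  ierr = MatLoad(A,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD,&B);CHKERRQ(ierr);
  ierr = MatSetFromOptions(B);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,f2,FILE_MODE_READ,&viewer);CHKERRQ(ierr);
  ierr = MatLoad(B,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* Generalized non-Hermitian problem Ax = lambda Bx; the solver options
     (-eps_nev 3 -eps_target 0.5 -st_type sinvert) come from the command line */
  ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps,A,B);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps,EPS_GNHEP);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);

  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return 0;
}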

Now I have restructured my matrix and I get reasonable answers.

However, when I submit the problem on multiple cores, it uses a much larger
amount of memory than the serial job.

Serial job (1 core)        - memory used = 18 GB

Parallel job on 240 cores  - memory used = 456 GB

I have attached the performance log for the run.

I think each core is loading the full matrix in the parallel job, but I am
not sure whether this is correct or how to resolve it.
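
To check this, I could add a small per-rank report right after MatLoad (a
hypothetical helper, not part of ex7) that prints each rank's row range and
resident memory, so I can see whether every rank owns the whole matrix or
only a slice of it:

#include <petscmat.h>

/* Hypothetical check: report each rank's share of a matrix and its current
   resident memory (assuming the PETSc 3.5 API used in the log below) */
static PetscErrorCode ReportDistribution(Mat A)
{
  PetscInt       rstart,rend;
  PetscLogDouble mem;
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);    /* locally owned rows */
  ierr = PetscMemoryGetCurrentUsage(&mem);CHKERRQ(ierr);         /* resident set size  */
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
           "[%d] owns rows %D to %D (%D local rows), memory %.1f MB\n",
           rank,rstart,rend,rend-rstart,(double)(mem/1048576.0));CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD,PETSC_STDOUT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

If every rank reports the full row range, the matrices really are replicated;
if the rows are split but the memory per rank is still large, the growth
presumably comes from the factorization instead.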

Kindly help.

cheers,
Venkatesh
-------------- next part --------------

Generalized eigenproblem stored in file.

 Reading COMPLEX matrices from binary files...
 Number of iterations of the method: 10
 Number of linear iterations of the method: 99
 Solution method: krylovschur

 Number of requested eigenvalues: 3
 Stopping condition: tol=1e-08, maxit=1600
 Number of converged approximate eigenpairs: 9

           k          ||Ax-kBx||/||kx||
   ----------------- ------------------
 -0.009606+0.000000 i   0.00350922
 -0.009633+0.000000 i    0.0107094
 -0.010251-0.000000 i    0.0104596
 -0.010478+0.000000 i   0.00103213
 -0.011009-0.000000 i  0.000354202
 -0.013199-0.000000 i   0.00022781
 -0.017263+0.000000 i  5.39603e-05
 -0.023652-0.000000 i  2.79115e-05
 -0.032886+0.000000 i  3.21621e-05

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex7 on a linux-intel named nid01080 with 240 processors, by esdveng Mon May 25 07:39:49 2015
Using Petsc Release Version 3.5.3, Jan, 31, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.837e+03      1.00001   1.837e+03
Objects:              1.070e+02      1.00000   1.070e+02
Flops:                6.655e+11      1.00014   6.654e+11  1.597e+14
Flops/sec:            3.622e+08      1.00014   3.621e+08  8.691e+10
Memory:               1.927e+09      1.00475              4.613e+11
MPI Messages:         3.200e+04      1.30610   2.617e+04  6.281e+06
MPI Message Lengths:  2.558e+08     11.04369   1.012e+03  6.357e+09
MPI Reductions:       2.876e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.8374e+03 100.0%  1.5969e+14 100.0%  6.281e+06 100.0%  1.012e+03      100.0%  2.875e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

ThreadCommRunKer       1 1.0 1.0970e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ThreadCommBarrie       1 1.0 4.0531e-06 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult              117 1.0 2.2459e-01 4.5 9.08e+07519.3 4.3e+05 9.2e+02 0.0e+00  0  0  7  6  0   0  0  7  6  0 25951
MatSolve              99 1.0 1.0297e+02 1.1 3.91e+10 1.0 0.0e+00 0.0e+00 0.0e+00  5  6  0  0  0   5  6  0  0  0 91120
MatLUFactorSym         1 1.0 1.0519e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatLUFactorNum         1 1.0 1.7081e+03 1.0 6.26e+11 1.0 0.0e+00 0.0e+00 0.0e+00 91 94  0  0  0  91 94  0  0  0 87994
MatConvert             2 1.0 6.9909e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       4 1.0 3.7676e+0031.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         4 1.0 7.5902e-01 1.0 0.00e+00 0.0 5.9e+04 1.2e+02 4.2e+01  0  0  1  0  1   0  0  1  0  1     0
MatGetRow            120 1.0 4.4448e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 6.6685e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       1 1.0 1.6402e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.6562e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                2 1.0 6.0908e+00 1.0 0.00e+00 0.0 3.2e+04 1.5e+04 4.6e+01  0  0  1  7  2   0  0  1  7  2     0
MatAXPY                1 1.0 2.2365e-01 1.0 0.00e+00 0.0 2.9e+04 1.2e+02 2.2e+01  0  0  0  0  1   0  0  0  0  1     0
MatGetRedundant        1 1.0 2.3025e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMPIConcateSeq       1 1.0 6.6215e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm               18 1.0 2.5502e-02 1.1 8.64e+03 1.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  1   0  0  0  0  1    81
VecCopy               20 1.0 1.6141e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               213 1.0 2.8761e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                9 1.0 3.1037e-02 1.1 4.32e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    33
VecScatterBegin      315 1.0 1.4699e-01 1.3 0.00e+00 0.0 6.1e+06 9.6e+02 0.0e+00  0  0 97 92  0   0  0 97 92  0     0
VecScatterEnd        315 1.0 7.7702e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
EPSSetUp               1 1.0 1.7233e+03 1.0 6.26e+11 1.0 1.4e+05 3.1e+02 7.9e+01 92 94  2  1  3  92 94  2  1  3 87218
EPSSolve               1 1.0 1.8280e+03 1.0 6.65e+11 1.0 6.1e+06 9.4e+02 2.7e+03 99100 97 91 93  99100 97 91 93 87358
STSetUp                1 1.0 1.7233e+03 1.0 6.26e+11 1.0 1.4e+05 3.1e+02 5.8e+01 92 94  2  1  2  92 94  2  1  2 87218
STApply               99 1.0 1.0423e+02 1.1 3.92e+10 1.0 6.0e+06 9.6e+02 2.0e+02  6  6 95 90  7   6  6 95 90  7 90059
STMatSolve            99 1.0 1.0411e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 2.0e+02  6  6 90 86  7   6  6 90 86  7 90121
BVCopy                19 1.0 2.6266e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.6e+01  0  0  0  0  2   0  0  0  0  2     0
BVMult               198 1.0 3.0988e-02 1.0 2.01e+06 1.0 0.0e+00 0.0e+00 3.7e+02  0  0  0  0 13   0  0  0  0 13 15584
BVDot                187 1.0 5.9229e-02 1.0 1.29e+06 1.0 0.0e+00 0.0e+00 1.9e+02  0  0  0  0  7   0  0  0  0  7  5209
BVOrthogonalize      100 1.0 1.3423e-01 1.0 2.48e+06 1.0 0.0e+00 0.0e+00 1.5e+03  0  0  0  0 52   0  0  0  0 52  4436
BVScale              100 1.0 6.8860e-03 1.1 2.40e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   836
BVSetRandom            1 1.0 4.3875e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve               10 1.0 2.4658e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors             20 1.0 3.8042e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther               10 1.0 1.1059e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              99 1.0 1.0409e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 2.0e+02  6  6 90 86  7   6  6 90 86  7 90140
PCSetUp                1 1.0 1.7230e+03 1.0 6.26e+11 1.0 1.2e+05 3.6e+02 2.8e+01 92 94  2  1  1  92 94  2  1  1 87232
PCApply               99 1.0 1.0400e+02 1.1 3.91e+10 1.0 5.7e+06 9.6e+02 0.0e+00  6  6 90 86  0   6  6 90 86  0 90218
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     3              2         1504     0
              Matrix    17             17   1915794800     0
              Vector    56             56       913984     0
      Vector Scatter     6              6         7320     0
           Index Set    15             15       254960     0
Eigenvalue Problem Solver     1              1         2156     0
         PetscRandom     1              1          648     0
  Spectral Transform     1              1          840     0
       Basis Vectors     1              1        10744     0
              Region     1              1          648     0
       Direct solver     1              1        25272     0
       Krylov Solver     2              2         2320     0
      Preconditioner     2              2         1928     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.54018e-05
Average time for zero size MPI_Send(): 4.47432e-06
#PETSc Option Table entries:
-eps_nev 3
-eps_target 0.5
-f1 a40t
-f2 b40t
-log_summary
-st_type sinvert
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 PETSC_ARCH=linux-intel -with-blas-lapack-dir=/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/ --with-mpi-dir=/opt/cray/mpt/7.0.5/gni/mpich2-intel/140 --with-scalar-type=complex --with-fortran-kernels=1 -known-mpi-shared-libraries=0 --with-large-file-io=1 --with-64-bit-indices=0 --with-batch FC=ifort --with-valgrind-dir=/home/proj/14/esdveng/apps/valgrind-3.10.1 --download-mumps=/home/proj/14/esdveng/apps/MUMPS_4.10.0-p3.tar.gz --download-scalapack=/home/proj/14/esdveng/apps/scalapack-2.0.2.tgz --download-blacs=/home/proj/14/esdveng/apps/blacs-dev.tar.gz --download-parmetis=/home/proj/14/esdveng/apps/parmetis-4.0.2-p5.tar.gz --download-metis=/home/proj/14/esdveng/apps/metis-5.0.2-p3.tar.gz --download-cmake=/home/proj/14/esdveng/apps/cmake-2.8.12.2.tar.gz
-----------------------------------------
Libraries compiled on Tue May 19 06:39:06 2015 on login4 
Machine characteristics: Linux-3.0.101-0.31.1_1.0502.8394-cray_ari_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/lustre/esd2/esdveng/petsc-3.5.3
Using PETSc arch: linux-intel
-----------------------------------------

Using C compiler: gcc  -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ifort  -fPIC -g   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/include -I/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/include -I/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/include
-----------------------------------------

Using C linker: gcc
Using Fortran linker: ifort
Using libraries: -Wl,-rpath,/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -L/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -lpetsc -Wl,-rpath,/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -L/mnt/lustre/esd2/esdveng/petsc-3.5.3/linux-intel/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis -lmetis -lX11 -lpthread -lssl -lcrypto -Wl,-rpath,/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/lib -L/opt/cray/mpt/7.0.5/gni/mpich2-intel/140/lib -lmpich -lssl -luuid -lpthread -lrt -ldl -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lpthread -lirc_s -lm -lstdc++ -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -lstdc++ -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -L/opt/intel/composer_xe_2015.1.133/tbb/lib/intel64/gcc4.1 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -ldl -lgcc_s -ldl  
-----------------------------------------

Application 404794 resources: utime ~438984s, stime ~1036s, Rss ~1905660, inblocks ~1047499, outblocks ~33037

