[petsc-users] Memory optimization

Perceval Desforges perceval.desforges at polytechnique.edu
Tue Nov 26 09:23:22 CST 2019


Hello, 

This is the output of -log_view. I selected what I thought were the
important parts. I don't know if this is the best format to send the
logs. If a text file is better let me know. Thanks again, 

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./dos.exe on a  named compute-0-11.local with 20 processors, by pcd Tue Nov 26 15:50:50 2019
Using Petsc Release Version 3.10.5, Mar, 28, 2019 

                         Max       Max/Min     Avg       Total 
Time (sec):           2.214e+03     1.000   2.214e+03
Objects:              1.370e+02     1.030   1.332e+02
Flop:                 1.967e+14     1.412   1.539e+14  3.077e+15
Flop/sec:             8.886e+10     1.412   6.950e+10  1.390e+12
MPI Messages:         1.716e+03     1.350   1.516e+03  3.032e+04
MPI Message Lengths:  2.559e+08     5.796   4.179e+04  1.267e+09
MPI Reductions:       3.840e+02     1.000 

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.0000e+02   4.5%  3.0771e+15 100.0%  3.016e+04  99.5%  4.190e+04       99.7%  3.310e+02  86.2%
 1:  Setting Up EPS: 2.1137e+03  95.5%  7.4307e+09   0.0%  1.600e+02   0.5%  2.000e+04        0.3%  4.600e+01  12.0%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                     
        --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen 
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           2 1.0 2.6554e+004632.9 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0
BuildTwoSidedF         3 1.0 1.2021e-01672.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                 8 1.0 1.1364e-02 2.3 8.00e+05 1.0 0.0e+00 0.0e+00
8.0e+00  0  0  0  0  2   0  0  0  0  2  1408
VecMDot               11 1.0 4.8588e-02 2.2 6.60e+06 1.0 0.0e+00 0.0e+00
1.1e+01  0  0  0  0  3   0  0  0  0  3  2717
VecNorm               12 1.0 5.2616e-02 4.3 1.20e+06 1.0 0.0e+00 0.0e+00
1.2e+01  0  0  0  0  3   0  0  0  0  4   456
VecScale              12 1.0 9.8681e-04 2.2 6.00e+05 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 12160
VecCopy                3 1.0 4.1175e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               108 1.0 9.3610e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.6284e-04 3.2 1.00e+05 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 12282
VecMAXPY              12 1.0 7.6976e-03 1.9 7.70e+06 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 20006
VecScatterBegin      419 1.0 4.5905e-01 3.7 0.00e+00 0.0 2.9e+04 3.7e+04
9.0e+01  0  0 96 85 23   0  0 97 85 27     0
VecScatterEnd        329 1.0 9.3328e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecSetRandom           1 1.0 4.3299e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          12 1.0 5.3697e-02 4.2 1.80e+06 1.0 0.0e+00 0.0e+00
1.2e+01  0  0  0  0  3   0  0  0  0  4   670
MatMult              240 1.0 1.2112e-01 1.5 1.86e+07 1.0 4.4e+02 8.0e+04
0.0e+00  0  0  1  3  0   0  0  1  3  0  3071
MatSolve             101 1.0 9.3087e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04
9.1e+01  4100 97 82 24  93100 97 82 27 33055277
MatCholFctrNum         1 1.0 1.2752e-02 2.8 5.00e+04 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0    78
MatICCFactorSym        1 1.0 4.0321e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       5 1.7 1.2031e-01501.1 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         5 1.7 6.6613e-02 2.4 0.00e+00 0.0 1.6e+02 2.0e+04
2.4e+01  0  0  1  0  6   0  0  1  0  7     0
MatGetRowIJ            1 1.0 7.1526e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2271e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                3 1.0 2.8543e-01 1.0 0.00e+00 0.0 3.3e+02 5.6e+05
5.4e+01  0  0  1 15 14   0  0  1 15 16     0
MatView                2 0.0 7.4778e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 1.3866e-0236.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              90 1.0 9.3211e+01 1.0 1.97e+14 1.4 3.0e+04 3.6e+04
1.1e+02  4100 98 85 30  93100 99 85 34 33011509
KSPGMRESOrthog        11 1.0 5.3543e-02 2.0 1.32e+07 1.0 0.0e+00 0.0e+00
1.1e+01  0  0  0  0  3   0  0  0  0  3  4931
PCSetUp                2 1.0 1.8253e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCSetUpOnBlocks        1 1.0 1.8055e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCApply              101 1.0 9.3089e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04
9.1e+01  4100 97 82 24  93100 97 82 27 33054820
EPSSolve               1 1.0 9.5183e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04
2.4e+02  4100 97 82 63  95100 97 82 73 32327750
STApply               89 1.0 9.3107e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04
9.1e+01  4100 97 82 24  93100 97 82 27 33048198
STMatSolve            89 1.0 9.3084e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04
9.1e+01  4100 97 82 24  93100 97 82 27 33056525
BVCreate               2 1.0 5.0357e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00  0  0  0  0  2   0  0  0  0  2     0
BVCopy                 1 1.0 9.2030e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec            132 1.0 7.2259e-01 1.3 5.26e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   1  0  0  0  0 14567
BVMultInPlace          1 1.0 2.2316e-01 1.1 6.40e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 57357
BVDotVec             132 1.0 1.3370e+00 1.1 5.46e+08 1.0 0.0e+00 0.0e+00
1.3e+02  0  0  0  0 35   1  0  0  0 40  8169
BVOrthogonalizeV      81 1.0 1.9413e+00 1.1 1.07e+09 1.0 0.0e+00 0.0e+00
1.3e+02  0  0  0  0 35   2  0  0  0 40 11048
BVScale               89 1.0 3.0558e-03 1.4 4.45e+06 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 29125
BVNormVec              8 1.0 1.5073e-02 1.9 1.20e+06 1.0 0.0e+00 0.0e+00
1.0e+01  0  0  0  0  3   0  0  0  0  3  1592
BVSetRandom            1 1.0 4.3440e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve                1 1.0 2.5339e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors             80 1.0 3.5286e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther                1 1.0 6.0797e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: Setting Up EPS

BuildTwoSidedF         3 1.0 2.8591e-0211.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 4 1.0 6.1312e-03122.5 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCholFctrSym         1 1.0 1.1540e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.0e+00  1  0  0  0  1   1  0  0  0 11     0
MatCholFctrNum         2 1.0 2.1019e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00
0.0e+00 95  0  0  0  0  99100  0  0  0     4
MatCopy                1 1.0 3.3707e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00  0  0  0  0  1   0  0  0  0  4     0
MatConvert             1 1.0 6.1760e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       3 1.0 2.8630e-0211.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 3.2575e-02 1.1 0.00e+00 0.0 1.6e+02 2.0e+04
1.8e+01  0  0  1  0  5   0  0100100 39     0
MatGetRowIJ            1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6703e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 1.0121e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                2 1.0 1.1354e-01 1.1 0.00e+00 0.0 1.6e+02 2.0e+04
2.0e+01  0  0  1  0  5   0  0100100 43     0
KSPSetUp               2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 2.1135e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00
1.2e+01 95  0  0  0  3 100100  0  0 26     4
EPSSetUp               1 1.0 2.1137e+03 1.0 1.00e+09 4.3 1.6e+02 2.0e+04
4.6e+01 95  0  1  0 12 100100100100100     4
STSetUp                2 1.0 1.0712e+03 1.0 4.95e+08 4.3 8.0e+01 2.0e+04
2.6e+01 48  0  0  0  7  51 50 50 50 57     3
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage 

              Vector    37             50    126614208     0.
              Matrix    13             17    159831092     0.
              Viewer     6              5         4200     0.
           Index Set    12             13      2507240     0.
         Vec Scatter     5              7       128984     0.
       Krylov Solver     3              4        22776     0.
      Preconditioner     3              4         3848     0.
          EPS Solver     1              2         8632     0.
  Spectral Transform     1              2         1664     0.
       Basis Vectors     3              4        45600     0.
         PetscRandom     2              2         1292     0.
              Region     1              2         1344     0.
       Direct Solver     1              2       163856     0.

--- Event Stage 1: Setting Up EPS

              Vector    19              6       729576     0.
              Matrix    10              6     12178892     0.
           Index Set     9              8       766336     0.
         Vec Scatter     4              2         2640     0.
       Krylov Solver     1              0            0     0.
      Preconditioner     1              0            0     0.
          EPS Solver     1              0            0     0.
  Spectral Transform     1              0            0     0.
       Basis Vectors     1              0            0     0.
              Region     1              0            0     0.
       Direct Solver     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000263596
Average time for zero size MPI_Send(): 5.78523e-05
#PETSc Option Table entries:
-log_view
-mat_mumps_cntl_3 1e-12
-mat_mumps_icntl_13 1
-mat_mumps_icntl_14 60
-mat_mumps_icntl_24 1
-matload_block_size 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/share/apps/petsc/3.10.5 --with-cc=gcc
--with-cxx=g++ --with-fc=gfortran --with-debugging=0 COPTFLAGS="-O3
-march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native
-mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native"
--download-mpich --download-fblaslapack --download-scalapack
--download-mumps 
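
To make the log above easier to read, here is a small sketch of the arithmetic on a few of its rows. The numbers are copied from the -log_view output; the dictionary layout and the helper name `time_share` are only illustrative.

```python
# Values copied from the -log_view output above (times in seconds).
total_time = 2.214e+03  # "Time (sec)", Max column

# event name -> (number of calls, max time in seconds)
events = {
    "MatCholFctrNum (stage 1)": (2, 2.1019e+03),
    "MatCholFctrSym (stage 1)": (1, 1.1540e+01),
    "EPSSolve (stage 0)": (1, 9.5183e+01),
}


def time_share(seconds, total):
    """Return the percentage of the total run time spent in one event."""
    return 100.0 * seconds / total


for name, (calls, seconds) in events.items():
    print(f"{name:26s} {seconds:10.1f} s "
          f"{time_share(seconds, total_time):5.1f}% "
          f"{seconds / calls:10.1f} s per call")
```

Read this way, nearly all of the run time sits in the two numerical Cholesky factorizations done during EPSSetUp, while the solve itself is a small fraction.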

Best regards, 

Perceval, 

> On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges <perceval.desforges at polytechnique.edu> wrote: 
> 
> >> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite far apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done?
> 
> No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros. 
> 
> In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually 
> track down bad preallocation using -info. 
> 
> Thanks, 
> 
> Matt  
> 
> I will try setting up only one partition. 
> 
> Thanks, 
> 
> Perceval, 
> Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0".
> 
> Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones.
> 
> The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only.
> 
> Jose
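
The fill-in remark above can be illustrated with a rough sketch. It counts the nonzeros of the Cholesky factor for a matrix made of a few full diagonals, using the natural ordering and a toy size of 4096 rows. Both are assumptions made for illustration: MUMPS applies a fill-reducing ordering, so the real numbers for the matrices in this thread will differ. The function name `cholesky_nnz` and the chosen offsets are hypothetical.

```python
def cholesky_nnz(n, offsets):
    """Count nonzeros in the Cholesky factor L of an n x n symmetric matrix
    whose nonzero sub-diagonals sit at the given offsets (natural ordering,
    no fill-reducing permutation)."""
    # cols[j] holds the row indices i > j of the nonzeros in column j of L.
    cols = [{j + d for d in offsets if j + d < n} for j in range(n)]
    for j in range(n):
        if cols[j]:
            # The smallest row index is the parent of j in the elimination
            # tree; the rest of column j's pattern is inherited by the parent.
            parent = min(cols[j])
            cols[parent] |= cols[j] - {parent}
    return n + sum(len(c) for c in cols)  # diagonal plus strictly lower part


n = 4096  # toy size: 64**2 == 16**3
nnz_5diag = cholesky_nnz(n, (1, 64))        # 5 diagonals, as on a 64 x 64 grid
nnz_7diag = cholesky_nnz(n, (1, 16, 256))   # 7 diagonals, as on a 16**3 grid
print(nnz_5diag, nnz_7diag, nnz_7diag / nnz_5diag)
```

In this toy setting the factor fills the whole band up to the outermost diagonal, so moving that diagonal further out costs far more than the two extra diagonals themselves would suggest.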
> 
> On 25 Nov 2019, at 18:25, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges <perceval.desforges at polytechnique.edu> wrote:
> Hi,
> 
> So I'm loading two matrices from files, both 1000000 by 1000000. I ran the program with -mat_view ::ascii_info and I got:
> 
> Mat Object: 1 MPI processes
> type: seqaij
> rows=1000000, cols=1000000
> total: nonzeros=7000000, allocated nonzeros=7000000
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> 
> 20 times, and then
> 
> Mat Object: 1 MPI processes
> type: seqaij
> rows=1000000, cols=1000000
> total: nonzeros=1000000, allocated nonzeros=1000000
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> 
> 20 times as well, and then
> 
> Mat Object: 1 MPI processes
> type: seqaij
> rows=1000000, cols=1000000
> total: nonzeros=7000000, allocated nonzeros=7000000
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> 
> 20 times as well before crashing.
> 
> I realized it might be because I am setting up 20 Krylov-Schur partitions, which may be too many. I tried running the code again with only 2 partitions and now the code runs, but I have speed issues.
> 
> I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much?
> 
> I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create
> your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the
> preallocation correctly.
> 
> Thanks,
> 
> Matt 
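
The -mat_view ::ascii_info blocks quoted above can also be checked mechanically. This is a minimal sketch that parses one such block, copied from the message, and reports the nonzeros per row, the unused allocation, and the malloc count. The helper name `summarize` and the returned keys are hypothetical.

```python
import re

# One block copied from the -mat_view ::ascii_info output quoted above.
MAT_INFO = """\
Mat Object: 1 MPI processes
type: seqaij
rows=1000000, cols=1000000
total: nonzeros=7000000, allocated nonzeros=7000000
total number of mallocs used during MatSetValues calls =0
"""


def summarize(info):
    """Extract the size, nonzero counts and malloc count from one info block."""
    rows = int(re.search(r"rows=(\d+)", info).group(1))
    nnz, allocated = map(
        int,
        re.search(r"nonzeros=(\d+), allocated nonzeros=(\d+)", info).groups(),
    )
    mallocs = int(re.search(r"MatSetValues calls\s*=\s*(\d+)", info).group(1))
    return {
        "nnz_per_row": nnz / rows,
        "unused_allocation": allocated - nnz,
        "mallocs": mallocs,
    }


summary = summarize(MAT_INFO)
print(summary)
```

A malloc count of zero together with no unused allocation is what one would expect to see when the preallocation matches the matrix.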
> Thanks again,
> 
> Best regards,
> 
> Perceval,
> 
> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with
> -mat_view ::ascii_info
> 
> Jose
> 
> On 22 Nov 2019, at 19:56, Perceval Desforges <perceval.desforges at polytechnique.edu> wrote:
> 
> Hi,
> 
> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error:
> 
> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed
> 
> I get this error even when there are no eigenvalues in the interval.
> 
> I've started using BVMAT instead of BVVECS by the way.
> 
> Thanks,
> 
> Perceval,
> 
> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS.
> 
> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example:
> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html
> You can comment out the call to EPSSolve() and run with the option -show_inertias
> For example, the output
> Shift 0.1  Inertia 3 
> Shift 0.35  Inertia 11 
> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3).
> 
> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower).
> 
> Jose
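
The inertia arithmetic in the example above can be written out as a short sketch. The shift and inertia values are the two quoted in the message; the function name `count_in_intervals` is hypothetical, and the input is assumed to be a list of (shift, inertia) pairs read off the -show_inertias output.

```python
def count_in_intervals(shift_inertia):
    """Given (shift, inertia) pairs, return the number of eigenvalues
    between each pair of consecutive shifts as ((low, high), count)."""
    pairs = sorted(shift_inertia)
    return [
        ((low, high), inertia_high - inertia_low)
        for (low, inertia_low), (high, inertia_high) in zip(pairs, pairs[1:])
    ]


# Values from the example output: "Shift 0.1  Inertia 3", "Shift 0.35  Inertia 11".
counts = count_in_intervals([(0.1, 3), (0.35, 11)])
for (low, high), count in counts:
    print(f"[{low}, {high}] contains {count} eigenvalues")
```

Applied before EPSSolve, a count like this gives a lower bound on how many vectors the solver will need for the chosen interval.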
> 
> On 21 Nov 2019, at 18:13, Perceval Desforges via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hello all,
> 
> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 GB of RAM.
> 
> The options I use are :
> 
> -bv_type vecs  -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12
> 
> However the program quickly crashes with this error:
> 
> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed
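
To get a feel for how far over the limit the job went, the two slurmstepd messages in this thread can be parsed with a few lines. This is only a sketch: the helper name `overage` is hypothetical, and since the message does not state its units, only the ratio of the two numbers is computed.

```python
import re

# The two error lines as they appear in this thread.
LINES = [
    "slurmstepd: error: Step 2115.0 exceeded virtual memory limit "
    "(312121084 > 107317760), being killed",
    "slurmstepd: error: Step 2140.0 exceeded virtual memory limit "
    "(313526508 > 107317760), being killed",
]


def overage(line):
    """Return (used, limit, used / limit) parsed from one slurmstepd line."""
    used, limit = map(int, re.search(r"\((\d+) > (\d+)\)", line).groups())
    return used, limit, used / limit


for line in LINES:
    used, limit, ratio = overage(line)
    print(f"used={used} limit={limit} ratio={ratio:.2f}")
```

In both runs the request is close to three times the limit, so small tuning of workspace percentages is unlikely to be enough on its own.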
> 
> I've tried reducing the amount of memory used by SLEPc with the -mat_mumps_icntl_14 option, by setting it to -70 for example, but then I get this error:
> 
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614
> 
> which is an error due to setting the mumps icntl option so low from what I've gathered.
> 
> Is there any other way I can reduce memory usage?
> 
> Thanks,
> 
> Regards,
> 
> Perceval,
> 
> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice.

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 
