[petsc-users] SLEPc EPSGD: too much time in single iteration

Runfeng Jin jsfaraway at gmail.com
Wed Jun 15 01:58:32 CDT 2022


Sorry, I missed the attachment.

Runfeng Jin

Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, 15 Jun 2022 at 14:56:

> Hi! You are right! I tried SLEPc and PETSc builds without debugging, and
> matrix B's solve time dropped to 99s. But it is still much higher than
> matrix A's (8s). As mentioned before, the attachments are the log views
> from the no-debug build:
>    file 1: log of the matrix A solve. This is the larger
> matrix (900,000*900,000), but it is solved quickly (8s);
>    file 2: log of the matrix B solve. This is the smaller
> matrix (2,547*2,547), but it is solved much more slowly (99s).
>
> Comparing these two files, the strange phenomenon is still there:
> 1) Matrix A has more basis vectors (375) than B (189), yet A spends less time
> in BVCreate (0.6s) than B (32s);
> 2) Matrix A spends less time in EPSSetUp (0.015s) than B (0.9s);
> 3) In the debug version, matrix B's storage was distributed much more unevenly
> across processes (memory max/min 4365) than A's (memory max/min 1.113), while
> the other metrics looked more balanced. In the no-debug version there is no
> memory information in the output.
>
> The significant differences I can identify are: 1) B uses preallocation;
> 2) A's matrix elements are computed on the CPU, while B's are computed on the
> GPU, transferred to the CPU, and then solved by PETSc on the CPU.
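>
> Roughly, the preallocation and assembly of B follow the sketch below (a
> simplified sketch only: the variable names are illustrative, and the GPU
> kernel that fills ncols_i, cols[], and vals[] for each row is omitted):
>
>     #include <petscmat.h>
>     /* N: global size; nlocal: rows owned here; d_nnz[]/o_nnz[]: nonzeros
>        per local row in the diagonal/off-diagonal blocks, counted earlier */
>     Mat      B;
>     PetscInt i, rstart, rend;
>     MatCreate(PETSC_COMM_WORLD, &B);
>     MatSetSizes(B, nlocal, nlocal, N, N);
>     MatSetType(B, MATMPIAIJ);
>     MatMPIAIJSetPreallocation(B, 0, d_nnz, 0, o_nnz);
>     MatGetOwnershipRange(B, &rstart, &rend);
>     for (i = rstart; i < rend; i++) {
>       /* cols[]/vals[] for row i were computed on the GPU, copied to host */
>       MatSetValues(B, 1, &i, ncols_i, cols, vals, INSERT_VALUES);
>     }
>     MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
>     MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);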
>
> Is this a normal result? I mean, can a matrix with fewer non-zero elements
> and a smaller dimension cost more EPSSolve time? Is this due to the structure
> of the matrix? If so, is there any way to increase the solve speed?
>
> Or is this weird and something that should be fixed somehow?
> Thank you!
>
> Runfeng Jin
>
>
> Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, 12 Jun 2022 at 16:08:
>
>> Please always respond to the list.
>>
>> Pay attention to the warnings in the log:
>>
>>       ##########################################################
>>       #                                                        #
>>       #                       WARNING!!!                       #
>>       #                                                        #
>>       #   This code was compiled with a debugging option.      #
>>       #   To get timing results run ./configure                #
>>       #   using --with-debugging=no, the performance will      #
>>       #   be generally two or three times faster.              #
>>       #                                                        #
>>       ##########################################################
>>
>> With the debugging option the times are not trustworthy, so I suggest
>> repeating the analysis with an optimized build.
>>
>> Jose
>>
>>
>> > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com> wrote:
>> >
>> > Hello!
>> >  I compared the log views of these two matrix solves and found some
>> > strange things. The attached files are the log views:
>> >    file 1: log of the matrix A solve. This is the larger
>> > matrix (900,000*900,000), but it is solved quickly (30s);
>> >    file 2: log of the matrix B solve. This is a smaller matrix (2,547*2,547,
>> > slightly different from the matrix B mentioned in the initial email, but also
>> > solved much more slowly; I use it for a quicker test), yet it is solved much
>> > more slowly (1244s).
>> >
>> > Comparing these two files, I notice several things:
>> > 1) Matrix A has more basis vectors (375) than B (189), yet A spends less
>> > time in BVCreate (0.349s) than B (296s);
>> > 2) Matrix A spends less time in EPSSetUp (0.031s) than B (10.709s);
>> > 3) Matrix B's storage is distributed much more unevenly across
>> > processes (memory max/min 4365) than A's (memory max/min 1.113), while the
>> > other metrics look more balanced.
>> >
>> > I do not do preallocation for A, and it is distributed across processes by
>> > PETSc. For B, when preallocating I use PetscSplitOwnership to decide which
>> > part belongs to the local process, and B is also distributed by PETSc when
>> > the matrix values are computed.
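>> >
>> > A simplified sketch of the ownership split (the names are illustrative, and
>> > B is created with MatCreate beforehand):
>> >
>> >     PetscInt N = 2547, nlocal = PETSC_DECIDE;
>> >     /* Let PETSc decide how many of the N global rows this process owns */
>> >     PetscSplitOwnership(PETSC_COMM_WORLD, &nlocal, &N);
>> >     /* nlocal is then used as the local size, and the per-row nonzero
>> >        counts for preallocation are built for the locally owned rows */
>> >     MatSetSizes(B, nlocal, nlocal, N, N);
>> >     MatMPIAIJSetPreallocation(B, 0, d_nnz, 0, o_nnz);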
>> >
>> > - Does this mean that, for matrix B, too many nonzero elements are stored
>> > in a single process, and that this is why it costs so much more time to
>> > solve the matrix and find the eigenvalues? If so, are there better ways to
>> > distribute the matrix among the processes?
>> > - Or are there other reasons for this difference in solve time?
>> >
>> > I hope to receive your reply. Thank you!
>> >
>> > Runfeng Jin
>> >
>> >
>> >
>> > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, 11 Jun 2022 at 20:33:
>> > Hello!
>> > I have tried using PETSC_DEFAULT for eps_ncv, but it still takes a lot of
>> > time. Is there anything else I can do? The attachment is the log when
>> > PETSC_DEFAULT is used for eps_ncv.
>> >
>> > Thank you !
>> >
>> > Runfeng Jin
>> >
>> > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, 10 Jun 2022 at 20:50:
>> > The value -eps_ncv 5000 is huge.
>> > Better let SLEPc use the default value.
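>> >
>> > For example, simply do not pass -eps_ncv at all, or in the code use
>> > something like this (nev = 3 here; ncv and mpd are then chosen by SLEPc):
>> >
>> >     EPSSetDimensions(eps, 3, PETSC_DEFAULT, PETSC_DEFAULT);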
>> >
>> > Jose
>> >
>> >
>> > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com> wrote:
>> > >
>> > > Hello!
>> > >  I want to obtain the 3 smallest eigenvalues, and the attachment is the
>> > > log view output. I can see that EPSSolve really takes the majority of the
>> > > time, but I cannot see why it costs so much. Can you see anything from it?
>> > >
>> > > Thank you !
>> > >
>> > > Runfeng Jin
>> > >
>> > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>> > > Convergence depends on the distribution of the eigenvalues you want to
>> > > compute. On the other hand, the cost also depends on the time it takes to
>> > > build the preconditioner. Use -log_view to see the cost of the different
>> > > steps of the computation.
>> > >
>> > > Jose
>> > >
>> > >
>> > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com> wrote:
>> > > >
>> > > > Hello!
>> > > >
>> > > > I am trying to use EPSGD to compute the smallest eigenvalue of a matrix,
>> > > > and I have found a strange thing. There are two matrices, A (900000*900000)
>> > > > and B (90000*90000). Solving A takes 371 iterations and only 30.83s, while
>> > > > solving B takes 22 iterations and 38885s! What could be the reason for
>> > > > this? Or what can I do to find the reason?
>> > > >
>> > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
>> > > > One difference I can tell is that matrix B has many small values, whose
>> > > > absolute value is less than 1e-6. Could this be the reason?
>> > > >
>> > > > Thank you!
>> > > >
>> > > > Runfeng Jin
>> > > <log_view.txt>
>> >
>> > <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>>
>>
-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/public/home/jrf/works/ecMRCI-shaula/MRCI on a  named g16r3n07 with 256 processors, by jrf Wed Jun 15 10:04:00 2022
Using Petsc Release Version 3.15.1, Jun 17, 2021 

                         Max       Max/Min     Avg       Total
Time (sec):           1.029e+02     1.001   1.028e+02
Objects:              2.011e+03     1.146   1.761e+03
Flop:                 1.574e+06     2.099   1.104e+06  2.827e+08
Flop/sec:             1.531e+04     2.099   1.074e+04  2.748e+06
MPI Messages:         3.881e+04     7.920   1.865e+04  4.773e+06
MPI Message Lengths:  1.454e+06     6.190   3.542e+01  1.691e+08
MPI Reductions:       1.791e+03     1.001

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.0285e+02 100.0%  2.8266e+08 100.0%  4.773e+06 100.0%  3.542e+01      100.0%  1.769e+03  98.9%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          2 1.0 4.0572e-01 2.6 0.00e+00 0.0 3.7e+04 4.0e+00 2.0e+00  0  0  1  0  0   0  0  1  0  0     0
BuildTwoSidedF         1 1.0 2.0986e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00  0  0  1  2  0   0  0  1  2  0     0
MatMult              193 1.0 4.6531e+00 1.1 9.85e+05 4.3 4.7e+06 3.5e+01 1.0e+00  4 48 99 98  0   4 48 99 98  0    29
MatSolve             377 1.0 1.6183e-0288.8 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0   982
MatLUFactorNum         1 1.0 5.7322e-05 2.8 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2177
MatILUFactorSym        1 1.0 8.5668e-03793.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 2.1006e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00  0  0  1  2  0   0  0  1  2  0     0
MatAssemblyEnd         1 1.0 3.0272e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 7.0000e-07 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 5.3758e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        98 1.0 9.0806e-0371.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm                3 1.0 1.8633e-01 1.8 6.00e+01 1.1 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCopy              959 1.0 1.4909e-0275.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               387 1.0 9.8578e-04 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                3 1.0 1.6157e-023639.0 6.00e+01 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
VecScatterBegin      196 1.0 3.7129e-01 1.5 0.00e+00 0.0 4.7e+06 3.5e+01 4.0e+00  0  0 99 98  0   0  0 99 98  0     0
VecScatterEnd        196 1.0 4.6171e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecSetRandom           3 1.0 3.5271e-0515.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith       634 1.0 1.7589e-0265.9 1.20e+04 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   174
VecReduceComm        444 1.0 2.4972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.4e+02 24  0  0  0 25  24  0  0  0 25     0
SFSetGraph             1 1.0 1.1170e-05 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                4 1.0 2.3891e-01 1.3 0.00e+00 0.0 4.9e+04 1.1e+01 1.0e+00  0  0  1  0  0   0  0  1  0  0     0
SFPack               196 1.0 1.2023e-0291.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             196 1.0 4.3491e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
EPSSetUp               1 1.0 9.6815e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01  1  0  0  0  1   1  0  0  0  1     0
EPSSolve               1 1.0 9.9906e+01 1.0 1.56e+06 2.1 4.7e+06 3.5e+01 1.7e+03 97 99 98 97 97  97 99 98 97 99     3
STSetUp                1 1.0 2.8679e-0450.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
STComputeOperatr       1 1.0 2.0985e-04223.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVCreate             194 1.0 3.2437e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.8e+02 31  0  0  0 33  31  0  0  0 33     0
BVCopy               386 1.0 1.7107e-02110.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec           1090 1.0 1.8337e-0221.1 2.22e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0 20  0  0  0   0 20  0  0  0  3080
BVMultInPlace        224 1.0 1.8273e-0218.4 1.06e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0  1481
BVDot                319 1.0 1.7687e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10  0  0 18  17 10  0  0 18     2
BVDotVec             392 1.0 2.2083e+01 1.0 6.32e+04 1.1 0.0e+00 0.0e+00 3.9e+02 21  6  0  0 22  21  6  0  0 22     1
BVOrthogonalizeV     190 1.0 1.1538e+01 1.0 1.15e+05 1.1 0.0e+00 0.0e+00 2.0e+02 11 10  0  0 11  11 10  0  0 12     3
BVScale              254 1.0 1.7301e-02125.1 2.54e+03 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
BVSetRandom            3 1.0 3.6330e-0485.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMatProject         255 1.0 1.7707e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10  0  0 18  17 10  0  0 18     2
DSSolve               82 1.0 5.4953e-0215.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors            380 1.0 9.7683e-0366.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther              179 1.0 1.7321e-0239.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 1.8680e-0574.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             377 1.0 1.8723e-0214.2 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0   849
PCSetUp                2 1.0 8.8937e-0353.4 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    14
PCApply              377 1.0 3.0085e-0213.8 7.23e+04 6.6 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0   532
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix   745            745      2664464     0.
              Vector   793            793      1538360     0.
           Index Set    10             10        10792     0.
   Star Forest Graph     4              4         5376     0.
          EPS Solver     1              1         3468     0.
  Spectral Transform     1              1          908     0.
       Basis Vectors   195            195       437744     0.
              Region     1              1          680     0.
       Direct Solver     1              1        20156     0.
       Krylov Solver     2              2         3200     0.
      Preconditioner     2              2         1936     0.
         PetscRandom     1              1          670     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.7e-08
Average time for MPI_Barrier(): 0.0578456

Average time for zero size MPI_Send(): 0.00358668
#PETSc Option Table entries:
-eps_gd_blocksize 3
-eps_gd_initial_size 3
-eps_ncv PETSC_DEFAULT
-eps_type gd
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no
-----------------------------------------
Libraries compiled on 2022-06-14 01:43:59 on login05 
Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos
Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug
Using PETSc arch: 
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O   
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O     
-----------------------------------------

Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/public/home/jrf/works/qubic/bin/pfci.x on a  named h09r4n13 with 192 processors, by jrf Wed Jun 15 12:10:57 2022
Using Petsc Release Version 3.15.1, Jun 17, 2021 

                         Max       Max/Min     Avg       Total
Time (sec):           9.703e+02     1.000   9.703e+02
Objects:              2.472e+03     1.000   2.472e+03
Flop:                 6.278e+09     1.064   6.012e+09  1.154e+12
Flop/sec:             6.470e+06     1.064   6.196e+06  1.190e+09
MPI Messages:         3.635e+04     1.947   2.755e+04  5.290e+06
MPI Message Lengths:  7.246e+08     1.742   2.052e+04  1.085e+11
MPI Reductions:       2.464e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 9.7032e+02 100.0%  1.1543e+12 100.0%  5.290e+06 100.0%  2.052e+04      100.0%  2.446e+03  99.3%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          2 1.0 1.9883e+029876.1 0.00e+00 0.0 2.1e+04 4.0e+00 2.0e+00 11  0  0  0  0  11  0  0  0  0     0
BuildTwoSidedF         1 1.0 1.9879e+021349804.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11  0  0  0  0  11  0  0  0  0     0
MatMult              247 1.0 2.5963e+00 1.6 1.16e+09 1.2 5.3e+06 2.1e+04 1.0e+00  0 17100100  0   0 17100100  0 77449
MatSolve             479 1.0 3.2541e-01 2.3 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 146312
MatLUFactorNum         1 1.0 4.3923e-02 7.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 41413
MatILUFactorSym        1 1.0 2.5215e-03 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 1.9879e+02654719.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11  0  0  0  0  11  0  0  0  0     0
MatAssemblyEnd         1 1.0 2.1247e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 8.3000e-07 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.0741e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries       244 1.0 2.5375e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm                3 1.0 1.3600e-0125.4 2.83e+04 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0    40
VecCopy             1214 1.0 6.8012e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               486 1.0 2.4261e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                3 1.0 1.5987e-04 3.8 2.83e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 34009
VecScatterBegin      247 1.0 3.8039e-01 2.2 0.00e+00 0.0 5.3e+06 2.1e+04 1.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd        247 1.0 1.3181e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           6 1.0 1.3014e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith       723 1.0 5.9514e-03 2.1 6.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 220153
VecReduceComm        482 1.0 2.1629e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+02  0  0  0  0 20   0  0  0  0 20     0
SFSetGraph             1 1.0 1.3207e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 8.3540e-02 1.4 0.00e+00 0.0 4.2e+04 5.2e+03 1.0e+00  0  0  1  0  0   0  0  1  0  0     0
SFPack               247 1.0 2.3981e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             247 1.0 1.8351e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
EPSSetUp               1 1.0 1.5565e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01  0  0  0  0  1   0  0  0  0  1     0
EPSSolve               1 1.0 8.5090e+00 1.0 6.26e+09 1.1 5.2e+06 2.1e+04 2.4e+03  1100 99 99 99   1100 99 99 99 135365
STSetUp                1 1.0 1.3724e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
STComputeOperatr       1 1.0 7.1348e-05 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVCreate             245 1.0 6.2414e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02  0  0  0  0 30   0  0  0  0 30     0
BVCopy               488 1.0 1.9780e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec           1210 1.0 6.7882e-01 1.1 1.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 19  0  0  0   0 19  0  0  0 321786
BVMultInPlace        247 1.0 7.8465e-01 1.6 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 45  0  0  0   0 45  0  0  0 663459
BVDot                718 1.0 1.7888e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02  0  9  0  0 29   0  9  0  0 29 60566
BVDotVec             487 1.0 5.3124e-01 1.2 2.85e+08 1.0 0.0e+00 0.0e+00 4.9e+02  0  5  0  0 20   0  5  0  0 20 102853
BVOrthogonalizeV     244 1.0 5.6093e-01 1.0 5.62e+08 1.0 0.0e+00 0.0e+00 2.5e+02  0  9  0  0 10   0  9  0  0 10 192477
BVScale              482 1.0 2.0062e-03 1.7 2.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 217721
BVSetRandom            6 1.0 1.3480e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMatProject         480 1.0 1.8300e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02  0  9  0  0 29   0  9  0  0 29 59203
DSSolve              242 1.0 2.3012e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors            482 1.0 4.5230e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther              485 1.0 2.4384e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 3.5111e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             479 1.0 3.3062e-01 2.2 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 144006
PCSetUp                2 1.0 4.6721e-02 6.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 38932
PCApply              479 1.0 3.7933e-01 2.4 4.11e+08 2.3 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 130309
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix  1216           1216     91546008     0.
              Vector   994            994     83668472     0.
           Index Set     5              5       733636     0.
   Star Forest Graph     1              1         1224     0.
          EPS Solver     1              1        13512     0.
  Spectral Transform     1              1          908     0.
       Basis Vectors   246            246       785872     0.
              Region     1              1          680     0.
       Direct Solver     1              1      3617024     0.
       Krylov Solver     2              2         3200     0.
      Preconditioner     2              2         1936     0.
         PetscRandom     1              1          670     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 5e-08
Average time for MPI_Barrier(): 1.90986e-05
Average time for zero size MPI_Send(): 3.44587e-06
#PETSc Option Table entries:
-eps_ncv 300
-eps_nev 3
-eps_smallest_real
-eps_tol 1e-10
-eps_type gd
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no
-----------------------------------------
Libraries compiled on 2022-06-14 01:43:59 on login05 
Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos
Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug
Using PETSc arch: 
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O   
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O     
-----------------------------------------

Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------

