[petsc-users] SLEPc EPSGD: too much time in single iteration
Runfeng Jin
jsfaraway at gmail.com
Wed Jun 15 01:58:32 CDT 2022
Sorry, I missed the attachment.
Runfeng Jin
Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
> Hi! You are right! I tried a SLEPc and PETSc build without debugging,
> and matrix B's solve time dropped to 99s. But it is still much higher
> than matrix A's (8s). As mentioned before, the attachments are the log views of
> the no-debug version:
>     file 1: log of the matrix A solve. This is a larger
> matrix (900,000 x 900,000) but it is solved quickly (8s);
>     file 2: log of the matrix B solve. This is a smaller matrix (2,547 x 2,547)
> but it is solved much more slowly (99s).
>
> By comparing these two files, the strange phenomenon still exists:
> 1) Matrix A has more basis vectors (375) than B (189), but A spent less time
> on BVCreate (0.6s) than B (32s);
> 2) Matrix A spent less time on EPSSetup (0.015s) than B (0.9s);
> 3) In the debug version, matrix B's storage is distributed much more unevenly
> among processors (memory max/min 4365) than A's (memory max/min 1.113), but
> other metrics seem more balanced (a rough per-rank check is sketched below). In
> the no-debug version there is no memory information in the output.
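> 
> (Not in the original thread; just a minimal sketch, assuming the assembled
> matrix is available as A, of how one could print each rank's nonzero count and
> memory with MatGetInfo() to check this balance directly instead of reading it
> off the -log_view summary.)
> 
>     #include <petscmat.h>
> 
>     /* Print each rank's local nonzero count and matrix memory for A. */
>     PetscErrorCode CheckBalance(Mat A)
>     {
>       MatInfo        info;
>       PetscMPIInt    rank;
>       MPI_Comm       comm;
>       PetscErrorCode ierr;
> 
>       ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
>       ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
>       ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);
>       ierr = PetscSynchronizedPrintf(comm,"[%d] nz_used %g  memory %g\n",
>                                      rank,(double)info.nz_used,(double)info.memory);CHKERRQ(ierr);
>       ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
>       return 0;
>     }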
>
> The significant differences I can tell are: 1) B uses preallocation; 2) A's
> matrix elements are calculated on the CPU, while B's matrix elements are
> calculated on the GPU, then transferred to the CPU and solved by PETSc on the CPU.
>
> Is this a normal result? I mean, can a matrix with fewer non-zero
> elements and a smaller dimension cost more EPSSolve time? Is this due to the
> structure of the matrix? If so, are there any ways to increase the solve speed?
>
> Or is this weird and should it be fixed somehow?
> Thank you!
>
> Runfeng Jin
>
>
> Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
>
>> Please always respond to the list.
>>
>> Pay attention to the warnings in the log:
>>
>> ##########################################################
>> #                                                        #
>> #                       WARNING!!!                       #
>> #                                                        #
>> #   This code was compiled with a debugging option.      #
>> #   To get timing results run ./configure                #
>> #   using --with-debugging=no, the performance will      #
>> #   be generally two or three times faster.              #
>> #                                                        #
>> ##########################################################
>>
>> With the debugging option the times are not trustworthy, so I suggest
>> repeating the analysis with an optimized build.
>>
>> Jose
>>
>>
>> > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com> wrote:
>> >
>> > Hello!
>> > I compared these two matrix solves' log views and found some strange
>> things. The attached files are the log views:
>> >     file 1: log of the matrix A solve. This is a larger
>> matrix (900,000 x 900,000) but it is solved quickly (30s);
>> >     file 2: log of the matrix B solve. This is a smaller matrix (2,547 x 2,547,
>> slightly different from the matrix B mentioned in the initial email, but also
>> solved much more slowly; I use it for a quicker test), yet it is solved much more
>> slowly (1244s).
>> >
>> > By comparing these two files, I find some things:
>> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
>> time on BVCreate (0.349s) than B (296s);
>> > 2) Matrix A spent less time on EPSSetup (0.031s) than B (10.709s);
>> > 3) Matrix B's storage is distributed much more unevenly among
>> processors (memory max/min 4365) than A's (memory max/min 1.113), but other
>> metrics seem more balanced.
>> >
>> > I don't do preallocation in A, and it is distributed across processors
>> by PETSc. For B, when preallocating I use PetscSplitOwnership to decide
>> which part belongs to the local processor, and B is also distributed by PETSc
>> when the matrix values are computed.
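>> >
>> > (Not the author's actual code; a minimal sketch of the kind of setup described
>> > above, using the 2,547 global size mentioned in this thread as an example.
>> > PetscSplitOwnership() picks the local row count n from the global size N, which
>> > is then passed to MatSetSizes(); using PETSC_DECIDE for the local sizes in
>> > MatSetSizes() would give the same default row distribution PETSc chooses on its
>> > own. The per-row counts d_nnz/o_nnz are placeholders that must come from the
>> > real sparsity pattern; getting them wrong forces extra mallocs during assembly.)
>> >
>> >     Mat            B;
>> >     PetscInt       N = 2547, n = PETSC_DECIDE;  /* global/local row counts */
>> >     PetscInt       *d_nnz, *o_nnz;              /* per-row nonzero estimates */
>> >     PetscErrorCode ierr;
>> >
>> >     ierr = PetscSplitOwnership(PETSC_COMM_WORLD,&n,&N);CHKERRQ(ierr);
>> >     ierr = MatCreate(PETSC_COMM_WORLD,&B);CHKERRQ(ierr);
>> >     ierr = MatSetSizes(B,n,n,N,N);CHKERRQ(ierr);
>> >     ierr = MatSetType(B,MATMPIAIJ);CHKERRQ(ierr);
>> >     ierr = PetscMalloc2(n,&d_nnz,n,&o_nnz);CHKERRQ(ierr);
>> >     /* ... fill d_nnz[i]/o_nnz[i] = nonzeros of local row i inside/outside
>> >            the local column block, from the known sparsity pattern ... */
>> >     ierr = MatMPIAIJSetPreallocation(B,0,d_nnz,0,o_nnz);CHKERRQ(ierr);
>> >     ierr = PetscFree2(d_nnz,o_nnz);CHKERRQ(ierr);
>> >     /* ... MatSetValues() on the locally owned rows, then ... */
>> >     ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>> >     ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);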
>> >
>> > - Does this mean that, for matrix B, too many nonzero elements are stored in a
>> single process, and that this is why it costs so much more time to solve the
>> matrix and find the eigenvalues? If so, are there better ways to
>> distribute the matrix among processors?
>> > - Or are there any other reasons for this difference in cost?
>> >
>> > Hope to receive your reply, thank you!
>> >
>> > Runfeng Jin
>> >
>> >
>> >
>> > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
>> > Hello!
>> > I have tried using PETSC_DEFAULT for eps_ncv, but it still costs a lot of time.
>> Is there anything else I can do? The attachment is the log when PETSC_DEFAULT is
>> used for eps_ncv.
>> >
>> > Thank you!
>> >
>> > Runfeng Jin
>> >
>> > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
>> > The value -eps_ncv 5000 is huge.
>> > Better let SLEPc use the default value.
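>> >
>> > (Not part of the original reply; a minimal sketch, assuming an already created
>> > EPS object named eps and a PetscErrorCode ierr, of the equivalent programmatic
>> > calls: passing PETSC_DEFAULT for ncv and mpd lets SLEPc choose them from nev.)
>> >
>> >     ierr = EPSSetType(eps,EPSGD);CHKERRQ(ierr);
>> >     ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL);CHKERRQ(ierr);
>> >     ierr = EPSSetDimensions(eps,3,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);  /* nev = 3 */
>> >     ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);  /* command-line options still override */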
>> >
>> > Jose
>> >
>> >
>> > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com> wrote:
>> > >
>> > > Hello!
>> > >   I want to compute the 3 smallest eigenvalues, and the attachment is the
>> log view output. I can see that EPSSolve really takes the major part of the time,
>> but I cannot see why it costs so much. Can you see something from it?
>> > >
>> > > Thank you!
>> > >
>> > > Runfeng Jin
>> > >
>> > > On Jun 4 2022, at 1:37 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>> > > Convergence depends on the distribution of the eigenvalues you want to
>> compute. On the other hand, the cost also depends on the time it takes to
>> build the preconditioner. Use -log_view to see the cost of the different
>> steps of the computation.
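>> > >
>> > > (Not part of the original reply; a minimal sketch, assuming an EPS object
>> > > named eps and a PetscErrorCode ierr, of wrapping the phases in user-defined
>> > > log stages so that a run with -log_view reports them in separate sections.)
>> > >
>> > >     PetscLogStage setup_stage, solve_stage;
>> > >
>> > >     ierr = PetscLogStageRegister("EPS setup",&setup_stage);CHKERRQ(ierr);
>> > >     ierr = PetscLogStageRegister("EPS solve",&solve_stage);CHKERRQ(ierr);
>> > >
>> > >     ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
>> > >     ierr = EPSSetUp(eps);CHKERRQ(ierr);   /* setup phase, including ST/KSP/PC */
>> > >     ierr = PetscLogStagePop();CHKERRQ(ierr);
>> > >
>> > >     ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
>> > >     ierr = EPSSolve(eps);CHKERRQ(ierr);
>> > >     ierr = PetscLogStagePop();CHKERRQ(ierr);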
>> > >
>> > > Jose
>> > >
>> > >
>> > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com> wrote:
>> > > >
>> > > > Hello!
>> > > >
>> > > > I am trying to use EPSGD to compute a matrix's smallest eigenvalue,
>> and I find a strange thing. There are two matrices, A (900000 x 900000) and
>> B (90000 x 90000). Solving A takes 371 iterations and only 30.83s, while solving B
>> takes 22 iterations and 38885s! What could be the reason for this? Or what
>> can I do to find the reason?
>> > > >
>> > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
>> > > > And one difference I can tell is that matrix B has many small
>> values, whose absolute value is less than 1e-6. Could this be the reason?
>> > > >
>> > > > Thank you!
>> > > >
>> > > > Runfeng Jin
>> > > <log_view.txt>
>> >
>> > <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>>
>>
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/public/home/jrf/works/ecMRCI-shaula/MRCI on a named g16r3n07 with 256 processors, by jrf Wed Jun 15 10:04:00 2022
Using Petsc Release Version 3.15.1, Jun 17, 2021
Max Max/Min Avg Total
Time (sec): 1.029e+02 1.001 1.028e+02
Objects: 2.011e+03 1.146 1.761e+03
Flop: 1.574e+06 2.099 1.104e+06 2.827e+08
Flop/sec: 1.531e+04 2.099 1.074e+04 2.748e+06
MPI Messages: 3.881e+04 7.920 1.865e+04 4.773e+06
MPI Message Lengths: 1.454e+06 6.190 3.542e+01 1.691e+08
MPI Reductions: 1.791e+03 1.001
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 1.0285e+02 100.0% 2.8266e+08 100.0% 4.773e+06 100.0% 3.542e+01 100.0% 1.769e+03 98.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 2 1.0 4.0572e-01 2.6 0.00e+00 0.0 3.7e+04 4.0e+00 2.0e+00 0 0 1 0 0 0 0 1 0 0 0
BuildTwoSidedF 1 1.0 2.0986e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00 0 0 1 2 0 0 0 1 2 0 0
MatMult 193 1.0 4.6531e+00 1.1 9.85e+05 4.3 4.7e+06 3.5e+01 1.0e+00 4 48 99 98 0 4 48 99 98 0 29
MatSolve 377 1.0 1.6183e-0288.8 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 982
MatLUFactorNum 1 1.0 5.7322e-05 2.8 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2177
MatILUFactorSym 1 1.0 8.5668e-03793.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 2.1006e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00 0 0 1 2 0 0 0 1 2 0 0
MatAssemblyEnd 1 1.0 3.0272e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 7.0000e-07 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 5.3758e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 98 1.0 9.0806e-0371.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 3 1.0 1.8633e-01 1.8 6.00e+01 1.1 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCopy 959 1.0 1.4909e-0275.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 387 1.0 9.8578e-04 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3 1.0 1.6157e-023639.0 6.00e+01 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
VecScatterBegin 196 1.0 3.7129e-01 1.5 0.00e+00 0.0 4.7e+06 3.5e+01 4.0e+00 0 0 99 98 0 0 0 99 98 0 0
VecScatterEnd 196 1.0 4.6171e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
VecSetRandom 3 1.0 3.5271e-0515.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 634 1.0 1.7589e-0265.9 1.20e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 174
VecReduceComm 444 1.0 2.4972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.4e+02 24 0 0 0 25 24 0 0 0 25 0
SFSetGraph 1 1.0 1.1170e-05 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 4 1.0 2.3891e-01 1.3 0.00e+00 0.0 4.9e+04 1.1e+01 1.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFPack 196 1.0 1.2023e-0291.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 196 1.0 4.3491e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
EPSSetUp 1 1.0 9.6815e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01 1 0 0 0 1 1 0 0 0 1 0
EPSSolve 1 1.0 9.9906e+01 1.0 1.56e+06 2.1 4.7e+06 3.5e+01 1.7e+03 97 99 98 97 97 97 99 98 97 99 3
STSetUp 1 1.0 2.8679e-0450.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
STComputeOperatr 1 1.0 2.0985e-04223.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVCreate 194 1.0 3.2437e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.8e+02 31 0 0 0 33 31 0 0 0 33 0
BVCopy 386 1.0 1.7107e-02110.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVMultVec 1090 1.0 1.8337e-0221.1 2.22e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 20 0 0 0 0 20 0 0 0 3080
BVMultInPlace 224 1.0 1.8273e-0218.4 1.06e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1481
BVDot 319 1.0 1.7687e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10 0 0 18 17 10 0 0 18 2
BVDotVec 392 1.0 2.2083e+01 1.0 6.32e+04 1.1 0.0e+00 0.0e+00 3.9e+02 21 6 0 0 22 21 6 0 0 22 1
BVOrthogonalizeV 190 1.0 1.1538e+01 1.0 1.15e+05 1.1 0.0e+00 0.0e+00 2.0e+02 11 10 0 0 11 11 10 0 0 12 3
BVScale 254 1.0 1.7301e-02125.1 2.54e+03 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37
BVSetRandom 3 1.0 3.6330e-0485.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVMatProject 255 1.0 1.7707e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10 0 0 18 17 10 0 0 18 2
DSSolve 82 1.0 5.4953e-0215.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSVectors 380 1.0 9.7683e-0366.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSOther 179 1.0 1.7321e-0239.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 1.8680e-0574.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 377 1.0 1.8723e-0214.2 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 849
PCSetUp 2 1.0 8.8937e-0353.4 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14
PCApply 377 1.0 3.0085e-0213.8 7.23e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 532
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 745 745 2664464 0.
Vector 793 793 1538360 0.
Index Set 10 10 10792 0.
Star Forest Graph 4 4 5376 0.
EPS Solver 1 1 3468 0.
Spectral Transform 1 1 908 0.
Basis Vectors 195 195 437744 0.
Region 1 1 680 0.
Direct Solver 1 1 20156 0.
Krylov Solver 2 2 3200 0.
Preconditioner 2 2 1936 0.
PetscRandom 1 1 670 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 4.7e-08
Average time for MPI_Barrier(): 0.0578456
Average time for zero size MPI_Send(): 0.00358668
#PETSc Option Table entries:
-eps_gd_blocksize 3
-eps_gd_initial_size 3
-eps_ncv PETSC_DEFAULT
-eps_type gd
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no
-----------------------------------------
Libraries compiled on 2022-06-14 01:43:59 on login05
Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos
Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O
-----------------------------------------
Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/public/home/jrf/works/qubic/bin/pfci.x on a named h09r4n13 with 192 processors, by jrf Wed Jun 15 12:10:57 2022
Using Petsc Release Version 3.15.1, Jun 17, 2021
Max Max/Min Avg Total
Time (sec): 9.703e+02 1.000 9.703e+02
Objects: 2.472e+03 1.000 2.472e+03
Flop: 6.278e+09 1.064 6.012e+09 1.154e+12
Flop/sec: 6.470e+06 1.064 6.196e+06 1.190e+09
MPI Messages: 3.635e+04 1.947 2.755e+04 5.290e+06
MPI Message Lengths: 7.246e+08 1.742 2.052e+04 1.085e+11
MPI Reductions: 2.464e+03 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 9.7032e+02 100.0% 1.1543e+12 100.0% 5.290e+06 100.0% 2.052e+04 100.0% 2.446e+03 99.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 2 1.0 1.9883e+029876.1 0.00e+00 0.0 2.1e+04 4.0e+00 2.0e+00 11 0 0 0 0 11 0 0 0 0 0
BuildTwoSidedF 1 1.0 1.9879e+021349804.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11 0 0 0 0 11 0 0 0 0 0
MatMult 247 1.0 2.5963e+00 1.6 1.16e+09 1.2 5.3e+06 2.1e+04 1.0e+00 0 17100100 0 0 17100100 0 77449
MatSolve 479 1.0 3.2541e-01 2.3 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 146312
MatLUFactorNum 1 1.0 4.3923e-02 7.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 41413
MatILUFactorSym 1 1.0 2.5215e-03 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 1.9879e+02654719.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11 0 0 0 0 11 0 0 0 0 0
MatAssemblyEnd 1 1.0 2.1247e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 8.3000e-07 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.0741e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 244 1.0 2.5375e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 3 1.0 1.3600e-0125.4 2.83e+04 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 40
VecCopy 1214 1.0 6.8012e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 486 1.0 2.4261e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3 1.0 1.5987e-04 3.8 2.83e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 34009
VecScatterBegin 247 1.0 3.8039e-01 2.2 0.00e+00 0.0 5.3e+06 2.1e+04 1.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 247 1.0 1.3181e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 1.3014e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 723 1.0 5.9514e-03 2.1 6.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 220153
VecReduceComm 482 1.0 2.1629e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+02 0 0 0 0 20 0 0 0 0 20 0
SFSetGraph 1 1.0 1.3207e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 1 1.0 8.3540e-02 1.4 0.00e+00 0.0 4.2e+04 5.2e+03 1.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFPack 247 1.0 2.3981e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 247 1.0 1.8351e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
EPSSetUp 1 1.0 1.5565e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01 0 0 0 0 1 0 0 0 0 1 0
EPSSolve 1 1.0 8.5090e+00 1.0 6.26e+09 1.1 5.2e+06 2.1e+04 2.4e+03 1100 99 99 99 1100 99 99 99 135365
STSetUp 1 1.0 1.3724e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
STComputeOperatr 1 1.0 7.1348e-05 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVCreate 245 1.0 6.2414e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 0 0 0 0 30 0 0 0 0 30 0
BVCopy 488 1.0 1.9780e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVMultVec 1210 1.0 6.7882e-01 1.1 1.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 19 0 0 0 0 19 0 0 0 321786
BVMultInPlace 247 1.0 7.8465e-01 1.6 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 45 0 0 0 0 45 0 0 0 663459
BVDot 718 1.0 1.7888e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02 0 9 0 0 29 0 9 0 0 29 60566
BVDotVec 487 1.0 5.3124e-01 1.2 2.85e+08 1.0 0.0e+00 0.0e+00 4.9e+02 0 5 0 0 20 0 5 0 0 20 102853
BVOrthogonalizeV 244 1.0 5.6093e-01 1.0 5.62e+08 1.0 0.0e+00 0.0e+00 2.5e+02 0 9 0 0 10 0 9 0 0 10 192477
BVScale 482 1.0 2.0062e-03 1.7 2.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 217721
BVSetRandom 6 1.0 1.3480e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BVMatProject 480 1.0 1.8300e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02 0 9 0 0 29 0 9 0 0 29 59203
DSSolve 242 1.0 2.3012e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSVectors 482 1.0 4.5230e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DSOther 485 1.0 2.4384e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 3.5111e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 479 1.0 3.3062e-01 2.2 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 144006
PCSetUp 2 1.0 4.6721e-02 6.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 38932
PCApply 479 1.0 3.7933e-01 2.4 4.11e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 130309
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 1216 1216 91546008 0.
Vector 994 994 83668472 0.
Index Set 5 5 733636 0.
Star Forest Graph 1 1 1224 0.
EPS Solver 1 1 13512 0.
Spectral Transform 1 1 908 0.
Basis Vectors 246 246 785872 0.
Region 1 1 680 0.
Direct Solver 1 1 3617024 0.
Krylov Solver 2 2 3200 0.
Preconditioner 2 2 1936 0.
PetscRandom 1 1 670 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 5e-08
Average time for MPI_Barrier(): 1.90986e-05
Average time for zero size MPI_Send(): 3.44587e-06
#PETSc Option Table entries:
-eps_ncv 300
-eps_nev 3
-eps_smallest_real
-eps_tol 1e-10
-eps_type gd
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no
-----------------------------------------
Libraries compiled on 2022-06-14 01:43:59 on login05
Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos
Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O
-----------------------------------------
Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------