[petsc-users] Very slow SVD with SLEPC
Jose E. Roman
jroman at dsic.upv.es
Wed Nov 18 13:46:49 CST 2020
As I said, most SLEPc solvers are not intended for computing the complete SVD. If you know how iterative methods work, you will see that requesting all N singular values makes no sense: the solver ends up building a subspace of the same size as the original space, so the projected matrix is as large as the original matrix and the orthogonalization effort is essentially wasted.
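To illustrate the intended usage: with an iterative solver such as Lanczos you would normally request only a handful of singular triplets. A minimal sketch (the count of 20 is just an illustrative choice, not something from your setup):

  PetscInt  i, nconv;
  PetscReal sigma;

  SVDSetDimensions(svd,20,PETSC_DEFAULT,PETSC_DEFAULT);  /* ask for 20 triplets, not N */
  SVDSetWhichSingularTriplets(svd,SVD_LARGEST);           /* the 20 largest ones */
  SVDSolve(svd);
  SVDGetConverged(svd,&nconv);
  for (i=0; i<nconv; i++) {
    SVDGetSingularTriplet(svd,i,&sigma,NULL,NULL);        /* NULL skips the singular vectors */
  }

In that regime the subspace stays small and the orthogonalization cost is negligible.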
The example that you sent can be solved in about 15 seconds with the following settings:
-svd_type cross -svd_eps_type lapack -svd_cross_explicitmatrix
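For reference, the same settings can also be applied in code instead of on the command line. This is only a sketch, assuming the SVD object has already been created and given its operator, and that your SLEPc version provides SVDCrossSetExplicitMatrix() and SVDCrossGetEPS(), as recent releases do:

  EPS eps;

  SVDSetType(svd,SVDCROSS);                   /* cross-product solver, works with A^T*A */
  SVDCrossSetExplicitMatrix(svd,PETSC_TRUE);  /* build A^T*A explicitly */
  SVDCrossGetEPS(svd,&eps);                   /* inner eigensolver used by the cross solver */
  EPSSetType(eps,EPSLAPACK);                  /* solve the projected problem densely */
  SVDSetFromOptions(svd);                     /* command-line options can still override */

Since A is 150000x1000, the explicit cross-product matrix A^T*A is only 1000x1000, and a dense LAPACK eigensolve of that size is why this combination handles the full SVD in seconds.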
Regards,
Jose
PS. Please reply to the list and not only to me, so that other users can follow the discussion on the mailing list as well as in its archive.
> El 18 nov 2020, a las 2:30, Rakesh Halder <rhalder at umich.edu> escribió:
>
> Hi Jose,
>
> I wrote a simple code to compute the SVD of a randomly generated dense 150000x1000 matrix using the Lanczos method:
>
> #include "slepcsys.h"
> #include "slepcsvd.h"
> int main( int argc, char **argv )
> {
> int ierr;
> SlepcInitialize(&argc,&argv,(char*)0,help);
>
> Mat A;
> SVD svd;
> int M, N, nconv;
>
> M = 150000;
> N = 1000;
>
> MatCreateDense(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,M,N,NULL,&A);
> MatSetFromOptions(A);
> MatSetUp(A);
> MatSetRandom(A,NULL);
> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>
> SVDCreate(PETSC_COMM_WORLD,&svd);
> SVDSetOperator(svd,A);
> SVDSetFromOptions(svd);
> SVDSetDimensions(svd,N,PETSC_DEFAULT,PETSC_DEFAULT);
> SVDSetType(svd,SVDLANCZOS);
> SVDSolve(svd);
>
> SVDGetConverged(svd,&nconv);
> PetscPrintf(PETSC_COMM_WORLD," Number of converged approximate singular triplets: %D\n\n",nconv);
> ierr = SlepcFinalize();
> return ierr;
> }
>
> This isn't the original code I used in my previous emails, which was a research code. I used the -log_view option this time, and found that SVDSolve actually took around 1600 seconds:
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option. #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> ./hello on a real-opt named rakesh-pc with 1 processor, by rakesh Tue Nov 17 17:32:40 2020
> Using Petsc Release Version 3.11.0, Mar, 29, 2019
>
> Max Max/Min Avg Total
> Time (sec): 1.601e+03 1.000 1.601e+03
> Objects: 4.400e+01 1.000 4.400e+01
> Flop: 3.669e+12 1.000 3.669e+12 3.669e+12
> Flop/sec: 2.292e+09 1.000 2.292e+09 2.292e+09
> Memory: 3.690e+09 1.000 3.690e+09 3.690e+09
> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flop
> and VecAXPY() for complex vectors of length N --> 8N flop
>
> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total Count %Total Avg %Total Count %Total
> 0: Main Stage: 1.6010e+03 100.0% 3.6689e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flop: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> AvgLen: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flop in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option. #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 7738 1.0 9.9231e+02 1.0 2.32e+12 1.0 0.0e+00 0.0e+00 0.0e+00 62 63 0 0 0 62 63 0 0 0 2339
> MatAssemblyBegin 3 1.0 1.7299e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 3 1.0 1.1190e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatTranspose 1 1.0 2.6920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatSetRandom 1 1.0 1.0066e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> SVDSetUp 1 1.0 3.0697e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> SVDSolve 1 1.0 1.5905e+03 1.0 3.67e+12 1.0 0.0e+00 0.0e+00 0.0e+00 99100 0 0 0 99100 0 0 0 2307
> BVCopy 7 1.0 2.3330e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> BVMultVec 11616 1.0 2.5893e+02 1.0 6.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 16 16 0 0 0 16 16 0 0 0 2336
> BVMultInPlace 16 1.0 7.4126e+01 1.0 1.36e+11 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1840
> BVDotVec 11609 1.0 2.5999e+02 1.0 6.06e+11 1.0 0.0e+00 0.0e+00 0.0e+00 16 17 0 0 0 16 17 0 0 0 2331
> BVOrthogonalizeV 7738 1.0 5.1901e+02 1.0 1.21e+12 1.0 0.0e+00 0.0e+00 0.0e+00 32 33 0 0 0 32 33 0 0 0 2334
> BVScale 7731 1.0 4.6120e-01 1.0 5.84e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267
> BVNormVec 1 1.0 2.9133e-05 1.0 2.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 69
> BVSetRandom 1 1.0 8.7148e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> DSSolve 8 1.0 1.4014e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> DSOther 8 1.0 6.1296e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 4 1.0 3.7447e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSetRandom 1 1.0 7.1773e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 18 14 63013584 0.
> PetscRandom 4 2 1308 0.
> SVD Solver 1 0 0 0.
> EPS Solver 1 1 1416 0.
> Spectral Transform 1 1 800 0.
> Krylov Solver 1 1 1592 0.
> Preconditioner 1 1 864 0.
> Basis Vectors 3 1 1152 0.
> Direct Solver 2 1 1376 0.
> Region 1 1 680 0.
> Vector 10 0 0 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 2.81027e-08
> #PETSc Option Table entries:
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --PETSC_ARCH=real-opt --with-scalar-type=real --with-debugging=yes --with-mpi-dir=/home/rakesh/packages/openmpi-1.10.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --download-fblaslapack=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11
> -----------------------------------------
> Libraries compiled on 2020-06-05 15:46:23 on rakesh-pc
> Machine characteristics: Linux-5.3.0-53-generic-x86_64-with-debian-buster-sid
> Using PETSc directory: /home/rakesh/packages/petsc-3.11.0
> Using PETSc arch: real-opt
> -----------------------------------------
>
> Using C compiler: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3
> Using Fortran compiler: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g
> -----------------------------------------
>
> Using include paths: -I/home/rakesh/packages/petsc-3.11.0/include -I/home/rakesh/packages/petsc-3.11.0/real-opt/include -I/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/include
> -----------------------------------------
>
> Using C linker: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpicc
> Using Fortran linker: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpif90
> Using libraries: -Wl,-rpath,/home/rakesh/packages/petsc-3.11.0/real-opt/lib -L/home/rakesh/packages/petsc-3.11.0/real-opt/lib -lpetsc -Wl,-rpath,/home/rakesh/packages/petsc-3.11.0/real-opt/lib -L/home/rakesh/packages/petsc-3.11.0/real-opt/lib -Wl,-rpath,/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/lib -L/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
> -----------------------------------------
>
>
>
> The program overall also took around 1600 seconds, so I'm sure the time is spent in SVDSolve; I'm not sure why the log from the research code I was using was any different. The solve took around this much time regardless of the solver (cross, Lanczos, or ScaLAPACK on a different machine with the newest versions of PETSc and SLEPc). For the solvers that can compute partial SVDs (cross and Lanczos), not calling SVDSetDimensions results in only 1 singular triplet being computed, which takes around 5 seconds. I'm wondering if there are additional options I'm not setting correctly, or if you see anything wrong with the code I sent. Thanks a lot for your help with this issue.
>
> Rakesh