[petsc-users] Very slow SVD with SLEPC
Jose E. Roman
jroman at dsic.upv.es
Wed Nov 18 13:46:49 CST 2020
As I said, most SLEPc solvers are not intended for computing the complete SVD. If you know how iterative methods work, you will see that requesting all N singular values makes no sense: the solver ends up building a subspace of the same size as the original space, so the projected matrix is as large as the original matrix and the orthogonalization effort is essentially wasted.
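To illustrate the intended usage: with an iterative solver such as Lanczos you would normally request only a handful of singular triplets. A minimal sketch (the count of 20 is just an illustrative choice, not something from your setup):

  PetscInt  i, nconv;
  PetscReal sigma;

  SVDSetDimensions(svd,20,PETSC_DEFAULT,PETSC_DEFAULT);  /* ask for 20 triplets, not N */
  SVDSetWhichSingularTriplets(svd,SVD_LARGEST);           /* the 20 largest ones */
  SVDSolve(svd);
  SVDGetConverged(svd,&nconv);
  for (i=0; i<nconv; i++) {
    SVDGetSingularTriplet(svd,i,&sigma,NULL,NULL);        /* NULL skips the singular vectors */
  }

In that regime the subspace stays small and the orthogonalization cost is negligible.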
The example that you sent can be solved in about 15 seconds with the following settings:
-svd_type cross -svd_eps_type lapack -svd_cross_explicitmatrix
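For reference, the same settings can also be applied in code instead of on the command line. This is only a sketch, assuming the SVD object has already been created and given its operator, and that your SLEPc version provides SVDCrossSetExplicitMatrix() and SVDCrossGetEPS(), as recent releases do:

  EPS eps;

  SVDSetType(svd,SVDCROSS);                   /* cross-product solver, works with A^T*A */
  SVDCrossSetExplicitMatrix(svd,PETSC_TRUE);  /* build A^T*A explicitly */
  SVDCrossGetEPS(svd,&eps);                   /* inner eigensolver used by the cross solver */
  EPSSetType(eps,EPSLAPACK);                  /* solve the projected problem densely */
  SVDSetFromOptions(svd);                     /* command-line options can still override */

Since A is 150000x1000, the explicit cross-product matrix A^T*A is only 1000x1000, and a dense LAPACK eigensolve of that size is why this combination handles the full SVD in seconds.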
Regards,
Jose
PS. Please reply to the list and not only to me, so that other users can follow the discussion on the mailing list as well as in its archive.
> El 18 nov 2020, a las 2:30, Rakesh Halder <rhalder at umich.edu> escribió:
>
> Hi Jose,
>
> I wrote a simple code to compute the SVD of a randomly generated dense 150000x1000 matrix using the Lanczos method:
>
> #include "slepcsys.h"
> #include "slepcsvd.h"
> int main( int argc, char **argv )
> {
> int ierr;
> SlepcInitialize(&argc,&argv,(char*)0,help);
>
> Mat A;
> SVD svd;
> int M, N, nconv;
>
> M = 150000;
> N = 1000;
>
> MatCreateDense(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,M,N,NULL,&A);
> MatSetFromOptions(A);
> MatSetUp(A);
> MatSetRandom(A,NULL);
> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>
> SVDCreate(PETSC_COMM_WORLD,&svd);
> SVDSetOperator(svd,A);
> SVDSetFromOptions(svd);
> SVDSetDimensions(svd,N,PETSC_DEFAULT,PETSC_DEFAULT);
> SVDSetType(svd,SVDLANCZOS);
> SVDSolve(svd);
>
> SVDGetConverged(svd,&nconv);
> PetscPrintf(PETSC_COMM_WORLD," Number of converged approximate singular triplets: %D\n\n",nconv);
> ierr = SlepcFinalize();
> return ierr;
> }
>
> This isn't the original code I used in my previous emails, which was a research code. I used the -log_view option this time, and found that SVDSolve actually took around 1600 seconds:
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option. #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> ./hello on a real-opt named rakesh-pc with 1 processor, by rakesh Tue Nov 17 17:32:40 2020
> Using Petsc Release Version 3.11.0, Mar, 29, 2019
>
> Max Max/Min Avg Total
> Time (sec): 1.601e+03 1.000 1.601e+03
> Objects: 4.400e+01 1.000 4.400e+01
> Flop: 3.669e+12 1.000 3.669e+12 3.669e+12
> Flop/sec: 2.292e+09 1.000 2.292e+09 2.292e+09
> Memory: 3.690e+09 1.000 3.690e+09 3.690e+09
> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flop
> and VecAXPY() for complex vectors of length N --> 8N flop
>
> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total Count %Total Avg %Total Count %Total
> 0: Main Stage: 1.6010e+03 100.0% 3.6689e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flop: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> AvgLen: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flop in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option. #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 7738 1.0 9.9231e+02 1.0 2.32e+12 1.0 0.0e+00 0.0e+00 0.0e+00 62 63 0 0 0 62 63 0 0 0 2339
> MatAssemblyBegin 3 1.0 1.7299e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 3 1.0 1.1190e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatTranspose 1 1.0 2.6920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatSetRandom 1 1.0 1.0066e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> SVDSetUp 1 1.0 3.0697e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> SVDSolve 1 1.0 1.5905e+03 1.0 3.67e+12 1.0 0.0e+00 0.0e+00 0.0e+00 99100 0 0 0 99100 0 0 0 2307
> BVCopy 7 1.0 2.3330e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> BVMultVec 11616 1.0 2.5893e+02 1.0 6.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 16 16 0 0 0 16 16 0 0 0 2336
> BVMultInPlace 16 1.0 7.4126e+01 1.0 1.36e+11 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1840
> BVDotVec 11609 1.0 2.5999e+02 1.0 6.06e+11 1.0 0.0e+00 0.0e+00 0.0e+00 16 17 0 0 0 16 17 0 0 0 2331
> BVOrthogonalizeV 7738 1.0 5.1901e+02 1.0 1.21e+12 1.0 0.0e+00 0.0e+00 0.0e+00 32 33 0 0 0 32 33 0 0 0 2334
> BVScale 7731 1.0 4.6120e-01 1.0 5.84e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267
> BVNormVec 1 1.0 2.9133e-05 1.0 2.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 69
> BVSetRandom 1 1.0 8.7148e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> DSSolve 8 1.0 1.4014e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> DSOther 8 1.0 6.1296e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 4 1.0 3.7447e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSetRandom 1 1.0 7.1773e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 18 14 63013584 0.
> PetscRandom 4 2 1308 0.
> SVD Solver 1 0 0 0.
> EPS Solver 1 1 1416 0.
> Spectral Transform 1 1 800 0.
> Krylov Solver 1 1 1592 0.
> Preconditioner 1 1 864 0.
> Basis Vectors 3 1 1152 0.
> Direct Solver 2 1 1376 0.
> Region 1 1 680 0.
> Vector 10 0 0 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 2.81027e-08
> #PETSc Option Table entries:
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --PETSC_ARCH=real-opt --with-scalar-type=real --with-debugging=yes --with-mpi-dir=/home/rakesh/packages/openmpi-1.10.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --download-fblaslapack=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11
> -----------------------------------------
> Libraries compiled on 2020-06-05 15:46:23 on rakesh-pc
> Machine characteristics: Linux-5.3.0-53-generic-x86_64-with-debian-buster-sid
> Using PETSc directory: /home/rakesh/packages/petsc-3.11.0
> Using PETSc arch: real-opt
> -----------------------------------------
>
> Using C compiler: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3
> Using Fortran compiler: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g
> -----------------------------------------
>
> Using include paths: -I/home/rakesh/packages/petsc-3.11.0/include -I/home/rakesh/packages/petsc-3.11.0/real-opt/include -I/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/include
> -----------------------------------------
>
> Using C linker: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpicc
> Using Fortran linker: /home/rakesh/packages/openmpi-1.10.7/opt-gfortran/bin/mpif90
> Using libraries: -Wl,-rpath,/home/rakesh/packages/petsc-3.11.0/real-opt/lib -L/home/rakesh/packages/petsc-3.11.0/real-opt/lib -lpetsc -Wl,-rpath,/home/rakesh/packages/petsc-3.11.0/real-opt/lib -L/home/rakesh/packages/petsc-3.11.0/real-opt/lib -Wl,-rpath,/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/lib -L/home/rakesh/packages/openmpi-1.10.7/opt-gfortran/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
> -----------------------------------------
>
>
>
> The program overall also took around 1600 seconds, so I'm sure the time is spent in SVDSolve; I'm not sure why the log from the research code I was using was any different. The solve took around this much time regardless of the solver (cross, Lanczos, or ScaLAPACK on a different machine with the newest versions of PETSc and SLEPc). For the solvers that can compute partial SVDs (cross and Lanczos), not calling SVDSetDimensions results in only 1 singular triplet being computed, which takes around 5 seconds. I'm wondering if there are additional options I'm not setting correctly, or if you see anything wrong with the code I sent. Thanks a lot for your help with this issue.
>
> Rakesh