<div dir="ltr"><div dir="ltr">On Sat, Oct 17, 2020 at 5:21 AM Alexey Kozlov <<a href="mailto:Alexey.V.Kozlov.2@nd.edu">Alexey.V.Kozlov.2@nd.edu</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Matt,</div><div><br></div><div>Thank you for your reply!<br></div><div>My system has 8 NUMA nodes, so the memory bandwidth can increase up to 8 times when doing parallel computations. In other words, each node of the big computer cluster works as a small cluster consisting of 8 nodes.
Of course, this works only if communication between the NUMA nodes contributes little to the run time. The total amount of memory on a single cluster node is 128 GB, which is enough to fit my application.</div></div></blockquote><div><br></div><div>Barry is right, of course. We can see that the PETSc LU, using the natural ordering, is doing 10,000x more flops than MUMPS. Using the same ordering, MUMPS might</div><div>still benefit from blocking, but the gap would be much, much smaller.</div><div><br></div><div>I misunderstood your description of the parallelism. Yes, with 8 NUMA nodes you could see 8x on one cluster node. I think Pierre is correct that something related to the size is</div><div>happening, since the numeric factorization in the parallel case for MUMPS is running at 30x the flop rate of the serial case. It's possible that they are using a different</div><div>ordering in parallel that does more flops but is more amenable to vectorization. It is hard to know without reporting all the MUMPS options.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
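As a quick check, Matt's two ratios can be recomputed from the MatLUFactorNum rows of the -log_view output that follows. This is a sketch with the numbers copied by hand. It assumes that the Flop column PETSc reports for the MUMPS runs reflects the factorization work, which depends on what the PETSc-MUMPS interface chooses to log, so treat the results as rough.

```python
# Factorization flop counts and rates copied by hand from the MatLUFactorNum rows
# of the -log_view output in this thread. ASSUMPTION: the "Flop" column for the
# MUMPS runs is whatever the PETSc-MUMPS interface logs, not MUMPS's own count.
PETSC_LU_FLOPS = 1.25e13        # case (1): built-in PETSc LU, 1 process
MUMPS_SERIAL_FLOPS = 1.16e9     # case (2): MUMPS, 1 MPI process

# "Total Mflop/s" column: flops summed over all processes / max time.
MUMPS_SERIAL_MFLOPS = 12.0      # case (2): MUMPS, 1 MPI process
MUMPS_PARALLEL_MFLOPS = 332.0   # case (3): MUMPS, 48 MPI processes

# How many more flops the built-in LU performs than MUMPS.
flop_ratio = PETSC_LU_FLOPS / MUMPS_SERIAL_FLOPS
# How much faster (in flops per second) MUMPS factors in parallel than serially.
rate_ratio = MUMPS_PARALLEL_MFLOPS / MUMPS_SERIAL_MFLOPS

print(f"PETSc LU flops / MUMPS flops:      {flop_ratio:,.0f}x")
print(f"MUMPS parallel / serial flop rate: {rate_ratio:.1f}x")
```

With these inputs the ratios come out at roughly 10,800x and 28x, consistent with the rounded 10,000x and 30x figures above.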
</div><div>Below
is the output of -log_view for three cases: <br></div><div>(1) BUILT-IN PETSC LU SOLVER</div><div>---------------------------------------------- PETSc Performance Summary: ----------------------------------------------<br><br>./caat on a arch-linux-c-opt named <a href="http://d24cepyc110.crc.nd.edu" target="_blank">d24cepyc110.crc.nd.edu</a> with 1 processor, by akozlov Sat Oct 17 03:58:23 2020<br>Using 0 OpenMP threads<br>Using Petsc Release Version 3.13.6, unknown <br><br> Max Max/Min Avg Total<br>Time (sec): 5.551e+03 1.000 5.551e+03<br>Objects: 1.000e+01 1.000 1.000e+01<br>Flop: 1.255e+13 1.000 1.255e+13 1.255e+13<br>Flop/sec: 2.261e+09 1.000 2.261e+09 2.261e+09<br>MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00<br>MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00<br>MPI Reductions: 0.000e+00 0.000<br><br>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)<br> e.g., VecAXPY() for real vectors of length N --> 2N flop<br> and VecAXPY() for complex vectors of length N --> 8N flop<br><br>Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --<br> Avg %Total Avg %Total Count %Total Avg %Total Count %Total<br> 0: Main Stage: 5.5509e+03 100.0% 1.2551e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br><br>------------------------------------------------------------------------------------------------------------------------<br>See the 'Profiling' chapter of the users' manual for details on interpreting output.<br>Phase summary info:<br> Count: number of times phase was executed<br> Time and Flop: Max - maximum over all processors<br> Ratio - ratio of maximum to minimum over all processors<br> Mess: number of messages sent<br> AvgLen: average message length (bytes)<br> Reduct: number of global reductions<br> Global: entire computation<br> Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop().<br> %T - percent time in this phase %F - percent flop in this phase<br> %M - percent messages in this phase %L - percent message lengths in this phase<br> %R - percent reductions in this phase<br> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)<br>------------------------------------------------------------------------------------------------------------------------<br>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total<br> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>------------------------------------------------------------------------------------------------------------------------<br><br>--- Event Stage 0: Main Stage<br><br>MatSolve 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246<br>MatLUFactorSym 1 1.0 1.0673e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatLUFactorNum 1 1.0 5.5350e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2267<br>MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatAssemblyEnd 1 1.0 1.0247e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatGetRowIJ 1 1.0 1.4306e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatGetOrdering 1 1.0 1.2596e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecSet 4 1.0 9.3985e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyBegin 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSolve 1 1.0 7.3284e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6245<br>PCSetUp 1 1.0 5.5458e+03 1.0 1.25e+13 
1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2262<br>PCApply 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246<br>------------------------------------------------------------------------------------------------------------------------<br><br>Memory usage is given in bytes:<br><br>Object Type Creations Destructions Memory Descendants' Mem.<br>Reports information only for process 0.<br><br>--- Event Stage 0: Main Stage<br><br> Matrix 2 2 11501999992 0.<br> Vector 2 2 3761520 0.<br> Krylov Solver 1 1 1408 0.<br> Preconditioner 1 1 1184 0.<br> Index Set 3 3 1412088 0.<br> Viewer 1 0 0 0.<br>========================================================================================================================<br>Average time to get PetscTime(): 7.15256e-08<br>#PETSc Option Table entries:<br>-ksp_type preonly<br>-log_view<br>-pc_type lu<br>#End of PETSc Option Table entries<br>Compiled without FORTRAN kernels<br>Compiled with full precision matrices (default)<br>sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4<br>Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch<br>-----------------------------------------<br>Libraries compiled on 2020-10-14 10:52:17 on <a href="http://epycfe.crc.nd.edu" target="_blank">epycfe.crc.nd.edu</a> <br>Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo<br>Using PETSc directory: /afs/<a 
href="http://crc.nd.edu/user/a/akozlov/Private/petsc" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc</a><br>Using PETSc arch: arch-linux-c-opt<br>-----------------------------------------<br><br>Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>-----------------------------------------<br><br>Using include paths: -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/include" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/include</a> -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include</a> -I/opt/crc/v/valgrind/3.14/ompi/include<br>-----------------------------------------<br><br>Using C linker: mpicc<br>Using Fortran linker: mpif90<br>Using libraries: -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -lpetsc -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 
-L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> -L/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl<br>-----------------------------------------<br><br><br></div><div>(2) EXTERNAL PACKAGE MUMPS, 1 MPI PROCESS</div><div>---------------------------------------------- PETSc Performance Summary: ----------------------------------------------<br><br>./caat on a arch-linux-c-opt named <a href="http://d24cepyc068.crc.nd.edu" target="_blank">d24cepyc068.crc.nd.edu</a> with 1 processor, by akozlov Sat Oct 17 01:55:20 2020<br>Using 0 OpenMP threads<br>Using Petsc Release Version 3.13.6, unknown <br><br> Max Max/Min Avg Total<br>Time (sec): 1.075e+02 1.000 1.075e+02<br>Objects: 9.000e+00 1.000 9.000e+00<br>Flop: 1.959e+12 1.000 1.959e+12 1.959e+12<br>Flop/sec: 
1.823e+10 1.000 1.823e+10 1.823e+10<br>MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00<br>MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00<br>MPI Reductions: 0.000e+00 0.000<br><br>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)<br> e.g., VecAXPY() for real vectors of length N --> 2N flop<br> and VecAXPY() for complex vectors of length N --> 8N flop<br><br>Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --<br> Avg %Total Avg %Total Count %Total Avg %Total Count %Total<br> 0: Main Stage: 1.0747e+02 100.0% 1.9594e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br><br>------------------------------------------------------------------------------------------------------------------------<br>See the 'Profiling' chapter of the users' manual for details on interpreting output.<br>Phase summary info:<br> Count: number of times phase was executed<br> Time and Flop: Max - maximum over all processors<br> Ratio - ratio of maximum to minimum over all processors<br> Mess: number of messages sent<br> AvgLen: average message length (bytes)<br> Reduct: number of global reductions<br> Global: entire computation<br> Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop().<br> %T - percent time in this phase %F - percent flop in this phase<br> %M - percent messages in this phase %L - percent message lengths in this phase<br> %R - percent reductions in this phase<br> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)<br>------------------------------------------------------------------------------------------------------------------------<br>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total<br> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>------------------------------------------------------------------------------------------------------------------------<br><br>--- Event Stage 0: Main Stage<br><br>MatSolve 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126201<br>MatLUFactorSym 1 1.0 2.3141e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0<br>MatLUFactorNum 1 1.0 1.0001e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93 0 0 0 0 93 0 0 0 0 12<br>MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatAssemblyEnd 1 1.0 1.0067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatGetRowIJ 1 1.0 1.8650e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatGetOrdering 1 1.0 1.3029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecCopy 1 1.0 1.0943e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecSet 4 1.0 9.2626e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSolve 1 1.0 3.1981e-01 1.0 1.96e+12 1.0 
0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6123146<br>PCSetUp 1 1.0 1.0251e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 0 0 0 0 95 0 0 0 0 11<br>PCApply 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126096<br>------------------------------------------------------------------------------------------------------------------------<br><br>Memory usage is given in bytes:<br><br>Object Type Creations Destructions Memory Descendants' Mem.<br>Reports information only for process 0.<br><br>--- Event Stage 0: Main Stage<br><br> Matrix 2 2 59441612 0.<br> Vector 2 2 3761520 0.<br> Krylov Solver 1 1 1408 0.<br> Preconditioner 1 1 1184 0.<br> Index Set 2 2 941392 0.<br> Viewer 1 0 0 0.<br>========================================================================================================================<br>Average time to get PetscTime(): 4.76837e-08<br>#PETSc Option Table entries:<br>-ksp_type preonly<br>-log_view<br>-pc_factor_mat_solver_type mumps<br>-pc_type lu<br>#End of PETSc Option Table entries<br>Compiled without FORTRAN kernels<br>Compiled with full precision matrices (default)<br>sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4<br>Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch<br>-----------------------------------------<br>Libraries compiled on 2020-10-14 10:52:17 on <a href="http://epycfe.crc.nd.edu" target="_blank">epycfe.crc.nd.edu</a> 
<br>Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo<br>Using PETSc directory: /afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc</a><br>Using PETSc arch: arch-linux-c-opt<br>-----------------------------------------<br><br>Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>-----------------------------------------<br><br>Using include paths: -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/include" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/include</a> -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include</a> -I/opt/crc/v/valgrind/3.14/ompi/include<br>-----------------------------------------<br><br>Using C linker: mpicc<br>Using Fortran linker: mpif90<br>Using libraries: -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -lpetsc -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib 
-Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> -L/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl<br>-----------------------------------------<br><br><br></div><div>(3)
EXTERNAL PACKAGE MUMPS
, 48 MPI PROCESSES ON A SINGLE CLUSTER NODE WITH 8 NUMA NODES</div><div>---------------------------------------------- PETSc Performance Summary: ----------------------------------------------<br><br>./caat on a arch-linux-c-opt named <a href="http://d24cepyc069.crc.nd.edu" target="_blank">d24cepyc069.crc.nd.edu</a> with 48 processors, by akozlov Sat Oct 17 04:40:25 2020<br>Using 0 OpenMP threads<br>Using Petsc Release Version 3.13.6, unknown <br><br> Max Max/Min Avg Total<br>Time (sec): 1.415e+01 1.000 1.415e+01<br>Objects: 3.000e+01 1.000 3.000e+01<br>Flop: 4.855e+10 1.637 4.084e+10 1.960e+12<br>Flop/sec: 3.431e+09 1.637 2.886e+09 1.385e+11<br>MPI Messages: 1.180e+02 2.682 8.169e+01 3.921e+03<br>MPI Message Lengths: 1.559e+05 5.589 1.238e+03 4.855e+06<br>MPI Reductions: 4.000e+01 1.000<br><br>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)<br> e.g., VecAXPY() for real vectors of length N --> 2N flop<br> and VecAXPY() for complex vectors of length N --> 8N flop<br><br>Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --<br> Avg %Total Avg %Total Count %Total Avg %Total Count %Total<br> 0: Main Stage: 1.4150e+01 100.0% 1.9602e+12 100.0% 3.921e+03 100.0% 1.238e+03 100.0% 3.100e+01 77.5%<br><br>------------------------------------------------------------------------------------------------------------------------<br>See the 'Profiling' chapter of the users' manual for details on interpreting output.<br>Phase summary info:<br> Count: number of times phase was executed<br> Time and Flop: Max - maximum over all processors<br> Ratio - ratio of maximum to minimum over all processors<br> Mess: number of messages sent<br> AvgLen: average message length (bytes)<br> Reduct: number of global reductions<br> Global: entire computation<br> Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop().<br> %T - percent time in this phase %F - percent flop in this phase<br> %M - percent messages in this phase %L - percent message lengths in this phase<br> %R - percent reductions in this phase<br> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)<br>------------------------------------------------------------------------------------------------------------------------<br>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total<br> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>------------------------------------------------------------------------------------------------------------------------<br><br>--- Event Stage 0: Main Stage<br><br>BuildTwoSided 5 1.0 1.0707e-02 3.3 0.00e+00 0.0 7.8e+02 4.0e+00 5.0e+00 0 0 20 0 12 0 0 20 0 16 0<br>BuildTwoSidedF 3 1.0 8.6837e-03 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 8 0 0 0 0 10 0<br>MatSolve 1 1.0 6.6314e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29529617<br>MatLUFactorSym 1 1.0 2.4322e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 17 0 0 0 10 17 0 0 0 13 0<br>MatLUFactorNum 1 1.0 5.8816e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 332<br>MatAssemblyBegin 1 1.0 7.3917e-0357.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 2 0 0 0 0 3 0<br>MatAssemblyEnd 1 1.0 2.5823e-02 1.0 0.00e+00 0.0 3.8e+02 1.6e+03 5.0e+00 0 0 10 13 12 0 0 10 13 16 0<br>MatGetRowIJ 1 1.0 3.5763e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatGetOrdering 1 1.0 9.2506e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecSet 4 1.0 5.3000e-0460.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyBegin 2 1.0 2.2390e-0319.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 5 0 0 0 0 6 0<br>VecAssemblyEnd 2 1.0 9.7752e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 
0<br>VecScatterBegin 2 1.0 1.6036e-0312.8 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0<br>VecScatterEnd 2 1.0 2.0087e-0338.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>SFSetGraph 2 1.0 1.5259e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>SFSetUp 3 1.0 3.3023e-03 2.9 0.00e+00 0.0 1.6e+03 7.0e+02 2.0e+00 0 0 40 23 5 0 0 40 23 6 0<br>SFBcastOpBegin 2 1.0 1.5953e-0313.7 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0<br>SFBcastOpEnd 2 1.0 2.0008e-0345.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>SFPack 2 1.0 1.4646e-03361.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>SFUnpack 2 1.0 4.1723e-0529.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSetUp 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>KSPSolve 1 1.0 6.6350e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29513594<br>PCSetUp 1 1.0 8.4679e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 1.0e+01 60 0 0 0 25 60 0 0 0 32 230<br>PCApply 1 1.0 6.6319e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29527282<br>------------------------------------------------------------------------------------------------------------------------<br><br>Memory usage is given in bytes:<br><br>Object Type Creations Destructions Memory Descendants' Mem.<br>Reports information only for process 0.<br><br>--- Event Stage 0: Main Stage<br><br> Matrix 4 4 1224428 0.<br> Vec Scatter 3 3 2400 0.<br> Vector 8 8 1923424 0.<br> Index Set 9 9 32392 0.<br> Star Forest Graph 3 3 3376 0.<br> Krylov Solver 1 1 1408 0.<br> Preconditioner 1 1 1160 0.<br> Viewer 1 0 0 0.<br>========================================================================================================================<br>Average time to get PetscTime(): 7.15256e-08<br>Average time for MPI_Barrier(): 3.48091e-06<br>Average time for zero size MPI_Send(): 2.49843e-06<br>#PETSc Option 
Table entries:<br>-ksp_type preonly<br>-log_view<br>-pc_factor_mat_solver_type mumps<br>-pc_type lu<br>#End of PETSc Option Table entries<br>Compiled without FORTRAN kernels<br>Compiled with full precision matrices (default)<br>sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4<br>Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch<br>-----------------------------------------<br>Libraries compiled on 2020-10-14 10:52:17 on <a href="http://epycfe.crc.nd.edu" target="_blank">epycfe.crc.nd.edu</a> <br>Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo<br>Using PETSc directory: /afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc</a><br>Using PETSc arch: arch-linux-c-opt<br>-----------------------------------------<br><br>Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp <br>-----------------------------------------<br><br>Using include paths: -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/include" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/include</a> -I/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include" 
target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include</a> -I/opt/crc/v/valgrind/3.14/ompi/include<br>-----------------------------------------<br><br>Using C linker: mpicc<br>Using Fortran linker: mpif90<br>Using libraries: -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -lpetsc -Wl,-rpath,/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -L/afs/<a href="http://crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib" target="_blank">crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib</a> -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> -L/afs/<a href="http://crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin" target="_blank">crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin</a> 
-Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl<br>-----------------------------------------<br><br><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Oct 17, 2020 at 12:33 AM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov <<a href="mailto:Alexey.V.Kozlov.2@nd.edu" target="_blank">Alexey.V.Kozlov.2@nd.edu</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">
<p class="MsoNormal" style="margin:0in 0in 10pt;line-height:115%;font-size:11pt;font-family:Calibri,sans-serif">Thank you for your advice! My sparse matrix seems to be very
stiff, so I have decided to concentrate on direct solvers. I have very good
results with MUMPS. Due to a lack of time I haven’t gotten good results with SuperLU_DIST
and haven’t compiled PETSc with PaStiX yet, but I have a feeling that MUMPS is
the best. I have run a sequential test case with the built-in PETSc LU (-pc_type lu
-ksp_type preonly) and MUMPS (-pc_type lu -ksp_type preonly
-pc_factor_mat_solver_type mumps) with default settings and found that MUMPS was
about 50 times faster than the built-in LU and used about a third of the RAM. Do
you have any idea why this could be?</p></div></blockquote><div>The numbers do not sound realistic, but of course we do not have your particular problem. In particular, the memory figure seems impossible. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin:0in 0in 10pt;line-height:115%;font-size:11pt;font-family:Calibri,sans-serif"> </p>
<p class="MsoNormal" style="margin:0in 0in 10pt;line-height:115%;font-size:11pt;font-family:Calibri,sans-serif">My test case has about 100,000 complex equations with about 3,000,000
non-zeros. PETSc was compiled with the following options: ./configure
--with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g
--with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex
--with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel
-O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel
-O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel
-O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist
--download-mumps --download-scalapack --download-metis --download-cmake
--download-parmetis --download-ptscotch. </p>
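<div>For scale, a rough back-of-the-envelope estimate of the memory taken by the assembled matrix alone, written as a small Python sketch. It assumes plain CSR storage with 16-byte complex double values and 32-bit indices (PETSc's default index size when configured without --with-64-bit-indices); PETSc's AIJ format keeps a few extra per-row arrays, and an LU factorization adds fill-in that depends on the ordering, so this number is only a lower bound on what a direct solve needs.</div>
<pre>
```python
def csr_bytes(n_rows, nnz, value_bytes=16, index_bytes=4):
    """Approximate CSR storage in bytes: values + column indices + row offsets."""
    values = nnz * value_bytes            # one complex double (2 x 8 bytes) per nonzero
    col_idx = nnz * index_bytes           # one 32-bit column index per nonzero
    row_ptr = (n_rows + 1) * index_bytes  # one 32-bit offset per row, plus one
    return values + col_idx + row_ptr


# Sizes quoted in the message: ~100,000 complex equations, ~3,000,000 nonzeros.
total = csr_bytes(100_000, 3_000_000)
print(f"assembled matrix: about {total / 1e6:.1f} MB")  # about 60.4 MB
```
</pre>
<div>Under these assumptions the matrix itself is only tens of MB, so most of the memory used during a direct solve is expected to go to the factors and the solver's workspace.</div>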
<p class="MsoNormal" style="margin:0in 0in 10pt;line-height:115%;font-size:11pt;font-family:Calibri,sans-serif">Running MUMPS in parallel with MPI also gave me a significant
gain in performance (about 10 times on a single cluster node).</p></div></blockquote><div>Again, this does not appear to make sense. The performance should be limited by memory bandwidth, and a single cluster node will not usually have</div><div>10x the bandwidth of a CPU, although it might be possible with a very old CPU.<br></div><div><br></div><div>It would help to understand the performance if you would send the output of -log_view.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin:0in 0in 10pt;line-height:115%;font-size:11pt;font-family:Calibri,sans-serif"> </p>
<span style="font-size:11pt;line-height:115%;font-family:Calibri,sans-serif">Could you please advise me whether I can adjust
some options for the direct solvers to improve performance? Should I try MUMPS
in OpenMP mode?</span>
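<div>A hedged Python sketch of how the direct-solver comparison discussed in this thread could be scripted. It only builds PETSc command-line option lists and prints one command line per solver package; it does not run PETSc. The executable name ./app is a hypothetical placeholder, each package (MUMPS, SuperLU_DIST, STRUMPACK, Pastix) must have been enabled at configure time, and the MUMPS settings are illustrative assumptions: -mat_mumps_icntl_4 2 raises the MUMPS print level so it reports its main statistics, and -mat_mumps_icntl_7 5 requests METIS ordering for the sequential analysis.</div>
<pre>
```python
import shlex

# Options shared by every run: a single direct solve, with profiling output.
COMMON = {"ksp_type": "preonly", "pc_type": "lu", "log_view": None}

# Per-package extras (assumed choices for illustration, not recommendations).
EXTRA = {
    "mumps": {"mat_mumps_icntl_4": "2", "mat_mumps_icntl_7": "5"},
    "superlu_dist": {},
    "strumpack": {},
    "pastix": {},
}


def petsc_argv(solver):
    """Return PETSc options selecting `solver` as the LU factorization package."""
    opts = dict(COMMON, pc_factor_mat_solver_type=solver, **EXTRA[solver])
    argv = []
    for key, value in opts.items():
        argv.append("-" + key)
        if value is not None:  # flags such as -log_view take no value
            argv.append(value)
    return argv


for solver in EXTRA:
    # "./app" is a hypothetical placeholder for the application executable.
    print(shlex.join(["./app"] + petsc_argv(solver)))
```
</pre>
<div>With the -log_view summaries side by side, the factorization flop counts indicate whether a speedup comes from a better ordering (fewer flops) or from a higher flop rate.</div>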
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 19, 2020 at 7:40 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">As Jed said, high frequency is hard. Standard AMG can be adapted (<a href="https://link.springer.com/article/10.1007/s00466-006-0047-8" target="_blank">https://link.springer.com/article/10.1007/s00466-006-0047-8</a>) through its parameters.<div>AMG for convection: use Richardson/SOR rather than Chebyshev smoothers, and in smoothed aggregation (GAMG) don't smooth the prolongator (-pc_gamg_agg_nsmooths 0).</div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov <<a href="mailto:Alexey.V.Kozlov.2@nd.edu" target="_blank">Alexey.V.Kozlov.2@nd.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks a lot! I'll check them out.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 19, 2020 at 1:41 AM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div> These problems are small enough that sparse direct solvers are likely the best use of your time and the most efficient option in general. <div><br></div><div> PETSc supports 3 parallel direct solvers: SuperLU_DIST, MUMPS and Pastix. 
I recommend configuring PETSc for all three of them and then comparing them on problems of interest to you.</div><div><br></div><div> --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch </div><div><br></div><div> Barry</div><div><br><div><br><blockquote type="cite"><div>On Sep 18, 2020, at 11:28 PM, Alexey Kozlov <<a href="mailto:Alexey.V.Kozlov.2@nd.edu" target="_blank">Alexey.V.Kozlov.2@nd.edu</a>> wrote:</div><br><div><div dir="ltr">Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has on the order of one million equations. I use a 2nd-order finite-difference scheme with a 19-point stencil, so my typical test case uses several GB of RAM.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 18, 2020 at 11:52 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix-based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack).<br>
<br>
<a href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html" rel="noreferrer" target="_blank">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html</a><br>
<br>
Is the matrix complex symmetric?<br>
<br>
Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs.<br>
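<div>To put "if you have enough memory" in rough numbers, here is a Python sketch of the textbook asymptotic growth for sparse LU with nested-dissection ordering on a regular 3D grid: factor storage grows like n^(4/3) and factorization flops like n^2. Constants, and the extra coupling of a 19-point stencil, are ignored, so only the ratios are meaningful; the two sizes are the ~100,000-unknown test case and the ~10^6-unknown typical case mentioned in this thread.</div>
<pre>
```python
def nd_growth(n_small, n_large):
    """Growth ratios for LU with nested dissection on a regular 3D grid.

    Textbook asymptotics: factor nonzeros ~ n**(4/3), factorization flops ~ n**2.
    Constants are ignored, so only the ratios are meaningful.
    """
    ratio = n_large / n_small
    fill_ratio = ratio ** (4.0 / 3.0)
    flop_ratio = ratio ** 2
    return fill_ratio, flop_ratio


# Test case (~1e5 unknowns) versus typical case (~1e6 unknowns) from this thread.
fill, flops = nd_growth(1e5, 1e6)
print(f"factor storage ~{fill:.1f}x, factorization flops ~{flops:.0f}x")
# factor storage ~21.5x, factorization flops ~100x
```
</pre>
<div>Under these assumptions, going from the test case to the typical case multiplies factor memory by roughly 20x and factorization work by roughly 100x.</div>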
<br>
Alexey Kozlov <<a href="mailto:Alexey.V.Kozlov.2@nd.edu" target="_blank">Alexey.V.Kozlov.2@nd.edu</a>> writes:<br>
<br>
> Dear all,<br>
><br>
> I am solving a convected wave equation in the frequency domain. This equation<br>
> is a 3D Helmholtz equation with added first-order and mixed derivatives<br>
> and complex coefficients. The discretized PDE results in<br>
> a sparse linear system (about 10^6 equations), which is solved with PETSc. I<br>
> am having difficulty getting the solver to converge at high frequencies, on skewed<br>
> grids, and at high Mach numbers. I suspect this may be due to the preconditioner I<br>
> use. I am currently using ILU with 2 or 3 levels of fill<br>
> and the BCGS or GMRES solvers. I suspect the state of the art<br>
> has evolved and there are better preconditioners for Helmholtz-like<br>
> problems. Could you please advise me on a better preconditioner?<br>
><br>
> Thanks,<br>
> Alexey<br>
><br>
> -- <br>
> Alexey V. Kozlov<br>
><br>
> Research Scientist<br>
> Department of Aerospace and Mechanical Engineering<br>
> University of Notre Dame<br>
><br>
> 117 Hessert Center<br>
> Notre Dame, IN 46556-5684<br>
> Phone: (574) 631-4335<br>
> Fax: (574) 631-8355<br>
> Email: <a href="mailto:akozlov@nd.edu" target="_blank">akozlov@nd.edu</a><br>
</blockquote></div>
</div></blockquote></div><br></div></div></blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div></div>