[petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert

Junchao Zhang junchao.zhang at gmail.com
Thu Aug 26 10:29:44 CDT 2021


Hello, Dan,
 You might want to have a look at the manual at
https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
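  (The ICNTL options described there, e.g. -mat_mumps_icntl_14, pick up
the st_ prefix when MUMPS is used underneath SLEPc's shift-invert, i.e.
-st_mat_mumps_icntl_14, as used elsewhere in this thread.)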
  Thanks.
--Junchao Zhang


On Thu, Aug 26, 2021 at 7:32 AM dazza simplythebest <sayosale at hotmail.com>
wrote:

> Dear Jose and Matthew,
>                  Many thanks for your assistance; this would seem to
> explain what the problem was.
> So judging by this test case, there seems to be a memory vs computational
> time tradeoff involved in choosing whether to shift-invert or not: the
> shift-invert greatly reduces the number of required iterations, but at a
> higher memory cost?
> I have been trying a few values of -st_mat_mumps_icntl_14 (and also the
> alternative -st_mat_mumps_icntl_23) today, but have not yet been able to
> select one that fits onto the workstation I am using (although setting
> these parameters does at least seem to guarantee that an error message is
> generated).
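>
> (For reference - and this is just my reading of the MUMPS documentation,
> so treat it with caution - ICNTL(14) is a percentage increase applied to
> MUMPS's own workspace estimate, whereas ICNTL(23) sets an explicit
> per-process working-memory limit in MB, so the sort of thing I have been
> trying looks like
>    -st_mat_mumps_icntl_14 100      (double the estimated workspace)
>    -st_mat_mumps_icntl_23 16000    (cap workspace at ~16 GB per process)
> where the particular numbers are purely illustrative.)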
>
> Thus I will probably need to reduce the number of MPI processes and
> thereby reduce the memory requirement. In this regard, the MUMPS
> documentation suggests that a hybrid MPI-OpenMP approach is optimal for
> their software, whereas I remember reading somewhere else that OpenMP
> threading was not a good choice for use with PETSc - would you have any
> general advice on this? I was thinking that a version of SLEPc/PETSc
> compiled against OpenMP, with the number of threads set appropriately but
> without explicit OpenMP directives in the user's code, may be the way
> forward? That way PETSc will (?) simply ignore the threading, whereas
> threading will be available to MUMPS when execution is passed to those
> routines?
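>
> (To make the idea concrete, the sort of launch I have in mind - though I
> am not at all sure this is the recommended recipe - would be something
> like
>    export OMP_NUM_THREADS=4
>    mpiexec -n 4 ./stab1.exe -st_type sinvert ...
> i.e. fewer MPI ranks, with several OpenMP threads available to MUMPS and
> the threaded MKL BLAS within each rank.)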
>
>  Many thanks once again,
>              Dan.
>
>
>
> ------------------------------
> *From:* Jose E. Roman <jroman at dsic.upv.es>
> *Sent:* Wednesday, August 25, 2021 1:40 PM
> *To:* dazza simplythebest <sayosale at hotmail.com>
> *Cc:* PETSc <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Improving efficiency of slepc usage
>
> MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9
> is insufficient workspace. Try running with
>  -st_mat_mumps_icntl_14 <percentage>
> where <percentage> is the percentage by which you want to increase the
> workspace, e.g. 50 or 100 or more.
>
> See ex43.c for an example showing how to set this option in code.
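>
> The essential sequence is roughly the following - an untested Fortran
> sketch of the idea only, not the actual example source, assuming the EPS
> object eps has already been created and configured:
>
>       ST  st
>       KSP ksp
>       PC  pc
>       Mat F
>       call EPSGetST(eps,st,ierr)
>       call STGetKSP(st,ksp,ierr)
>       call KSPGetPC(ksp,pc,ierr)
>       call PCSetType(pc,PCLU,ierr)
>       call PCFactorSetMatSolverType(pc,MATSOLVERMUMPS,ierr)
>       call PCFactorSetUpMatSolverType(pc,ierr)  ! create the MUMPS factor
>       call PCFactorGetMatrix(pc,F,ierr)
>       call MatMumpsSetIcntl(F,14,100,ierr)      ! ICNTL(14) = 100 percent
>
> all before the call to EPSSolve.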
>
> Jose
>
>
> > On 25 Aug 2021, at 14:11, dazza simplythebest <sayosale at hotmail.com>
> wrote:
> >
> >
> >
> > From: dazza simplythebest <sayosale at hotmail.com>
> > Sent: Wednesday, August 25, 2021 12:08 PM
> > To: Matthew Knepley <knepley at gmail.com>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > Dear Matthew and Jose,
> >                                           I have derived a smaller
> program from the original by constructing matrices of the same size but
> filling their entries randomly instead of computing the correct fluid
> dynamics values, just to allow faster experimentation. This modified
> code's behaviour seems to be similar, with the code again failing for the
> large matrix case with the SIGKILL error, so I first report results from
> that code here. Firstly, I can confirm that I am using Fortran and
> compiling with the Intel compiler, which it seems places automatic arrays
> on the stack. The stack size, as determined by ulimit -a, is reported to
> be:
> > stack size              (kbytes, -s) 8192
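> > (I have not yet tried raising this limit with, e.g., 'ulimit -s
> unlimited' - my understanding is that this is the standard workaround if
> automatic arrays are overflowing the stack.)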
> >
> > [1] Okay, so I followed your suggestion and used ctrl-c followed by
> 'where' in one of the non-SIGKILL gdb windows. I have pasted the output at
> the bottom of this email (see [1] output) - it does look like the problem
> occurs somewhere in the call to the MUMPS solver?
> >
> > [2] I have also today gained access to another workstation, and so have
> tried running the (original) code on that machine. This new machine has
> two (more powerful) CPU nodes and a larger memory (both machines feature
> Intel Xeon processors). On this new machine the large matrix case again
> failed with the familiar SIGKILL report when I used 16 or 12 MPI
> processes, ran to the end without error for 4 or 6 MPI processes, and
> failed with a PETSc error message when I used 8 MPI processes, which I
> have pasted below (see [2] output). Does this point to some sort of
> resource demand that exceeds some limit as the number of MPI processes
> increases?
> >
> >   Many thanks once again,
> >             Dan.
> >
> > [2] output
> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> > [0]PETSC ERROR: Error in external library
> > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization
> phase: INFOG(1)=-9, INFO(2)=6
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren
> Wed Aug 25 11:18:48 2021
> > [0]PETSC ERROR: Configure options
> --with-debugging=0 --package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> > [0]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> > [0]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> > [0]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> > [0]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> > [0]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> > [0]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> > [0]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> > [0]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> > [ranks 1-7 each printed the same error message and stack trace, and are
> omitted here; ranks 1-5 also reported INFO(2)=6, while rank 6 reported
> INFO(2)=21891045 and rank 7 reported INFO(2)=21841925]
> >
> >
> >
> > [1] output
> >
> > Continuing.
> > [New Thread 0x7f6f5b2d2780 (LWP 794037)]
> > [New Thread 0x7f6f5aad0800 (LWP 794040)]
> > [New Thread 0x7f6f5a2ce880 (LWP 794041)]
> > ^C
> > Thread 1 "my.exe" received signal SIGINT, Interrupt.
> > 0x00007f72904927b0 in ofi_fastlock_release_noop ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> > (gdb) where
> > #0  0x00007f72904927b0 in ofi_fastlock_release_noop ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> > #1  0x00007f729049354b in ofi_cq_readfrom ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> > #2  0x00007f728ffe8f0e in rxm_ep_do_progress ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> > #3  0x00007f728ffe2b7d in rxm_ep_recv_common_flags ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> > #4  0x00007f728ffe30f8 in rxm_ep_trecvmsg ()
> >    from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> > #5  0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392,
> >     comm=1, flag=0x0, status=0xffffffffffffffff)
> >     at /usr/include/rdma/fi_tagged.h:109
> > #6  0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0,
> >     v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90)
> >     at ../../src/binding/fortran/mpif_h/iprobef.c:276
> > #7  0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0,
>     ...) at zfac_process_message.F:730
> > #8  0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=...,
>     ...) at zfac_par_m.F:182
> > #9  0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., ...)
>     at zfac_b.F:243
> > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable:
> value of type `zmumps_struc' requires 386095520 bytes, which is more than
> max-value-size>) at zfac_driver.F:2421
> > #11 0x00007f7308569256 in zmumps (id=...) at zmumps_driver.F:1883
> > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, ...)
>     at zmumps_f77.F:289
> > [the very long argument lists for frames #7-#9 and #12, consisting
> mostly of '<error reading variable ...>' entries from this optimized
> build, plus gdb's pager prompt, have been trimmed here for readability]
> > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485
> > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248,
> A=0x7ffda7afdae0, info=0x1) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683
> > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248,
> mat=0x7ffda7afdae0, info=0x1) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248,
> __ierr=0x7ffda7afdae0) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85
> > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=...,
> b_pet=..., jthisone=<error reading variable: Cannot access memory at
> address 0x1>, isize=<error reading variable: Cannot access memory at
> address 0x0>) at small_slepc_example_program.F:322
> > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549
> > #26 0x00000000004023f2 in main ()
> > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>,
> argc=14, argv=0x7ffda7b024e8, init=<optimized out>, fini=<optimized out>,
> rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at
> ../csu/libc-start.c:308
> > #28 0x00000000004022fe in _start ()
> >
> > From: Matthew Knepley <knepley at gmail.com>
> > Sent: Tuesday, August 24, 2021 3:59 PM
> > To: dazza simplythebest <sayosale at hotmail.com>
> > Cc: Jose E. Roman <jroman at dsic.upv.es>; PETSc <petsc-users at mcs.anl.gov>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <
> sayosale at hotmail.com> wrote:
> >
> > Dear Matthew and Jose,
> >    Apologies for the delayed reply; I had a couple of unforeseen days
> off this week.
> > Firstly, regarding Jose's suggestion re: MUMPS, the program is already
> using MUMPS to solve linear systems (the code uses a distributed MPI
> matrix to solve the generalised non-Hermitian complex problem).
> >
> > I have tried the gdb debugger as per Matthew's suggestion.
> > Just to note, in case someone else is following this: at first it
> didn't work (gdb couldn't 'attach'), but after some googling I found a
> tip suggesting the command
> > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> > which seemed to get it working.
> >
> > I then first ran the debugger on the small matrix case that worked.
> > That stopped in gdb almost immediately after starting execution
> > with a report regarding 'nanosleep.c':
> > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> > However, issuing the 'cont' command again caused the program to run
> through to the end of the execution without any problems, and with
> correct-looking results, so I am guessing this error is not particularly
> important.
> >
> > We do that on purpose when the debugger starts up. Typing 'cont' is
> correct.
> >
> > I then tried the same debugging procedure on the large matrix case that
> fails. The code again stopped almost immediately after the start of
> execution with the same nanosleep error as before, and I was able to set
> the program running again with 'cont' (see full output below). I was
> running the code with 4 MPI processes, and so had 4 gdb windows appear.
> Thereafter the code ran for some time until completing the matrix
> construction, and then one of the gdb process windows printed a
> > Program terminated with signal SIGKILL, Killed.
> > The program no longer exists.
> > message. I then typed 'where' into this terminal but just received the
> message
> > No stack.
> >
> > I have only seen this behavior one other time, and it was with Fortran.
> Fortran allows you to declare really big arrays on the stack by putting
> them at the start of a function (rather than heap-allocating them, as F90
> allocatables are). When I had one of those arrays exceed the stack space,
> I got this kind of error, where everything is destroyed rather than just
> stopping. Could it be that you have a large structure on the stack?
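> >
> > To illustrate with a hypothetical Fortran fragment (not from your
> actual code):
> >
> >       subroutine demo(n)
> >         implicit none
> >         integer, intent(in) :: n
> >         complex*16 :: work(n,n)      ! automatic array: the Intel
> >                                      ! compiler places this on the stack
> >         complex*16, allocatable :: tmp(:,:)
> >         allocate(tmp(n,n))           ! allocatable: lives on the heap
> >         work = (0.0d0,0.0d0)
> >         tmp = work
> >         deallocate(tmp)
> >       end subroutine demo
> >
> > With 16-byte complex entries, n of only ~724 already fills a default
> 8 MB stack (724*724*16 bytes), so any automatic array sized like your
> matrices would blow far past that limit.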
> >
> > Second, you can at least look at the stack for the processes that were
> not killed. You type Ctrl-C, which should give you
> > the prompt and then "where".
> >
> >   Thanks,
> >
> >       Matt
> >
> > The other gdb windows basically seemed to be left in limbo until I
> issued the 'quit' command in the SIGKILL window, and then they vanished.
> >
> > I paste the full output from the gdb window that recorded the SIGKILL
> below. I guess it is necessary to somehow work out where the SIGKILL
> originates from?
> >
> >  Thanks once again,
> >                          Dan.
> >
> >
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> > Copyright (C) 2020 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> > Type "show copying" and "show warranty" for details.
> > This GDB was configured as "x86_64-linux-gnu".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> >     <http://www.gnu.org/software/gdb/documentation/>.
> >
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from ./stab1.exe...
> > Attaching to program:
> /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe,
> process 675919
> > Reading symbols from
> /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...
> > Reading symbols from
> /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...
> > [similar 'Reading symbols from ...' lines for the Intel MKL, Intel
> compiler runtime, Intel MPI, libfabric and standard system libraries,
> together with gdb's pager prompts, omitted here]
> > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>,
> clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0,
> rem=rem@entry=0x7ffdc641a9a0) at
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> > 78      ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or
> directory.
> > (gdb) cont
> > Continuing.
> > [New Thread 0x7f9e49c02780 (LWP 676559)]
> > [New Thread 0x7f9e49400800 (LWP 676560)]
> > [New Thread 0x7f9e48bfe880 (LWP 676562)]
> > [Thread 0x7f9e48bfe880 (LWP 676562) exited]
> > [Thread 0x7f9e49400800 (LWP 676560) exited]
> > [Thread 0x7f9e49c02780 (LWP 676559) exited]
> >
> > Program terminated with signal SIGKILL, Killed.
> > The program no longer exists.
> > (gdb) where
> > No stack.
> >
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - - - - -
> >
> > From: Matthew Knepley <knepley at gmail.com>
> > Sent: Friday, August 20, 2021 2:12 PM
> > To: dazza simplythebest <sayosale at hotmail.com>
> > Cc: Jose E. Roman <jroman at dsic.upv.es>; PETSc <petsc-users at mcs.anl.gov>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <
> sayosale at hotmail.com> wrote:
> > Dear Jose,
> >     Many thanks for your response; I have been investigating this issue
> with a few more calculations today, hence the slightly delayed reply.
> >
> > The problem is actually derived from a fluid dynamics problem, so to
> allow an easier exploration of things I first downsized the resolution of
> the underlying fluid solver while keeping all the physical parameters the
> same - i.e. I get a smaller matrix that should solve the same physical
> problem as the original larger matrix, but to lower accuracy.
> >
> > Results
> >
> > Small matrix (N = 21168) - everything good!
> > This converged when using the -eps_largest_real approach (taking 92
> iterations for nev = 10, tol = 5.0000E-06 and ncv = 300), and also when
> using the shift-invert approach, converging very impressively in a single
> iteration! Interestingly, it did this both for a non-zero -eps_target and
> also for a zero -eps_target.
> >
> > Large matrix (N = 50400) - works for -eps_largest_real, fails for
> -st_type sinvert
> > I have just double-checked that the code does run properly when we use
> the -eps_largest_real option - indeed, I ran it with a small nev and large
> tolerance (nev = 4, -eps_tol 5.0e-4, ncv = 300), and with these parameters
> convergence was obtained in 164 iterations, which took 6 hours on the
> machine I was running it on. Furthermore, the eigenvalues seem to be
> ballpark correct; for this large higher-resolution case (although with the
> looser tolerance) we obtain 1789.56816314173 -4724.51319554773i as the
> eigenvalue with largest real part, while the smaller matrix (same physical
> problem but at lower resolution) found this eigenvalue to be
> 1831.11845726501 -4787.54519511345i, which means the agreement is in line
> with expectations.
> >
> > Unfortunately, the code does still crash when I try to do shift-invert
> for the large matrix case, whether or not I use a non-zero -eps_target.
> For reference this is the command line used:
> > -eps_nev 10    -eps_ncv 300  -log_view -eps_view   -eps_target 0.1
> -st_type sinvert -eps_monitor :monitor_output05.txt
> > To be precise, the code crashes soon after calling EPSSolve (it
> successfully calls MatCreateVecs, EPSCreate, EPSSetOperators,
> EPSSetProblemType and EPSSetFromOptions). By "crashes" I mean that I do
> not even get any error messages from SLEPc/PETSc, and do not even get the
> 'EPS Object: 16 MPI processes' message - I simply get an MPI/Fortran
> 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSSolve is called.
> >
> > Hi Dan,
> >
> > It would help track this error down if we had a stack trace. You can get
> a stack trace from the debugger. You run with
> >
> >   -start_in_debugger
> >
> > which should launch the debugger (usually), and then type
> >
> >   cont
> >
> > to continue, and then
> >
> >   where
> >
> > to get the stack trace when it crashes, or 'bt' on lldb.
> >
> >   Thanks,
> >
> >      Matt
> >
> > Do you have any ideas as to why this larger matrix case should fail when
> using shift-invert but succeed when using -eps_largest_real? The fact that
> the program works and produces correct results when using the
> -eps_largest_real option suggests that there is probably nothing wrong
> with the specification of the problem or the matrices? It is strange that
> there is no error message from SLEPc/PETSc ... the only idea I have at the
> moment is that perhaps the maximum memory has been exceeded, which could
> cause such a sudden shutdown? For your reference, when running the large
> matrix case with the -eps_largest_real option I am using about 36 GB of
> the 148 GB available on this machine - does the shift-invert approach
> require substantially more memory, for example?
> >
> >   I would be very grateful for any suggestions to resolve this issue,
> or even ways to clarify it further; the performance I have seen with
> shift-invert for the small matrix is so impressive that it would be great
> to get it working for the full-size problem.
> >
> >    Many thanks and best wishes,
> >                                   Dan.
> >
> >
> >
> > From: Jose E. Roman <jroman at dsic.upv.es>
> > Sent: Thursday, August 19, 2021 7:58 AM
> > To: dazza simplythebest <sayosale at hotmail.com>
> > Cc: PETSc <petsc-users at mcs.anl.gov>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > In A) convergence may be slow, especially if the wanted eigenvalues have
> small magnitude. I would not say 600 iterations is a lot; you probably need
> many more. In most cases, approach B) is better because it improves
> convergence of eigenvalues close to the target, but it requires prior
> knowledge of your spectrum distribution in order to choose an appropriate
> target.
> >
> > In B), what do you mean by "it crashes"? If you get an error about the
> factorization, it means that your A-matrix is singular. In that case, try
> using a nonzero target: -eps_target 0.1
> >
> > Jose
> >
> >
> > > On 19 Aug 2021, at 7:12, dazza simplythebest <sayosale at hotmail.com>
> wrote:
> > >
> > > Dear All,
> > >             I am planning on using SLEPc to do a large number of
> eigenvalue calculations for a generalized eigenvalue problem, called from
> a program written in Fortran using MPI. Thus far I have successfully
> installed the SLEPc/PETSc software, both locally and on a cluster, and on
> smaller test problems everything is working well; the matrices are
> efficiently and correctly constructed, and SLEPc returns the correct
> spectrum. I am just now starting to move towards solving the full-size
> 'production run' problems, and would appreciate some general advice on how
> to improve the solver's performance.
> > >
> > > In particular, I am currently trying to solve the problem Ax = lambda
> Bx, whose matrices are of size 50000 (this is the smallest 'production
> run' problem I will be tackling) and are complex and non-Hermitian. In
> most cases I aim to find the eigenvalues with the largest real part,
> although in other cases I will also be interested in finding the
> eigenvalues whose real part is close to zero.
> > >
> > > A)
> > > Calling SLEPc's EPS solver with the following options:
> > >
> > > -eps_nev 10   -log_view -eps_view -eps_max_it 600 -eps_ncv 140
> -eps_tol 5.0e-6  -eps_largest_real -eps_monitor :monitor_output.txt
> > >
> > >
> > > led to the code running successfully, but failing to find any
> eigenvalues within the maximum of 600 iterations (examining the monitor
> output, it did appear to be approaching convergence very slowly).
> > >
> > > B)
> > > On the same problem I have also tried a shift-invert transformation
> using the options
> > >
> > > -eps_nev 10    -eps_ncv 140    -eps_target 0.0+0.0i  -st_type sinvert
> > >
> > > - in this case the code crashed at the point it tried to call SLEPc,
> so perhaps I have incorrectly specified these options?
> > >
> > >
> > > Does anyone have any suggestions as to how to improve this performance
> (or find out more about the problem)?
> > > In the case of A), I can see from watching the SLEPc videos that
> increasing ncv may help, but I am wondering, since 600 is a large number
> of iterations, whether there may be something else going on - e.g. perhaps
> some alternative preconditioner may help?
> > > In the case of B), I guess there must be some mistake in these command
> line options?
> > >  Again, any advice will be greatly appreciated.
> > >      Best wishes,  Dan.
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
>
>

