[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Barry Smith bsmith at petsc.dev
Tue Mar 2 14:47:23 CST 2021


  Eric,

    Thanks for the detailed information.   

    I have cc:ed Pierre so he can look at the HPDDM failures. 


> On Mar 2, 2021, at 2:14 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> 
> Hi,
> 
> It all started when I wanted to test PETSC/CUDA compatibility for our code.
> 
> I had to activate --with-openmp to configure with --with-cuda=1 successfully.
> 
> 
Certain packages, such as SuperLU_DIST, require --with-openmp when configured with --with-cuda=1, but PETSc's own use of CUDA, as well as some other packages, does not require --with-openmp.

> I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and some other places).
> 
> So, I configured and tested petsc with openmp activated, without CUDA.
> 
> The first thing I see is that our code CI pipelines now fails for many tests.
> 
> After looking deeper, it seems that PETSc itself fails many tests when I activate openmp!
> 
> Here are all the configurations I have results for, after/before activating OpenMP for PETSc:

There seem to be several distinct issues:

1) failures inside ScaLAPACK.

2) possibly slightly different convergence rates for some examples, which change the number of iterations slightly in PETSc.

3) trouble initializing something outside of PETSc, almost certainly not related to PETSc

[zorg:08517] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
#	[zorg:08517] plm:base:set_hnp_name: initial bias 8517 nodename hash 810220270
#	[zorg:08517] plm:base:set_hnp_name: final jobfam 60385
#	[zorg:08517] [[60385,0],0] plm:rsh_setup on agent ssh : rsh path NULL
#	[zorg:08517] [[60385,0],0] plm:base:receive start comm
#	[zorg:08517] [[60385,0],0] plm:base:setup_job
#	[zorg:08517] [[60385,0],0] plm:base:setup_vm

4) problem with a hypre run: "Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 3"; again, this is not likely a PETSc issue but a hypre and OpenMP issue

5) Different results for inertia inside an external package

#	1c1
#	<  MatInertia: nneg: 17, nzero: 0, npos: 83
#	---
#	>  MatInertia: nneg: 21, nzero: 0, npos: 79
        TEST arch-linux-c-debug/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
 ok ksp_ksp_tests-ex33_superlu_dist_2
not ok diff-ksp_ksp_tests-ex33_superlu_dist_2 # Error code: 1
#	1c1
#	<  MatInertia: nneg: 17, nzero: 0, npos: 83
#	---
#	>  MatInertia: nneg: 25, nzero: 0, npos: 75


6) problems with the external package hpddm 

not ok snes_tutorials-ex12_quad_hpddm_reuse_baij # Error code: 139
#	  0 SNES Function norm 21.3344 
#	[0]PETSC ERROR: ------------------------------------------------------------------------
#	[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
#	[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
#	[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#	[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
#	[0]PETSC ERROR: likely location of problem given in stack below
#	[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
#	[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
#	[0]PETSC ERROR:       INSTEAD the line number of the start of the function
#	[0]PETSC ERROR:       is given.
#	[0]PETSC ERROR: [0] constructionMatrix line 313 /opt/petsc-main_debug/include/HPDDM_coarse_operator_impl.hpp
#	[0]PETSC ERROR: [0] construction line 256 /opt/petsc-main_debug/include/HPDDM_coarse_operator_impl.hpp
#	[0]PETSC ERROR: [0] buildTwo line 987 /opt/petsc-main_debug/include/HPDDM_schwarz.hpp
#	[0]PETSC ERROR: [0] next line 1130 /opt/petsc-main_debug/include/HPDDM_schwarz.hpp
#	[0]PETSC ERROR: [0] PCSetUp_HPDDM line 746 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/pc/impls/hpddm/hpddm.cxx
#	[0]PETSC ERROR: [0] PCSetUp line 974 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/pc/interface/precon.c
#	[0]PETSC ERROR: [0] KSPSetUp line 319 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#	[0]PETSC ERROR: [0] KSPSolve_Private line 808 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#	[0]PETSC ERROR: [0] KSPSolve line 1080 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#	[0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/snes/impls/ls/ls.c
#	[0]PETSC ERROR: [0] SNESSolve line 4533 /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/snes/interface/snes.c
#	[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
#	[0]PETSC ERROR: Signal received
#	[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
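
As a rough way to separate these groups in the logs, the FAILED summary lines from a run with OpenMP and a run without it can be compared, and the new failures sorted by the external package named in the test. The sketch below is only an illustration: the helper names and the keyword list are made up here and are not part of the PETSc test harness, and the two summary lines are a shortened excerpt of the OpenMPI 4.0.x results quoted further down.

```python
# Hypothetical helpers for illustration; not part of PETSc or its test harness.
# Assumption: the external package can be guessed from a keyword in the test name.
EXTERNAL_PACKAGES = ("superlu_dist", "hypre", "hpddm")


def parse_failed(summary_line):
    """Return the set of test names listed after 'FAILED' in a summary line."""
    _, _, names = summary_line.partition("FAILED")
    return set(names.split())


def new_failures_by_package(with_openmp, without_openmp):
    """Group the tests that fail only with OpenMP by external-package keyword."""
    groups = {}
    only_with_openmp = parse_failed(with_openmp) - parse_failed(without_openmp)
    for name in sorted(only_with_openmp):
        # First matching keyword wins; anything else is filed under "other".
        package = next((p for p in EXTERNAL_PACKAGES if p in name), "other")
        groups.setdefault(package, []).append(name)
    return groups


# Shortened excerpt of the two OpenMPI 4.0.x summary lines (not the full lists).
with_openmp = ("# FAILED snes_tutorials-ex19_hypre ksp_ksp_tutorials-ex5_superlu_dist "
               "mat_tests-ex242_3 ts_tutorials-ex26_2")
without_openmp = "# FAILED mat_tests-ex242_3"

for package, tests in new_failures_by_package(with_openmp, without_openmp).items():
    print(package, tests)
```

Tests that fail in both runs (mat_tests-ex242_3 here) drop out, so what remains is the part that changed when OpenMP was turned on.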

PETSc itself does not use OpenMP, so turning on OpenMP for pure PETSc should generate no errors, except possibly small changes in iteration counts due to the different way the floating point operations in MKL are done.
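
To illustrate why the order of floating point operations matters: addition in floating point is not associative, so a sum split into per-thread partial sums can differ from the same sum done serially. The sketch below is a toy example in plain Python with made-up helper names; it does not reproduce what MKL actually does internally.

```python
def ordered_sum(values):
    """Add the values strictly left to right, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, nchunks):
    """Add the values as nchunks partial sums that are then combined,
    mimicking (very loosely) a reduction split across threads."""
    size = -(-len(values) // nchunks)  # ceiling division
    partials = [ordered_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return ordered_sum(partials)


# The exact sum of these values is 2.0, but neither ordering recovers it
# in double precision, and the two orderings disagree with each other.
values = [1e16, 1.0, -1e16, 1.0]

print("serial :", ordered_sum(values))
print("chunked:", chunked_sum(values, 2))
```

Differences of this kind are normally far smaller than in this contrived case, but in a Krylov solver they can be enough to move a residual across the convergence tolerance one iteration earlier or later.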

We don't see much use for OpenMP, so we rarely turn it on. What is your end goal: to use PETSc on CUDA (for which you can keep OpenMP off) or something else?


  Barry


> ==============================================================================
> 
> ==============================================================================
> 
> For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
> 
> With OpenMP=1
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
> # -------------
> #   Summary    
> # -------------
> # FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij diff-ksp_ksp_tests-ex33_superlu_dist_2 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist_2 diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2
> # success 8275/10003 tests (82.7%)
> # failed 33/10003 tests (0.3%)
> With OpenMP=0
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
> # -------------
> #   Summary    
> # -------------
> # FAILED tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5
> # success 8262/9983 tests (82.8%)
> # failed 6/9983 tests (0.1%)
> ==============================================================================
> 
> ==============================================================================
> 
> For OpenMPI 3.1.x/master:
> 
> With OpenMP=1:
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log
> # -------------
> #   Summary    
> # -------------
> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 diff-ksp_ksp_tutorials-ex49_hypre_nullspace ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 tao_constrained_tutorials-tomographyADMM_6 diff-tao_constrained_tutorials-toyf_1
> # success 8142/9765 tests (83.4%)
> # failed 16/9765 tests (0.2%)
> With OpenMP=0:
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log
> # -------------
> #   Summary    
> # -------------
> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2 snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_4 diff-tao_constrained_tutorials-toyf_1
> # success 8151/9767 tests (83.5%)
> # failed 9/9767 tests (0.1%)
> ==============================================================================
> 
> ==============================================================================
> 
> For OpenMPI 4.0.x/master:
> 
> With OpenMP=1:
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_configure.log
> # FAILED snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex19_hypre ksp_ksp_tutorials-ex56_2 tao_leastsquares_tutorials-tomography_1 tao_constrained_tutorials-tomographyADMM_5 mat_tests-ex242_3 ksp_ksp_tutorials-ex55_hypre ksp_ksp_tutorials-ex5_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex56_hypre snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist_3 ksp_ksp_tutorials-ex34_hyprestruct diff-ksp_ksp_tutorials-ex49_hypre_nullspace snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre ksp_ksp_tutorials-ex5f_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist snes_tutorials-ex19_tut_3 snes_tutorials-ex19_superlu_dist ksp_ksp_tutorials-ex50_tut_2 snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5_superlu_dist_3 snes_tutorials-ex19_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_4 ts_tutorials-ex26_2
> # success 8125/9753 tests (83.3%)
> # failed 26/9753 tests (0.3%)
> With OpenMP=0
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_configure.log
> # FAILED mat_tests-ex242_3
> # success 8174/9777 tests (83.6%)
> # failed 1/9777 tests (0.0%)
> 
> ==============================================================================
> 
> ==============================================================================
> 
> Is that known and normal?
> 
> In all cases, I am using MKL and I suspect it  may come from there... :/
> 
> I also saw a second problem, "make test" fails to compile petsc examples on older versions of MKL (but that's less important for me, I just upgraded to OneAPI to avoid this, but you may want to know):
> 
> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_make_test.log
> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_configure.log
> Thanks,
> 
> Eric
> 
> -- 
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
