[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Pierre Jolivet pierre at joliv.et
Thu Mar 11 08:03:45 CST 2021


> On 11 Mar 2021, at 8:46 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> 
>> On 11 Mar 2021, at 6:16 AM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>>   Eric,
>> 
>>    Sorry for not replying sooner. We still have this in our active email, so you don't need to submit individual issues. We'll try to get to them as soon as we can.
> 
> Indeed, I’m still trying to figure this out.
> I realized that some of my configure flags were different from yours, e.g., no --with-memalign.
> I’ve also added SuperLU_DIST to my installation.
> Still, I can’t reproduce any issue.
> I will continue looking into this. I am seeing some Valgrind errors, but I don't know if they are a side effect of OpenMPI not being Valgrind-clean (last time I checked, there was no error with MPICH).

It looks like Valgrind + OpenMPI (+ OpenMP?) is complaining about uninitialized memory in PetscSFGetMultiSF().
Could you please try out the following branch? https://gitlab.com/petsc/petsc/-/commits/jolivet/fix-valgrind-openmpi
I'm not sure why there would be such a warning with OpenMPI and not with MPICH, and the branch is unlikely to fix anything, but for good measure, after recompiling, could you please run:
$ make -f gmakefile test search='snes_tutorials-ex12_quad_hpddm_reuse_baij'

Thanks,
Pierre

> Thank you for your patience,
> Pierre
> 
> /usr/bin/gmake -f gmakefile test test-fail=1
> Using MAKEFLAGS: test-fail=1
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>  ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>  ok ksp_ksp_tests-ex33_superlu_dist_2
>  ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>  ok ksp_ksp_tutorials-ex50_tut_2
>  ok diff-ksp_ksp_tutorials-ex50_tut_2
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>  ok ksp_ksp_tests-ex33_superlu_dist
>  ok diff-ksp_ksp_tests-ex33_superlu_dist
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>  ok snes_tutorials-ex56_hypre
>  ok diff-snes_tutorials-ex56_hypre
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>  ok ksp_ksp_tutorials-ex56_2
>  ok diff-ksp_ksp_tutorials-ex56_2
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>  ok snes_tutorials-ex17_3d_q3_trig_elas
>  ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>  ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
> #	srun: error: Unable to create step for job 1426755: More processors requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so no diff
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for this test
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>  ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>  ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>  ok snes_tutorials-ex19_tut_3
>  ok diff-snes_tutorials-ex19_tut_3
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>  ok snes_tutorials-ex17_3d_q3_trig_vlap
>  ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required for this test
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>  ok snes_tutorials-ex19_superlu_dist
>  ok diff-snes_tutorials-ex19_superlu_dist
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>  ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>  ok ksp_ksp_tutorials-ex49_hypre_nullspace
>  ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>  ok snes_tutorials-ex19_superlu_dist_2
>  ok diff-snes_tutorials-ex19_superlu_dist_2
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
> #	srun: error: Unable to create step for job 1426755: More processors requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so no diff
>         TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>  ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>  ok ksp_ksp_tutorials-ex64_1
>  ok diff-ksp_ksp_tutorials-ex64_1
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
> #	srun: error: Unable to create step for job 1426755: More processors requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no diff
>         TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required for this test
> 
>>    Barry
>> 
>> 
>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>> 
>>> Barry,
>>> 
>>> to get some follow-up on the --with-openmp=1 failures, shall I open GitLab issues for:
>>> 
>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>> 
>>> b) all superlu_dist failures giving different results with initial and "Exceeded timeout limit of 60 s"
>>> 
>>> c) hpddm failures "free(): invalid next size (fast)" and "Segmentation Violation"
>>> 
>>> d) all TAO tests giving "Exceeded timeout limit of 60 s"
>>> 
>>> I don't see how I could do all this debugging by myself...
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> 
>> 
> 
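When a test log like the one quoted above is long, it helps to separate genuine failures from passes and skips. The following is a minimal Python sketch, not part of PETSc or its test harness: the helper name summarize() is hypothetical, and it assumes result lines look exactly like those in this thread (" ok NAME", "not ok NAME # Error code: 1", " ok NAME # SKIP reason"), possibly prefixed by "> " e-mail quote markers.

```python
import re
from collections import Counter

# Hypothetical helper, assuming result lines shaped like the log in this thread:
#   " ok NAME", "not ok NAME # Error code: 1", " ok NAME # SKIP reason"
# optionally prefixed by "> " e-mail quote markers.
RESULT_RE = re.compile(r"^[>\s]*(not ok|ok)\s+(\S+)(?:\s+#\s*(.*?))?\s*$")


def summarize(log_text):
    """Count passed/failed/skipped results and list the failing test names."""
    counts = Counter()
    failures = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match is None:
            continue  # TEST headers, srun messages, make output, etc.
        status, name, directive = match.groups()
        if directive and directive.startswith("SKIP"):
            counts["skip"] += 1
        elif status == "not ok":
            counts["fail"] += 1
            failures.append(name)
        else:
            counts["pass"] += 1
    return counts, failures
```

Run over the full log quoted above, this reports three failures (ksp_ksp_tutorials-ex5_superlu_dist, ex5_superlu_dist_2 and ex5_superlu_dist_3), all with the same srun "More processors requested than permitted" message, which points to the job allocation rather than to PETSc itself.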
