[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Thu Mar 11 08:03:25 CST 2021


Hi Pierre,

ok, that's interesting!

I will try to build a Docker image by tomorrow and give you the exact 
recipe to reproduce the bugs.

Eric


On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>
>
>> On 11 Mar 2021, at 6:16 AM, Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>>   Eric,
>>
>>    Sorry about not being more immediate. We still have this in our 
>> active email so you don't need to submit individual issues. We'll try 
>> to get to them as soon as we can.
>
> Indeed, I’m still trying to figure this out.
> I realized that some of my configure flags were different than yours, 
> e.g., no --with-memalign.
> I’ve also added SuperLU_DIST to my installation.
> Still, I can’t reproduce any issue.
> I will continue looking into this. I’m seeing some valgrind errors, 
> but I don’t know if they are a side effect of OpenMPI not being 
> valgrind-clean (last time I checked, there was no error with MPICH).
>
> Thank you for your patience,
> Pierre
>
> /usr/bin/gmake -f gmakefile test test-fail=1
> Using MAKEFLAGS: test-fail=1
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>  ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>  ok ksp_ksp_tests-ex33_superlu_dist_2
>  ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>  ok ksp_ksp_tutorials-ex50_tut_2
>  ok diff-ksp_ksp_tutorials-ex50_tut_2
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>  ok ksp_ksp_tests-ex33_superlu_dist
>  ok diff-ksp_ksp_tests-ex33_superlu_dist
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>  ok snes_tutorials-ex56_hypre
>  ok diff-snes_tutorials-ex56_hypre
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>  ok ksp_ksp_tutorials-ex56_2
>  ok diff-ksp_ksp_tutorials-ex56_2
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>  ok snes_tutorials-ex17_3d_q3_trig_elas
>  ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>  ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
> #srun: error: Unable to create step for job 1426755: More processors 
> requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so no diff
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for 
> this test
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>  ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>  ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>  ok snes_tutorials-ex19_tut_3
>  ok diff-snes_tutorials-ex19_tut_3
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>  ok snes_tutorials-ex17_3d_q3_trig_vlap
>  ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required for 
> this test
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>  ok snes_tutorials-ex19_superlu_dist
>  ok diff-snes_tutorials-ex19_superlu_dist
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>  ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>  ok ksp_ksp_tutorials-ex49_hypre_nullspace
>  ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>  ok snes_tutorials-ex19_superlu_dist_2
>  ok diff-snes_tutorials-ex19_superlu_dist_2
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
> #srun: error: Unable to create step for job 1426755: More processors 
> requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so no diff
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>  ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>  ok ksp_ksp_tutorials-ex64_1
>  ok diff-ksp_ksp_tutorials-ex64_1
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
> #srun: error: Unable to create step for job 1426755: More processors 
> requested than permitted
>  ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no diff
>         TEST 
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>  ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required for 
> this test
>
>>    Barry
>>
>>
>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland 
>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>
>>> Barry,
>>>
>>> to get some follow-up on the --with-openmp=1 failures, shall I open 
>>> GitLab issues for:
>>>
>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>
>>> b) all superlu_dist failures giving different results with initia 
>>> and "Exceeded timeout limit of 60 s"
>>>
>>> c) hpddm failures "free(): invalid next size (fast)" and 
>>> "Segmentation Violation"
>>>
>>> d) all tao's "Exceeded timeout limit of 60 s"
>>>
>>> I don't see how I could do all this debugging by myself...
>>>
>>> Thanks,
>>>
>>> Eric
>>>
>>>
>>
>
-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
