[petsc-dev] PETSc "make test" has more failures for --with-openmp=1

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Tue Mar 30 22:18:41 CDT 2021


Hi Barry,

Here is what I have:

1. The hpddm issues have all been solved (there are no more hpddm 
failures here: 
https://giref.ulaval.ca/~cmpgiref/petsc-main-debug/2021.03.29.02h00m02s_make_test.log)

2. For Hypre, I think it is indeed not a bug but a feature, as far as I 
can tell from the hypre discussion: it is said there that "It still 
depends on the number of threads, that can’t be avoided" 
( https://github.com/hypre-space/hypre/issues/303#issuecomment-800442755 )

and here 
https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing, 
in section 7.3, we have some interesting information, such as:

Figure 7.6 clearly illustrates that convergence degrades with the 
addition of threads for hybrid SGS;

[...]

The 3D sphere problem is the most extreme example because AMG-CG with 
hybrid SGS no longer converges with the addition of threading.

but I might have misunderstood, since I am not an expert in this area...

3. For SuperLU_Dist, I have tried to build SuperLU_dist outside of PETSc 
to run the tests from superlu itself; sadly, the bug does not show up (see 
https://github.com/xiaoyeli/superlu_dist/issues/69).

I would like to build a standalone superlu_dist reproducer from what is 
done in the faulty test:

ksp_ksp_tutorials-ex5

which fails when called from PETSc. What puzzles me is that many other 
PETSc tests run fine with superlu_dist: maybe something is done 
uniquely in ksp_ksp_tutorials-ex5?
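
For reference, here is a minimal sketch of the kind of standalone 
reproducer I have in mind: a KSP solve forced through SuperLU_DIST. 
The 1-D Laplacian below is only a stand-in operator of my own choosing, 
not what ex5 actually assembles, and error checking (CHKERRQ) is 
omitted for brevity:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  PetscInt i, Istart, Iend, n = 100;

  PetscInitialize(&argc, &argv, NULL, NULL);
  /* assemble a distributed tridiagonal (1-D Laplacian) matrix */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);
  /* direct solve, factorization delegated to SuperLU_DIST */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetType(ksp, KSPPREONLY);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);
  PCFactorSetMatSolverType(pc, MATSOLVERSUPERLU_DIST);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);
  KSPDestroy(&ksp);
  MatDestroy(&A);
  VecDestroy(&x);
  VecDestroy(&b);
  PetscFinalize();
  return 0;
}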

So I think it is worth digging into #3: the simple thing I have not yet 
done is retrieving the stack trace when it fails (timeout).
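
For that, I would attach a debugger to a stuck process once the test 
hangs and grab a backtrace, along these lines (the pid being whichever 
rank is hung):

gdb -p <pid>
(gdb) thread apply all bt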

And a question: when you say you upgraded to OpenMPI 4.1, do you mean 
for one of your automated (Docker?) builds in the GitLab pipelines?

Thanks for checking in! :)

Eric


On 2021-03-30 1:47 p.m., Barry Smith wrote:
>
>   Eric,
>
>     How are things going on this OpenMP  front? Any bug fixes from 
> hypre or SuperLU_DIST?
>
>     BTW: we have upgraded to OpenMPI 4.1 perhaps this resolves some 
> issues?
>
>    Barry
>
>
>> On Mar 22, 2021, at 2:07 PM, Eric Chamberland 
>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>
>> I added some information here:
>>
>> https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719
>>
>> Maybe someone can say more than I can about what PETSc tries to do 
>> with the two tutorials mentioned that are timing out...
>>
>> Thanks,
>>
>> Eric
>>
>>
>> On 2021-03-15 11:31 a.m., Eric Chamberland wrote:
>>>
>>> Reported timeout bugs to SuperLU_dist too:
>>>
>>> https://github.com/xiaoyeli/superlu_dist/issues/69
>>>
>>> Eric
>>>
>>>
>>> On 2021-03-14 2:18 p.m., Eric Chamberland wrote:
>>>>
>>>> Done:
>>>>
>>>> https://github.com/hypre-space/hypre/issues/303
>>>>
>>>> Maybe I will need some help with PETSc to answer their questions...
>>>>
>>>> Eric
>>>>
>>>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>>>> Eric
>>>>>
>>>>> You should report these HYPRE issues upstream 
>>>>> https://github.com/hypre-space/hypre/issues
>>>>>
>>>>>
>>>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland 
>>>>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>
>>>>>> For us it clearly creates problems in real computations...
>>>>>>
>>>>>> I understand the need to have clean tests for PETSc, but for me, 
>>>>>> it reveals that hypre isn't usable with more than one thread for 
>>>>>> now...
>>>>>>
>>>>>> Another solution: force a single-threaded configuration for hypre 
>>>>>> until this is fixed?
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
>>>>>>>   Linear solve did not converge due to DIVERGED_INDEFINITE_PC 
>>>>>>> iterations 3
>>>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
>>>>>>> OK, independently of the architecture it seems (Eric's Docker 
>>>>>>> image with 1 or 2 threads, or my macOS), but the contraction 
>>>>>>> factor is higher
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>>>> vs. currently
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>>>
>>>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make 
>>>>>>> test?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pierre
>>>>>>>
>>>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>> Hypre uses a multiplicative smoother by default. It also has a 
>>>>>>>> Chebyshev smoother; that, with a Jacobi PC, should be thread 
>>>>>>>> invariant.
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>     On 13 Mar 2021, at 9:17 AM, Pierre Jolivet
>>>>>>>>>     <pierre at joliv.et> wrote:
>>>>>>>>>
>>>>>>>>>     Hello Eric,
>>>>>>>>>     I’ve made an “interesting” discovery, so I’ll put the
>>>>>>>>>     list back in CC.
>>>>>>>>>     It appears the following snippet of code which uses
>>>>>>>>>     Allreduce() + lambda function + MPI_IN_PLACE is:
>>>>>>>>>     - Valgrind-clean with MPICH;
>>>>>>>>>     - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>>>     - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>>>     I’m not sure who is to blame here; I’ll need to look at
>>>>>>>>>     the MPI specification for what is required by the
>>>>>>>>>     implementors and users in that case.
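>>>>>>>>>
>>>>>>>>>     A minimal sketch of the pattern (not the exact snippet,
>>>>>>>>>     which is attached as ompi.cxx), with the combiner written
>>>>>>>>>     as a plain C function rather than a lambda:
>>>>>>>>>
>>>>>>>>>     #include <mpi.h>
>>>>>>>>>
>>>>>>>>>     /* fused reduction, treating the buffer as (max, band)
>>>>>>>>>        pairs: MAX on slot 0, bitwise AND on slot 1; the count
>>>>>>>>>        must therefore be even */
>>>>>>>>>     static void combine(void *in, void *inout, int *len,
>>>>>>>>>                         MPI_Datatype *dtype)
>>>>>>>>>     {
>>>>>>>>>       int *a = (int *)in, *b = (int *)inout;
>>>>>>>>>       (void)dtype; /* unused */
>>>>>>>>>       for (int i = 0; i < *len; i += 2) {
>>>>>>>>>         if (a[i] > b[i]) b[i] = a[i];
>>>>>>>>>         b[i + 1] &= a[i + 1];
>>>>>>>>>       }
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     int main(int argc, char **argv)
>>>>>>>>>     {
>>>>>>>>>       int    buf[2] = {42, 0xFF}; /* local contribution */
>>>>>>>>>       MPI_Op op;
>>>>>>>>>       MPI_Init(&argc, &argv);
>>>>>>>>>       MPI_Op_create(combine, 1, &op);
>>>>>>>>>       /* every rank both contributes and receives in buf */
>>>>>>>>>       MPI_Allreduce(MPI_IN_PLACE, buf, 2, MPI_INT, op,
>>>>>>>>>                     MPI_COMM_WORLD);
>>>>>>>>>       MPI_Op_free(&op);
>>>>>>>>>       MPI_Finalize();
>>>>>>>>>       return 0;
>>>>>>>>>     }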
>>>>>>>>>
>>>>>>>>>     In the meantime, I’ll do the following:
>>>>>>>>>     - update config/BuildSystem/config/packages/OpenMPI.py to
>>>>>>>>>     use OpenMPI 4.1.0, see if any other error appears;
>>>>>>>>>     - provide a hotfix to bypass the segfaults;
>>>>>>>>
>>>>>>>>     I can confirm that splitting the single Allreduce with my
>>>>>>>>     own MPI_Op into two Allreduce with MAX and BAND fixes the
>>>>>>>>     segfaults with OpenMPI (*).
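>>>>>>>>
>>>>>>>>     Schematically, with maxval and bandval standing in for the
>>>>>>>>     two slots of the fused op:
>>>>>>>>
>>>>>>>>     MPI_Allreduce(MPI_IN_PLACE, &maxval, 1, MPI_INT, MPI_MAX, comm);
>>>>>>>>     MPI_Allreduce(MPI_IN_PLACE, &bandval, 1, MPI_INT, MPI_BAND, comm);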
>>>>>>>>
>>>>>>>>>     - look at the hypre issue and whether they should be
>>>>>>>>>     deferred to the hypre team.
>>>>>>>>
>>>>>>>>     I don’t know if there is something wrong in hypre threading
>>>>>>>>     or if it’s just a side effect of threading, but it seems
>>>>>>>>     that the number of threads has a drastic effect on the
>>>>>>>>     quality of the PC.
>>>>>>>>     By default, it looks like there are two threads per
>>>>>>>>     process with your Docker image.
>>>>>>>>     If I force OMP_NUM_THREADS=1, then I get the same
>>>>>>>>     convergence as in the output file.
>>>>>>>>
>>>>>>>>     Thanks,
>>>>>>>>     Pierre
>>>>>>>>
>>>>>>>>     (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>>>
>>>>>>>>>     Thank you for the Docker files, they were really useful.
>>>>>>>>>     If you want to avoid oversubscription failures, you can
>>>>>>>>>     edit the file
>>>>>>>>>     /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and append
>>>>>>>>>     the line:
>>>>>>>>>     localhost slots=12
>>>>>>>>>     If you want to increase the timeout limit of PETSc test
>>>>>>>>>     suite for each test, you can add the extra flag in your
>>>>>>>>>     command line TIMEOUT=180 (default is 60, units are seconds).
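>>>>>>>>>
>>>>>>>>>     For example, with the same gmakefile driver that shows up
>>>>>>>>>     in the test logs further down the thread, something like:
>>>>>>>>>
>>>>>>>>>     /usr/bin/gmake -f gmakefile test TIMEOUT=180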
>>>>>>>>>
>>>>>>>>>     Thanks, I’ll ping you on GitLab when I’ve got something
>>>>>>>>>     ready for you to try,
>>>>>>>>>     Pierre
>>>>>>>>>
>>>>>>>>>     <ompi.cxx>
>>>>>>>>>
>>>>>>>>>>     On 12 Mar 2021, at 8:54 PM, Eric Chamberland
>>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi Pierre,
>>>>>>>>>>
>>>>>>>>>>     I now have a docker container reproducing the problems here.
>>>>>>>>>>
>>>>>>>>>>     Actually, if I look at
>>>>>>>>>>     snes_tutorials-ex12_quad_singular_hpddm it fails like this:
>>>>>>>>>>
>>>>>>>>>>     not ok snes_tutorials-ex12_quad_singular_hpddm # Error
>>>>>>>>>>     code: 59
>>>>>>>>>>     # Initial guess
>>>>>>>>>>     #       L_2 Error: 0.00803099
>>>>>>>>>>     # Initial Residual
>>>>>>>>>>     #       L_2 Residual: 1.09057
>>>>>>>>>>     #       Au - b = Au + F(0)
>>>>>>>>>>     #       Linear L_2 Residual: 1.09057
>>>>>>>>>>     # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>>>     # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>>>     # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>>>     # [3]PETSC ERROR:
>>>>>>>>>>     ------------------------------------------------------------------------
>>>>>>>>>>     # [3]PETSC ERROR: Caught signal number 11 SEGV:
>>>>>>>>>>     Segmentation Violation, probably memory access out of range
>>>>>>>>>>     # [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>>     -on_error_attach_debugger
>>>>>>>>>>     # [3]PETSC ERROR: or see
>>>>>>>>>>     https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>     # [3]PETSC ERROR: or try http://valgrind.org on GNU/linux
>>>>>>>>>>     and Apple Mac OS X to find memory corruption errors
>>>>>>>>>>     # [3]PETSC ERROR: likely location of problem given in
>>>>>>>>>>     stack below
>>>>>>>>>>     # [3]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>>     ------------------------------------
>>>>>>>>>>     # [3]PETSC ERROR: Note: The EXACT line numbers in the
>>>>>>>>>>     stack are not available,
>>>>>>>>>>     # [3]PETSC ERROR: INSTEAD the line number of the start of
>>>>>>>>>>     the function
>>>>>>>>>>     # [3]PETSC ERROR: is given.
>>>>>>>>>>     # [3]PETSC ERROR: [3] buildTwo line 987
>>>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>>     # [3]PETSC ERROR: [3] next line 1130
>>>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>>     # [3]PETSC ERROR: --------------------- Error Message
>>>>>>>>>>     --------------------------------------------------------------
>>>>>>>>>>     # [3]PETSC ERROR: Signal received
>>>>>>>>>>     # [3]PETSC ERROR: [0]PETSC ERROR:
>>>>>>>>>>     ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>     also ex12_quad_hpddm_reuse_baij fails, with a lot more
>>>>>>>>>>     "Read -1, expected ..." messages, which I don't know
>>>>>>>>>>     where they come from...?
>>>>>>>>>>
>>>>>>>>>>     Hypre (like in diff-snes_tutorials-ex56_hypre) is also
>>>>>>>>>>     having DIVERGED_INDEFINITE_PC failures...
>>>>>>>>>>
>>>>>>>>>>     Please see the 3 attached docker files:
>>>>>>>>>>
>>>>>>>>>>     1) fedora_mkl_and_devtools: the Dockerfile which installs
>>>>>>>>>>     Fedora 33 with GNU compilers, MKL, and everything needed
>>>>>>>>>>     to develop.
>>>>>>>>>>
>>>>>>>>>>     2) openmpi: the Dockerfile to build OpenMPI
>>>>>>>>>>
>>>>>>>>>>     3) petsc: the last Dockerfile, which builds, installs,
>>>>>>>>>>     and tests PETSc
>>>>>>>>>>
>>>>>>>>>>     I build the 3 like this:
>>>>>>>>>>
>>>>>>>>>>     docker build -t fedora_mkl_and_devtools -f
>>>>>>>>>>     fedora_mkl_and_devtools .
>>>>>>>>>>
>>>>>>>>>>     docker build -t openmpi -f openmpi .
>>>>>>>>>>
>>>>>>>>>>     docker build -t petsc -f petsc .
>>>>>>>>>>
>>>>>>>>>>     Disclaimer: I am not a docker expert, so I may do things
>>>>>>>>>>     that are not docker-state-of-the-art, but I am open to
>>>>>>>>>>     suggestions... ;)
>>>>>>>>>>
>>>>>>>>>>     I have just run it on my laptop (slow), which does not
>>>>>>>>>>     have enough cores, so many more tests failed (I should
>>>>>>>>>>     force --oversubscribe but don't know how to). I will
>>>>>>>>>>     relaunch on my workstation in a few minutes.
>>>>>>>>>>
>>>>>>>>>>     I will now test your branch! (sorry for the delay).
>>>>>>>>>>
>>>>>>>>>>     Thanks,
>>>>>>>>>>
>>>>>>>>>>     Eric
>>>>>>>>>>
>>>>>>>>>>     On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>>>
>>>>>>>>>>>     Hi Pierre,
>>>>>>>>>>>
>>>>>>>>>>>     ok, that's interesting!
>>>>>>>>>>>
>>>>>>>>>>>     I will try to build a docker image by tomorrow and
>>>>>>>>>>>     give you the exact recipe to reproduce the bugs.
>>>>>>>>>>>
>>>>>>>>>>>     Eric
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>     On 11 Mar 2021, at 6:16 AM, Barry Smith
>>>>>>>>>>>>>     <bsmith at petsc.dev> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>       Eric,
>>>>>>>>>>>>>
>>>>>>>>>>>>>      Sorry for not being more responsive. We still have
>>>>>>>>>>>>>     this in our active email, so you don't need to submit
>>>>>>>>>>>>>     individual issues. We'll try to get to them as soon as
>>>>>>>>>>>>>     we can.
>>>>>>>>>>>>
>>>>>>>>>>>>     Indeed, I’m still trying to figure this out.
>>>>>>>>>>>>     I realized that some of my configure flags were
>>>>>>>>>>>>     different from yours, e.g., no --with-memalign.
>>>>>>>>>>>>     I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>>>     Still, I can’t reproduce any issue.
>>>>>>>>>>>>     I will continue looking into this, it appears I’m
>>>>>>>>>>>>     seeing some valgrind errors, but I don’t know if this
>>>>>>>>>>>>     is some side effect of OpenMPI not being valgrind-clean
>>>>>>>>>>>>     (last time I checked, there was no error with MPICH).
>>>>>>>>>>>>
>>>>>>>>>>>>     Thank you for your patience,
>>>>>>>>>>>>     Pierre
>>>>>>>>>>>>
>>>>>>>>>>>>     /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>>>     Using MAKEFLAGS: test-fail=1
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>>>      ok snes_tutorials-ex56_hypre
>>>>>>>>>>>>      ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>>     More processors requested than permitted
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
>>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
>>>>>>>>>>>>     required for this test
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>>>      ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>>      ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>>>      ok snes_tutorials-ex19_tut_3
>>>>>>>>>>>>      ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP
>>>>>>>>>>>>     Fortran required for this test
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>>     More processors requested than permitted
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
>>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>>      ok
>>>>>>>>>>>>     diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>>     More processors requested than permitted
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
>>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>>           TEST
>>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP
>>>>>>>>>>>>     Fortran required for this test
>>>>>>>>>>>>
>>>>>>>>>>>>>      Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On Mar 10, 2021, at 11:03 PM, Eric Chamberland
>>>>>>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Barry,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     to get some follow-up on the --with-openmp=1 failures,
>>>>>>>>>>>>>>     shall I open GitLab issues for:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     b) all superlu_dist failures giving different results
>>>>>>>>>>>>>>     than expected and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     c) hpddm failures "free(): invalid next size (fast)"
>>>>>>>>>>>>>>     and "Segmentation Violation"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     I don't see how I could do all this debugging by
>>>>>>>>>>>>>>     myself...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Eric
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>     -- 
>>>>>>>>>>>     Eric Chamberland, ing., M. Ing
>>>>>>>>>>>     Professionnel de recherche
>>>>>>>>>>>     GIREF/Université Laval
>>>>>>>>>>>     (418) 656-2131 poste 41 22 42
>>>>>>>>>>     -- 
>>>>>>>>>>     Eric Chamberland, ing., M. Ing
>>>>>>>>>>     Professionnel de recherche
>>>>>>>>>>     GIREF/Université Laval
>>>>>>>>>>     (418) 656-2131 poste 41 22 42
>>>>>>>>>>     <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> -- 
>>>>>> Eric Chamberland, ing., M. Ing
>>>>>> Professionnel de recherche
>>>>>> GIREF/Université Laval
>>>>>> (418) 656-2131 poste 41 22 42
>>>>>
>>>> -- 
>>>> Eric Chamberland, ing., M. Ing
>>>> Professionnel de recherche
>>>> GIREF/Université Laval
>>>> (418) 656-2131 poste 41 22 42
>>> -- 
>>> Eric Chamberland, ing., M. Ing
>>> Professionnel de recherche
>>> GIREF/Université Laval
>>> (418) 656-2131 poste 41 22 42
>> -- 
>> Eric Chamberland, ing., M. Ing
>> Professionnel de recherche
>> GIREF/Université Laval
>> (418) 656-2131 poste 41 22 42
>
-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
