[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Jed Brown jed at jedbrown.org
Thu Mar 18 22:51:10 CDT 2021


Note that this is specific to the node numbering, and that node numbering tends to produce poor results even for MatMult due to poor cache reuse of the vector. It's good practice after partitioning to use a locality-preserving ordering of dofs on a process (e.g., RCM if you use MatOrdering). This was shown in the PETSc-FUN3D papers circa 1999 and has been confirmed multiple times over the years by various members of this list (including me). I believe FEniCS and libMesh now do this by default (or at least have an option) and it was shown to perform better. It's a notable weakness of DMPlex that it does not apply such an ordering of dofs and I've complained to Matt about it many times over the years, but any blame rests solely with me for not carving out time to implement it here.
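
To make the ordering point concrete, here is a minimal, library-free sketch of reverse Cuthill-McKee on a toy graph. It is not PETSc code: the helper names (rcm_order, bandwidth, path_graph_scrambled) and the scrambled 1D chain are assumptions made purely for illustration. In PETSc itself the analogous step would go through MatGetOrdering with the RCM ordering type.

```python
from collections import deque

def rcm_order(adj):
    """Return a reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each vertex to a list of its neighbors (hypothetical toy input).
    """
    degree = {v: len(set(nbrs)) for v, nbrs in adj.items()}
    visited = set()
    order = []
    # Treat each connected component, starting from a minimum-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbors by increasing degree.
            for w in sorted(set(adj[v]) - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in RCM
    return order

def bandwidth(adj, order):
    """Largest index distance between two coupled dofs under an ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)

def path_graph_scrambled(n):
    """1D chain whose vertices carry scattered labels (needs gcd(7, n) == 1)."""
    labels = [(7 * i) % n for i in range(n)]
    adj = {v: [] for v in range(n)}
    for a, b in zip(labels, labels[1:]):
        adj[a].append(b)
        adj[b].append(a)
    return adj

adj = path_graph_scrambled(10)
natural = list(range(10))
rcm = rcm_order(adj)
bw_natural = bandwidth(adj, natural)
bw_rcm = bandwidth(adj, rcm)
print("bandwidth with original labels:", bw_natural)
print("bandwidth after RCM:", bw_rcm)
```

A smaller bandwidth means that the vector entries touched by one matrix row sit close together in memory, which is the cache-reuse effect described above. Real meshes will not collapse to bandwidth 1 as this chain does.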

Better SGS/SOR smoothing factors with simple OpenMP partitioning is an additional bonus, though I'm not a fan of using OpenMP in this way.
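
As a rough illustration of why a smoother can depend on the thread count, the sketch below applies a forward "hybrid" Gauss-Seidel sweep to the 1D Laplacian: Gauss-Seidel inside each contiguous chunk of rows, Jacobi-style coupling between chunks. This is a toy model under stated assumptions, not hypre's implementation; the function name hybrid_gs_sweep, the chunking rule, and the problem size are all made up for the example.

```python
def hybrid_gs_sweep(x, b, nblocks):
    """One forward sweep for tridiag(-1, 2, -1) with zero Dirichlet ends.

    Rows are split into nblocks contiguous chunks (standing in for threads).
    Within a chunk, freshly updated values are used (Gauss-Seidel); values
    owned by another chunk are read from the previous iterate (Jacobi).
    """
    n = len(x)
    old = list(x)
    new = list(x)
    bounds = [n * k // nblocks for k in range(nblocks + 1)]
    for k in range(nblocks):
        lo, hi = bounds[k], bounds[k + 1]
        for i in range(lo, hi):
            if i == 0:
                left = 0.0
            elif i - 1 >= lo:
                left = new[i - 1]   # same chunk: already updated
            else:
                left = old[i - 1]   # other chunk: previous iterate
            right = 0.0 if i == n - 1 else old[i + 1]
            new[i] = (b[i] + left + right) / 2.0
    return new

def error_after(nblocks, n=16, sweeps=10):
    """Max-norm error after some sweeps; with b = 0 the exact solution is 0."""
    x = [1.0] * n
    b = [0.0] * n
    for _ in range(sweeps):
        x = hybrid_gs_sweep(x, b, nblocks)
    return max(abs(v) for v in x)

err_gs = error_after(1)      # one chunk: plain Gauss-Seidel
err_hybrid = error_after(4)  # four chunks: hybrid sweep
print("error, 1 chunk:", err_gs)
print("error, 4 chunks:", err_hybrid)
```

With one chunk the sweep is ordinary Gauss-Seidel and with one row per chunk it is Jacobi, so the number of chunks moves the smoother between the two. How the rows are numbered decides which couplings cross a chunk boundary, which is where the ordering of dofs comes in.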

Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> writes:

> Hi,
>
> For the benefit of readers, I just read section 7.3 here:
>
> https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing
>
> It explains why multi-threading gives a poor result with the 
> Hybrid-SGS smoother...
>
> Eric
>
>
> On 2021-03-15 2:50 p.m., Barry Smith wrote:
>>
>>    I posted some information at the issue.
>>
>>    IMHO it is likely a bug in one or more of hypre's smoothers that 
>> use OpenMP. We have never tested them before (and likely hypre has not 
>> tested all the combinations) and so would not have seen the bug. 
>> Hopefully they can just fix it.
>>
>>    Barry
>>
>>     I got the problem to occur with ex56 with 2 MPI ranks and 4 OpenMP 
>> threads; if I used fewer than 4 threads it did not generate an 
>> indefinite preconditioner.
>>
>>
>>> On Mar 14, 2021, at 1:18 PM, Eric Chamberland 
>>> <Eric.Chamberland at giref.ulaval.ca 
>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>>
>>> Done:
>>>
>>> https://github.com/hypre-space/hypre/issues/303
>>>
>>> Maybe I will need some help with PETSc to answer their questions...
>>>
>>> Eric
>>>
>>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>>> Eric
>>>>
>>>> You should report these HYPRE issues upstream 
>>>> https://github.com/hypre-space/hypre/issues
>>>>
>>>>
>>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland 
>>>>> <Eric.Chamberland at giref.ulaval.ca 
>>>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>>>>
>>>>> For us it clearly creates problems in real computations...
>>>>>
>>>>> I understand the need to have clean tests for PETSc, but for me, it 
>>>>> reveals that hypre isn't usable with more than one thread for now...
>>>>>
>>>>> Another solution:  force single-threaded configuration for hypre 
>>>>> until this is fixed?
>>>>>
>>>>> Eric
>>>>>
>>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
>>>>>>   Linear solve did not converge due to DIVERGED_INDEFINITE_PC 
>>>>>> iterations 3
>>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
>>>>>> OK, independently of the architecture it seems (Eric Docker image 
>>>>>> with 1 or 2 threads or my macOS), but contraction factor is higher
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>>> vs. currently
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>>
>>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov 
>>>>>>> <mailto:mfadams at lbl.gov>> wrote:
>>>>>>>
>>>>>>> Hypre uses a multiplicative smoother by default. It has a 
>>>>>>> Chebyshev smoother. That with a Jacobi PC should be thread 
>>>>>>> invariant.
>>>>>>> Mark
>>>>>>>
>>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et 
>>>>>>> <mailto:pierre at joliv.et>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>     On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <pierre at joliv.et
>>>>>>>>     <mailto:pierre at joliv.et>> wrote:
>>>>>>>>
>>>>>>>>     Hello Eric,
>>>>>>>>     I’ve made an “interesting” discovery, so I’ll put the
>>>>>>>>     list back in Cc.
>>>>>>>>     It appears the following snippet of code which uses
>>>>>>>>     Allreduce() + lambda function + MPI_IN_PLACE is:
>>>>>>>>     - Valgrind-clean with MPICH;
>>>>>>>>     - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>>     - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>>     I’m not sure who is to blame here, I’ll need to look at the
>>>>>>>>     MPI specification for what is required by the implementors
>>>>>>>>     and users in that case.
>>>>>>>>
>>>>>>>>     In the meantime, I’ll do the following:
>>>>>>>>     - update config/BuildSystem/config/packages/OpenMPI.py to
>>>>>>>>     use OpenMPI 4.1.0, see if any other error appears;
>>>>>>>>     - provide a hotfix to bypass the segfaults;
>>>>>>>
>>>>>>>     I can confirm that splitting the single Allreduce with my own
>>>>>>>     MPI_Op into two Allreduce with MAX and BAND fixes the
>>>>>>>     segfaults with OpenMPI (*).
>>>>>>>
>>>>>>>>     - look at the hypre issue and whether they should be
>>>>>>>>     deferred to the hypre team.
>>>>>>>
>>>>>>>     I don’t know if there is something wrong in hypre threading
>>>>>>>     or if it’s just a side effect of threading, but it seems that
>>>>>>>     the number of threads has a drastic effect on the quality of
>>>>>>>     the PC.
>>>>>>>     By default, it looks like there are two threads per process
>>>>>>>     with your Docker image.
>>>>>>>     If I force OMP_NUM_THREADS=1, then I get the same convergence
>>>>>>>     as in the output file.
>>>>>>>
>>>>>>>     Thanks,
>>>>>>>     Pierre
>>>>>>>
>>>>>>>     (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>>
>>>>>>>>     Thank you for the Docker files, they were really useful.
>>>>>>>>     If you want to avoid oversubscription failures, you can edit
>>>>>>>>     the file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and
>>>>>>>>     append the line:
>>>>>>>>     localhost slots=12
>>>>>>>>     If you want to increase the timeout limit of PETSc test
>>>>>>>>     suite for each test, you can add the extra flag in your
>>>>>>>>     command line TIMEOUT=180 (default is 60, units are seconds).
>>>>>>>>
>>>>>>>>     Thanks, I’ll ping you on GitLab when I’ve got something
>>>>>>>>     ready for you to try,
>>>>>>>>     Pierre
>>>>>>>>
>>>>>>>>     <ompi.cxx>
>>>>>>>>
>>>>>>>>>     On 12 Mar 2021, at 8:54 PM, Eric Chamberland
>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca
>>>>>>>>>     <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>>>>>>>>
>>>>>>>>>     Hi Pierre,
>>>>>>>>>
>>>>>>>>>     I now have a docker container reproducing the problems here.
>>>>>>>>>
>>>>>>>>>     Actually, if I look at
>>>>>>>>>     snes_tutorials-ex12_quad_singular_hpddm it fails like this:
>>>>>>>>>
>>>>>>>>>     not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
>>>>>>>>>     # Initial guess
>>>>>>>>>     #       L_2 Error: 0.00803099
>>>>>>>>>     # Initial Residual
>>>>>>>>>     #       L_2 Residual: 1.09057
>>>>>>>>>     #       Au - b = Au + F(0)
>>>>>>>>>     #       Linear L_2 Residual: 1.09057
>>>>>>>>>     # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>>     # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>>     # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>>     # [3]PETSC ERROR:
>>>>>>>>>     ------------------------------------------------------------------------
>>>>>>>>>     # [3]PETSC ERROR: Caught signal number 11 SEGV:
>>>>>>>>>     Segmentation Violation, probably memory access out of range
>>>>>>>>>     # [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>     -on_error_attach_debugger
>>>>>>>>>     # [3]PETSC ERROR: or see
>>>>>>>>>     https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>     <https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind>
>>>>>>>>>     # [3]PETSC ERROR: or try http://valgrind.org
>>>>>>>>>     <http://valgrind.org/> on GNU/linux and Apple Mac OS X to
>>>>>>>>>     find memory corruption errors
>>>>>>>>>     # [3]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>     below
>>>>>>>>>     # [3]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>     ------------------------------------
>>>>>>>>>     # [3]PETSC ERROR: Note: The EXACT line numbers in the stack
>>>>>>>>>     are not available,
>>>>>>>>>     # [3]PETSC ERROR: INSTEAD the line number of the start of
>>>>>>>>>     the function
>>>>>>>>>     # [3]PETSC ERROR: is given.
>>>>>>>>>     # [3]PETSC ERROR: [3] buildTwo line 987
>>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>     # [3]PETSC ERROR: [3] next line 1130
>>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>     # [3]PETSC ERROR: --------------------- Error Message
>>>>>>>>>     --------------------------------------------------------------
>>>>>>>>>     # [3]PETSC ERROR: Signal received
>>>>>>>>>     # [3]PETSC ERROR: [0]PETSC ERROR:
>>>>>>>>>     ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>     also ex12_quad_hpddm_reuse_baij fails with a lot more "Read
>>>>>>>>>     -1, expected ..." which I don't know where they come from...?
>>>>>>>>>
>>>>>>>>>     Hypre (like in diff-snes_tutorials-ex56_hypre) is also
>>>>>>>>>     having DIVERGED_INDEFINITE_PC failures...
>>>>>>>>>
>>>>>>>>>     Please see the 3 attached docker files:
>>>>>>>>>
>>>>>>>>>     1) fedora_mkl_and_devtools: the Dockerfile which installs
>>>>>>>>>     Fedora 33 with GNU compilers, MKL, and everything needed to develop.
>>>>>>>>>
>>>>>>>>>     2) openmpi: the Dockerfile to build OpenMPI
>>>>>>>>>
>>>>>>>>>     3) petsc: the last Dockerfile, which builds, installs, and tests PETSc
>>>>>>>>>
>>>>>>>>>     I build the 3 like this:
>>>>>>>>>
>>>>>>>>>     docker build -t fedora_mkl_and_devtools -f
>>>>>>>>>     fedora_mkl_and_devtools .
>>>>>>>>>
>>>>>>>>>     docker build -t openmpi -f openmpi .
>>>>>>>>>
>>>>>>>>>     docker build -t petsc -f petsc .
>>>>>>>>>
>>>>>>>>>     Disclaimer: I am not a Docker expert, so I may do things
>>>>>>>>>     that are not Docker state-of-the-art, but I am open to
>>>>>>>>>     suggestions... ;)
>>>>>>>>>
>>>>>>>>>     I have just run it on my laptop (it took a long time), which
>>>>>>>>>     does not have enough cores, so many more tests failed (I should
>>>>>>>>>     force --oversubscribe but don't know how to).  I will relaunch on
>>>>>>>>>     my workstation in a few minutes.
>>>>>>>>>
>>>>>>>>>     I will now test your branch! (sorry for the delay).
>>>>>>>>>
>>>>>>>>>     Thanks,
>>>>>>>>>
>>>>>>>>>     Eric
>>>>>>>>>
>>>>>>>>>     On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi Pierre,
>>>>>>>>>>
>>>>>>>>>>     ok, that's interesting!
>>>>>>>>>>
>>>>>>>>>>     I will try to build a Docker image by tomorrow and give
>>>>>>>>>>     you the exact recipe to reproduce the bugs.
>>>>>>>>>>
>>>>>>>>>>     Eric
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>     On 11 Mar 2021, at 6:16 AM, Barry Smith
>>>>>>>>>>>>     <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>       Eric,
>>>>>>>>>>>>
>>>>>>>>>>>>      Sorry about not being more immediate. We still have
>>>>>>>>>>>>     this in our active email so you don't need to submit
>>>>>>>>>>>>     individual issues. We'll try to get to them as soon as
>>>>>>>>>>>>     we can.
>>>>>>>>>>>
>>>>>>>>>>>     Indeed, I’m still trying to figure this out.
>>>>>>>>>>>     I realized that some of my configure flags were different
>>>>>>>>>>>     than yours, e.g., no --with-memalign.
>>>>>>>>>>>     I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>>     Still, I can’t reproduce any issue.
>>>>>>>>>>>     I will continue looking into this, it appears I’m seeing
>>>>>>>>>>>     some valgrind errors, but I don’t know if this is some
>>>>>>>>>>>     side effect of OpenMPI not being valgrind-clean (last
>>>>>>>>>>>     time I checked, there was no error with MPICH).
>>>>>>>>>>>
>>>>>>>>>>>     Thank you for your patience,
>>>>>>>>>>>     Pierre
>>>>>>>>>>>
>>>>>>>>>>>     /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>>     Using MAKEFLAGS: test-fail=1
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>>      ok snes_tutorials-ex56_hypre
>>>>>>>>>>>      ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>>     processors requested than permitted
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
>>>>>>>>>>>     required for this test
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>>      ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>      ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>>      ok snes_tutorials-ex19_tut_3
>>>>>>>>>>>      ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran
>>>>>>>>>>>     required for this test
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>>      ok
>>>>>>>>>>>     snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>      ok
>>>>>>>>>>>     diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>>     processors requested than permitted
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>>      ok
>>>>>>>>>>>     snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>      ok
>>>>>>>>>>>     diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>>     processors requested than permitted
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
>>>>>>>>>>>     failed so no diff
>>>>>>>>>>>           TEST
>>>>>>>>>>>     arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran
>>>>>>>>>>>     required for this test
>>>>>>>>>>>
>>>>>>>>>>>>      Barry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>     On Mar 10, 2021, at 11:03 PM, Eric Chamberland
>>>>>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca
>>>>>>>>>>>>>     <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Barry,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     to get some follow-up on the --with-openmp=1 failures,
>>>>>>>>>>>>>     shall I open GitLab issues for:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>>
>>>>>>>>>>>>>     b) all superlu_dist failures giving different results
>>>>>>>>>>>>>     with initia and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     c) hpddm failures "free(): invalid next size (fast)"
>>>>>>>>>>>>>     and "Segmentation Violation"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     I don't see how I could do all this debugging by myself...
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Eric
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>     -- 
>>>>>>>>>>     Eric Chamberland, ing., M. Ing
>>>>>>>>>>     Professionnel de recherche
>>>>>>>>>>     GIREF/Université Laval
>>>>>>>>>>     (418) 656-2131 poste 41 22 42
>>>>>>>>>     <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
