[petsc-dev] Petsc "make test" has more failures for --with-openmp=1

Matthew Knepley knepley at gmail.com
Fri Mar 19 07:44:20 CDT 2021


On Thu, Mar 18, 2021 at 11:51 PM Jed Brown <jed at jedbrown.org> wrote:

> Note that this is specific to the node numbering, and that node numbering
> tends to produce poor results even for MatMult due to poor cache reuse of
> the vector. It's good practice after partitioning to use a
> locality-preserving ordering of dofs on a process (e.g., RCM if you use
> MatOrdering). This was shown in the PETSc-FUN3D papers circa 1999 and has
> been confirmed multiple times over the years by various members of this
> list (including me). I believe FEniCS and libMesh now do this by default
> (or at least have an option) and it was shown to perform better. It's a
> notable weakness of DMPlex that it does not apply such an ordering of dofs
> and I've complained to Matt about it many times over the years, but any
> blame rests solely with me for not carving out time to implement it here.
>

Jesus. Of course Plex can do this. It is the default for PyLith. Less
complaining, more looking.
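
For reference, here is a minimal sketch of the locality-preserving reordering
Jed describes, done by hand with MatGetOrdering() and MatPermute() on an
already assembled matrix (the function and variable names are illustrative,
and this is not the code path Plex/PyLith uses):

#include <petscmat.h>

/* Permute an assembled matrix with Reverse Cuthill-McKee so that
   neighboring dofs end up stored close together, improving cache reuse
   of the vector in MatMult() and in SOR/SGS-type smoothers. */
static PetscErrorCode PermuteWithRCM(Mat A, Mat *Aperm)
{
  IS             rperm, cperm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm);CHKERRQ(ierr);
  ierr = MatPermute(A, rperm, cperm, Aperm);CHKERRQ(ierr);
  /* vectors built on the old numbering must be permuted consistently,
     e.g., with VecPermute() and the same index set */
  ierr = ISDestroy(&rperm);CHKERRQ(ierr);
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}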

  Matt


> Better SGS/SOR smoothing factors with simple OpenMP partitioning is an
> additional bonus, though I'm not a fan of using OpenMP in this way.
>
> Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> writes:
>
> > Hi,
> >
> > For readers' information, I just read section 7.3 here:
> >
> >
> https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing
> >
> > It explains why multi-threading gives poor results with the
> > hybrid-SGS smoother...
> >
> > Eric
> >
> >
> > On 2021-03-15 2:50 p.m., Barry Smith wrote:
> >>
> >>    I posted some information at the issue.
> >>
> >>    IMHO it is likely a bug in one or more of hypre's smoothers that
> >> use OpenMP. We have never tested them before (and likely hypre has not
> >> tested all the combinations) and so would not have seen the bug.
> >> Hopefully they can just fix it.
> >>
> >>    Barry
> >>
> >>     I got the problem to occur with ex56 with 2 MPI ranks and 4 OpenMP
> >> threads; if I used fewer than 4 threads, it did not generate an
> >> indefinite preconditioner.
> >>
> >>
> >>> On Mar 14, 2021, at 1:18 PM, Eric Chamberland
> >>> <Eric.Chamberland at giref.ulaval.ca
> >>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>
> >>> Done:
> >>>
> >>> https://github.com/hypre-space/hypre/issues/303
> >>>
> >>> Maybe I will need some help about PETSc to answer their questions...
> >>>
> >>> Eric
> >>>
> >>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
> >>>> Eric
> >>>>
> >>>> You should report these HYPRE issues upstream
> >>>> https://github.com/hypre-space/hypre/issues
> >>>> <https://github.com/hypre-space/hypre/issues>
> >>>>
> >>>>
> >>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
> >>>>> <Eric.Chamberland at giref.ulaval.ca
> >>>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>
> >>>>> For us it clearly creates problems in real computations...
> >>>>>
> >>>>> I understand the need to have clean tests for PETSc, but to me, it
> >>>>> reveals that hypre isn't usable with more than one thread for now...
> >>>>>
> >>>>> Another solution: force a single-threaded configuration for hypre
> >>>>> until this is fixed?
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
> >>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
> >>>>>>   Linear solve did not converge due to DIVERGED_INDEFINITE_PC
> >>>>>> iterations 3
> >>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
> >>>>>> OK, independently of the architecture it seems (Eric's Docker image
> >>>>>> with 1 or 2 threads, or my macOS), but the contraction factor is higher
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 8
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 24
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 26
> >>>>>> vs. currently
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 7
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 9
> >>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 10
> >>>>>>
> >>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make
> test?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Pierre
> >>>>>>
> >>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov
> >>>>>>> <mailto:mfadams at lbl.gov>> wrote:
> >>>>>>>
> >>>>>>> Hypre uses a multiplicative smoother by default. It also has a
> >>>>>>> Chebyshev smoother. That, with a Jacobi PC, should be
> >>>>>>> thread-invariant.
> >>>>>>> Mark
> >>>>>>>
> >>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et
> >>>>>>> <mailto:pierre at joliv.et>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>>     On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <pierre at joliv.et
> >>>>>>>>     <mailto:pierre at joliv.et>> wrote:
> >>>>>>>>
> >>>>>>>>     Hello Eric,
> >>>>>>>>     I’ve made an “interesting” discovery, so I’ll put the
> >>>>>>>>     list back in cc.
> >>>>>>>>     It appears the following snippet of code, which uses
> >>>>>>>>     Allreduce() + a lambda function + MPI_IN_PLACE, is:
> >>>>>>>>     - Valgrind-clean with MPICH;
> >>>>>>>>     - Valgrind-clean with OpenMPI 4.0.5;
> >>>>>>>>     - not Valgrind-clean with OpenMPI 4.1.0.
> >>>>>>>>     I’m not sure who is to blame here; I’ll need to look at the
> >>>>>>>>     MPI specification for what is required of implementors
> >>>>>>>>     and users in that case.
> >>>>>>>>
> >>>>>>>>     In the meantime, I’ll do the following:
> >>>>>>>>     - update config/BuildSystem/config/packages/OpenMPI.py to
> >>>>>>>>     use OpenMPI 4.1.0, see if any other error appears;
> >>>>>>>>     - provide a hotfix to bypass the segfaults;
> >>>>>>>
> >>>>>>>     I can confirm that splitting the single Allreduce with my own
> >>>>>>>     MPI_Op into two Allreduce calls with MAX and BAND fixes the
> >>>>>>>     segfaults with OpenMPI (*).
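> >>>>>>>
> >>>>>>>     Roughly, the split looks like this (a minimal sketch with
> >>>>>>>     illustrative names and types, not the actual PETSc code):
> >>>>>>>
> >>>>>>>     #include <mpi.h>
> >>>>>>>
> >>>>>>>     void split_reduction(MPI_Comm comm, int local_max, int local_flags,
> >>>>>>>                          int *global_max, int *global_flags)
> >>>>>>>     {
> >>>>>>>       int vals[2] = {local_max, local_flags};
> >>>>>>>       /* instead of one MPI_Allreduce with a user-defined MPI_Op that
> >>>>>>>          combines both reductions, reduce each entry separately with
> >>>>>>>          built-in operations */
> >>>>>>>       MPI_Allreduce(MPI_IN_PLACE, &vals[0], 1, MPI_INT, MPI_MAX,  comm);
> >>>>>>>       MPI_Allreduce(MPI_IN_PLACE, &vals[1], 1, MPI_INT, MPI_BAND, comm);
> >>>>>>>       *global_max   = vals[0];
> >>>>>>>       *global_flags = vals[1];
> >>>>>>>     }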
> >>>>>>>
> >>>>>>>>     - look at the hypre issue and whether they should be
> >>>>>>>>     deferred to the hypre team.
> >>>>>>>
> >>>>>>>     I don’t know if there is something wrong in hypre’s threading
> >>>>>>>     or if it’s just a side effect of threading, but it seems that
> >>>>>>>     the number of threads has a drastic effect on the quality of
> >>>>>>>     the PC.
> >>>>>>>     By default, it looks like there are two threads per process
> >>>>>>>     with your Docker image.
> >>>>>>>     If I force OMP_NUM_THREADS=1, then I get the same convergence
> >>>>>>>     as in the output file.
> >>>>>>>
> >>>>>>>     Thanks,
> >>>>>>>     Pierre
> >>>>>>>
> >>>>>>>     (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
> >>>>>>>     <https://gitlab.com/petsc/petsc/-/merge_requests/3712>
> >>>>>>>
> >>>>>>>>     Thank you for the Docker files, they were really useful.
> >>>>>>>>     If you want to avoid oversubscription failures, you can edit
> >>>>>>>>     the file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and
> >>>>>>>>     append the line:
> >>>>>>>>     localhost slots=12
> >>>>>>>>     If you want to increase the per-test timeout limit of the
> >>>>>>>>     PETSc test suite, you can add TIMEOUT=180 to your command
> >>>>>>>>     line (the default is 60 seconds).
> >>>>>>>>
> >>>>>>>>     Thanks, I’ll ping you on GitLab when I’ve got something
> >>>>>>>>     ready for you to try,
> >>>>>>>>     Pierre
> >>>>>>>>
> >>>>>>>>     <ompi.cxx>
> >>>>>>>>
> >>>>>>>>>     On 12 Mar 2021, at 8:54 PM, Eric Chamberland
> >>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca
> >>>>>>>>>     <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>>>>>
> >>>>>>>>>     Hi Pierre,
> >>>>>>>>>
> >>>>>>>>>     I now have a docker container reproducing the problems here.
> >>>>>>>>>
> >>>>>>>>>     Actually, if I look at
> >>>>>>>>>     snes_tutorials-ex12_quad_singular_hpddm it fails like this:
> >>>>>>>>>
> >>>>>>>>>     not ok snes_tutorials-ex12_quad_singular_hpddm # Error code:
> 59
> >>>>>>>>>     # Initial guess
> >>>>>>>>>     #       L_2 Error: 0.00803099
> >>>>>>>>>     # Initial Residual
> >>>>>>>>>     #       L_2 Residual: 1.09057
> >>>>>>>>>     #       Au - b = Au + F(0)
> >>>>>>>>>     #       Linear L_2 Residual: 1.09057
> >>>>>>>>>     # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
> >>>>>>>>>     # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
> >>>>>>>>>     # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
> >>>>>>>>>     # [3]PETSC ERROR:
> >>>>>>>>>
>  ------------------------------------------------------------------------
> >>>>>>>>>     # [3]PETSC ERROR: Caught signal number 11 SEGV:
> >>>>>>>>>     Segmentation Violation, probably memory access out of range
> >>>>>>>>>     # [3]PETSC ERROR: Try option -start_in_debugger or
> >>>>>>>>>     -on_error_attach_debugger
> >>>>>>>>>     # [3]PETSC ERROR: or see
> >>>>>>>>>
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >>>>>>>>>     <
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind>
> >>>>>>>>>     # [3]PETSC ERROR: or try http://valgrind.org
> >>>>>>>>>     <http://valgrind.org/> on GNU/linux and Apple Mac OS X to
> >>>>>>>>>     find memory corruption errors
> >>>>>>>>>     # [3]PETSC ERROR: likely location of problem given in stack
> >>>>>>>>>     below
> >>>>>>>>>     # [3]PETSC ERROR: --------------------- Stack Frames
> >>>>>>>>>     ------------------------------------
> >>>>>>>>>     # [3]PETSC ERROR: Note: The EXACT line numbers in the stack
> >>>>>>>>>     are not available,
> >>>>>>>>>     # [3]PETSC ERROR: INSTEAD the line number of the start of
> >>>>>>>>>     the function
> >>>>>>>>>     # [3]PETSC ERROR: is given.
> >>>>>>>>>     # [3]PETSC ERROR: [3] buildTwo line 987
> >>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>>     # [3]PETSC ERROR: [3] next line 1130
> >>>>>>>>>     /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>>     # [3]PETSC ERROR: --------------------- Error Message
> >>>>>>>>>
>  --------------------------------------------------------------
> >>>>>>>>>     # [3]PETSC ERROR: Signal received
> >>>>>>>>>     # [3]PETSC ERROR: [0]PETSC ERROR:
> >>>>>>>>>
>  ------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>>     Also, ex12_quad_hpddm_reuse_baij fails with many more "Read
> >>>>>>>>>     -1, expected ..." messages, which I don't know where they
> >>>>>>>>>     come from...
> >>>>>>>>>
> >>>>>>>>>     Hypre (like in diff-snes_tutorials-ex56_hypre) is also
> >>>>>>>>>     having DIVERGED_INDEFINITE_PC failures...
> >>>>>>>>>
> >>>>>>>>>     Please see the 3 attached Docker files:
> >>>>>>>>>
> >>>>>>>>>     1) fedora_mkl_and_devtools: the Dockerfile which installs
> >>>>>>>>>     Fedora 33 with the GNU compilers, MKL, and everything needed
> >>>>>>>>>     for development.
> >>>>>>>>>
> >>>>>>>>>     2) openmpi: the Dockerfile to build OpenMPI.
> >>>>>>>>>
> >>>>>>>>>     3) petsc: the last Dockerfile, which builds, installs, and
> >>>>>>>>>     tests PETSc.
> >>>>>>>>>
> >>>>>>>>>     I build the 3 like this:
> >>>>>>>>>
> >>>>>>>>>     docker build -t fedora_mkl_and_devtools -f
> >>>>>>>>>     fedora_mkl_and_devtools .
> >>>>>>>>>
> >>>>>>>>>     docker build -t openmpi -f openmpi .
> >>>>>>>>>
> >>>>>>>>>     docker build -t petsc -f petsc .
> >>>>>>>>>
> >>>>>>>>>     Disclaimer: I am not a Docker expert, so I may do things
> >>>>>>>>>     that are not Docker state-of-the-art, but I am open to
> >>>>>>>>>     suggestions... ;)
> >>>>>>>>>
> >>>>>>>>>     I have just run it on my laptop (it takes a long time),
> >>>>>>>>>     which does not have enough cores, so many more tests failed
> >>>>>>>>>     (I should force --oversubscribe but don't know how to).  I
> >>>>>>>>>     will relaunch it on my workstation in a few minutes.
> >>>>>>>>>
> >>>>>>>>>     I will now test your branch! (sorry for the delay).
> >>>>>>>>>
> >>>>>>>>>     Thanks,
> >>>>>>>>>
> >>>>>>>>>     Eric
> >>>>>>>>>
> >>>>>>>>>     On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
> >>>>>>>>>>
> >>>>>>>>>>     Hi Pierre,
> >>>>>>>>>>
> >>>>>>>>>>     ok, that's interesting!
> >>>>>>>>>>
> >>>>>>>>>>     I will try to build a Docker image by tomorrow and give
> >>>>>>>>>>     you the exact recipe to reproduce the bugs.
> >>>>>>>>>>
> >>>>>>>>>>     Eric
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>     On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>     On 11 Mar 2021, at 6:16 AM, Barry Smith
> >>>>>>>>>>>>     <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>       Eric,
> >>>>>>>>>>>>
> >>>>>>>>>>>>      Sorry for not responding sooner. We still have
> >>>>>>>>>>>>     this in our active email, so you don't need to submit
> >>>>>>>>>>>>     individual issues. We'll try to get to them as soon as
> >>>>>>>>>>>>     we can.
> >>>>>>>>>>>
> >>>>>>>>>>>     Indeed, I’m still trying to figure this out.
> >>>>>>>>>>>     I realized that some of my configure flags were different
> >>>>>>>>>>>     from yours, e.g., no --with-memalign.
> >>>>>>>>>>>     I’ve also added SuperLU_DIST to my installation.
> >>>>>>>>>>>     Still, I can’t reproduce any issue.
> >>>>>>>>>>>     I will continue looking into this; it appears I’m seeing
> >>>>>>>>>>>     some Valgrind errors, but I don’t know if this is a
> >>>>>>>>>>>     side effect of OpenMPI not being Valgrind-clean (last
> >>>>>>>>>>>     time I checked, there was no error with MPICH).
> >>>>>>>>>>>
> >>>>>>>>>>>     Thank you for your patience,
> >>>>>>>>>>>     Pierre
> >>>>>>>>>>>
> >>>>>>>>>>>     /usr/bin/gmake -f gmakefile test test-fail=1
> >>>>>>>>>>>     Using MAKEFLAGS: test-fail=1
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
> >>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
> >>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>>      ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
> >>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
> >>>>>>>>>>>      ok snes_tutorials-ex56_hypre
> >>>>>>>>>>>      ok diff-snes_tutorials-ex56_hypre
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
> >>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
> >>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>>      ok
> diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
> >>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
> >>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>>     processors requested than permitted
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
> >>>>>>>>>>>     failed so no diff
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
> >>>>>>>>>>>     required for this test
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
> >>>>>>>>>>>      ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>>      ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
> >>>>>>>>>>>      ok snes_tutorials-ex19_tut_3
> >>>>>>>>>>>      ok diff-snes_tutorials-ex19_tut_3
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
> >>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran
> >>>>>>>>>>>     required for this test
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
> >>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
> >>>>>>>>>>>      ok
> >>>>>>>>>>>
>  snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>>      ok
> >>>>>>>>>>>
>  diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
> >>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
> >>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
> >>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>>     processors requested than permitted
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
> >>>>>>>>>>>     failed so no diff
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
> >>>>>>>>>>>      ok
> >>>>>>>>>>>
>  snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>>      ok
> >>>>>>>>>>>
>  diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
> >>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
> >>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>>     processors requested than permitted
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
> >>>>>>>>>>>     failed so no diff
> >>>>>>>>>>>           TEST
> >>>>>>>>>>>
>  arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
> >>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran
> >>>>>>>>>>>     required for this test
> >>>>>>>>>>>
> >>>>>>>>>>>>      Barry
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>     On Mar 10, 2021, at 11:03 PM, Eric Chamberland
> >>>>>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca
> >>>>>>>>>>>>>     <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     Barry,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     to get some follow-up on the --with-openmp=1 failures,
> >>>>>>>>>>>>>     shall I open GitLab issues for:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     a) all hypre failures giving DIVERGED_INDEFINITE_PC
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     b) all superlu_dist failures giving results different
> >>>>>>>>>>>>>     from the initial ones and "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     c) hpddm failures "free(): invalid next size (fast)"
> >>>>>>>>>>>>>     and "Segmentation Violation"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     d) all tao's "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     I don't see how I could do all this debugging by
> >>>>>>>>>>>>>     myself...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     Eric
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>     --
> >>>>>>>>>>     Eric Chamberland, ing., M. Ing
> >>>>>>>>>>     Professionnel de recherche
> >>>>>>>>>>     GIREF/Université Laval
> >>>>>>>>>>     (418) 656-2131 poste 41 22 42
> >>>>>>>>>     --
> >>>>>>>>>     Eric Chamberland, ing., M. Ing
> >>>>>>>>>     Professionnel de recherche
> >>>>>>>>>     GIREF/Université Laval
> >>>>>>>>>     (418) 656-2131 poste 41 22 42
> >>>>>>>>>     <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>> --
> >>>>> Eric Chamberland, ing., M. Ing
> >>>>> Professionnel de recherche
> >>>>> GIREF/Université Laval
> >>>>> (418) 656-2131 poste 41 22 42
> >>>>
> >>> --
> >>> Eric Chamberland, ing., M. Ing
> >>> Professionnel de recherche
> >>> GIREF/Université Laval
> >>> (418) 656-2131 poste 41 22 42
> >>
> > --
> > Eric Chamberland, ing., M. Ing
> > Professionnel de recherche
> > GIREF/Université Laval
> > (418) 656-2131 poste 41 22 42
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>