[petsc-dev] Petsc "make test" have more failures for --with-openmp=1
Matthew Knepley
knepley at gmail.com
Fri Mar 19 07:44:20 CDT 2021
On Thu, Mar 18, 2021 at 11:51 PM Jed Brown <jed at jedbrown.org> wrote:
> Note that this is specific to the node numbering, and that node numbering
> tends to produce poor results even for MatMult due to poor cache reuse of
> the vector. It's good practice after partitioning to use a
> locality-preserving ordering of dofs on a process (e.g., RCM if you use
> MatOrdering). This was shown in the PETSc-FUN3D papers circa 1999 and has
> been confirmed multiple times over the years by various members of this
> list (including me). I believe FEniCS and libMesh now do this by default
> (or at least have an option) and it was shown to perform better. It's a
> notable weakness of DMPlex that it does not apply such an ordering of dofs
> and I've complained to Matt about it many times over the years, but any
> blame rests solely with me for not carving out time to implement it here.
>
Jesus. Of course Plex can do this. It is the default for PyLith. Less
complaining, more looking.
Matt
> Better SGS/SOR smoothing factors with simple OpenMP partitioning is an
> additional bonus, though I'm not a fan of using OpenMP in this way.
>
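A minimal sketch of the per-process dof reordering Jed describes, shown here
on a sequential assembled matrix with plain PETSc calls (MatGetOrdering,
MatPermute, VecPermute); it only illustrates the idea, it is not the
DMPlex/PyLith machinery, and error checking is omitted:

    #include <petscmat.h>

    /* Reorder an assembled matrix and its rhs with RCM to improve locality.
       The caller solves with Aperm/b, then maps the solution back with
       VecPermute(x, *perm, PETSC_TRUE). */
    static PetscErrorCode ReorderWithRCM(Mat A, Vec b, Mat *Aperm, IS *perm)
    {
      IS rperm, cperm;

      MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm); /* RCM ordering of A    */
      MatPermute(A, rperm, cperm, Aperm);                /* A in the new order   */
      VecPermute(b, rperm, PETSC_FALSE);                 /* rhs in the new order */
      ISDestroy(&cperm);
      *perm = rperm;                                     /* kept for mapping back */
      return 0;
    }

The gain Jed mentions comes from MatMult touching nearby vector entries
together, so dofs that interact end up close in memory.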
> Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> writes:
>
> > Hi,
> >
> > For the information of other readers, I just read section 7.3 here:
> >
> >
> https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing
> >
> > It explains why multi-threading gives a poor result with the
> > Hybrid-SGS smoother...
> >
> > Eric
> >
> >
> > On 2021-03-15 2:50 p.m., Barry Smith wrote:
> >>
> >> I posted some information at the issue.
> >>
> >> IMHO it is likely a bug in one or more of hypre's smoothers that
> >> use OpenMP. We have never tested them before (and likely hypre has not
> >> tested all the combinations) and so would not have seen the bug.
> >> Hopefully they can just fix it.
> >>
> >> Barry
> >>
> >> I got the problem to occur with ex56 with 2 MPI ranks and 4 OpenMP
> >> threads; with fewer than 4 threads it did not generate an
> >> indefinite preconditioner.
> >>
> >>
> >>> On Mar 14, 2021, at 1:18 PM, Eric Chamberland
> >>> <Eric.Chamberland at giref.ulaval.ca
> >>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>
> >>> Done:
> >>>
> >>> https://github.com/hypre-space/hypre/issues/303
> >>>
> >>> Maybe I will need some help with PETSc to answer their questions...
> >>>
> >>> Eric
> >>>
> >>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
> >>>> Eric
> >>>>
> >>>> You should report these HYPRE issues upstream
> >>>> https://github.com/hypre-space/hypre/issues
> >>>> <https://github.com/hypre-space/hypre/issues>
> >>>>
> >>>>
> >>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
> >>>>> <Eric.Chamberland at giref.ulaval.ca
> >>>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>
> >>>>> For us it clearly creates problems in real computations...
> >>>>>
> >>>>> I understand the need to have clean tests for PETSc, but to me, it
> >>>>> reveals that hypre isn't usable with more than one thread for now...
> >>>>>
> >>>>> Another solution: force single-threaded configuration for hypre
> >>>>> until this is fixed?
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
> >>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
> >>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC
> >>>>>> iterations 3
> >>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
> >>>>>> OK, independently of the architecture it seems (Eric's Docker image
> >>>>>> with 1 or 2 threads, or my macOS), but the contraction factor is higher
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 8
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 24
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 26
> >>>>>> v. currently
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 7
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 9
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 10
> >>>>>>
> >>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for
> >>>>>> make test?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Pierre
> >>>>>>
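(Forcing a single thread just for the test run, as asked above, would not
require reconfiguring; prefixing the invocation shown further down, e.g.

    OMP_NUM_THREADS=1 /usr/bin/gmake -f gmakefile test

should be enough, assuming the MPI launcher forwards the environment to the
ranks.)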
> >>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov
> >>>>>>> <mailto:mfadams at lbl.gov>> wrote:
> >>>>>>>
> >>>>>>> Hypre uses a multiplicative smoother by default. It also has a
> >>>>>>> Chebyshev smoother; that with a Jacobi PC should be thread
> >>>>>>> invariant.
> >>>>>>> Mark
> >>>>>>>
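(For reference, that smoother is reachable through PETSc's BoomerAMG
interface; something along the lines of

    -pc_type hypre -pc_hypre_boomeramg_relax_type_all Chebyshev

should select it on all levels, assuming the installed hypre provides that
relax type; a sketch of the option names, not a verified recipe.)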
> >>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et
> >>>>>>> <mailto:pierre at joliv.et>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <pierre at joliv.et
> >>>>>>>> <mailto:pierre at joliv.et>> wrote:
> >>>>>>>>
> >>>>>>>> Hello Eric,
> >>>>>>>> I’ve made an “interesting” discovery, so I’ll put the
> >>>>>>>> list back in CC.
> >>>>>>>> It appears the following snippet of code which uses
> >>>>>>>> Allreduce() + lambda function + MPI_IN_PLACE is:
> >>>>>>>> - Valgrind-clean with MPICH;
> >>>>>>>> - Valgrind-clean with OpenMPI 4.0.5;
> >>>>>>>> - not Valgrind-clean with OpenMPI 4.1.0.
> >>>>>>>> I’m not sure who is to blame here; I’ll need to look at the
> >>>>>>>> MPI specification for what is required of the implementors
> >>>>>>>> and users in that case.
> >>>>>>>>
> >>>>>>>> In the meantime, I’ll do the following:
> >>>>>>>> - update config/BuildSystem/config/packages/OpenMPI.py to
> >>>>>>>> use OpenMPI 4.1.0, see if any other error appears;
> >>>>>>>> - provide a hotfix to bypass the segfaults;
> >>>>>>>
> >>>>>>> I can confirm that splitting the single Allreduce with my own
> >>>>>>> MPI_Op into two Allreduce with MAX and BAND fixes the
> >>>>>>> segfaults with OpenMPI (*).
> >>>>>>>
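For readers unfamiliar with the pattern under discussion, here is a minimal,
hypothetical C sketch of the two variants: a single in-place MPI_Allreduce
with a user-defined MPI_Op versus two in-place reductions with the built-in
MPI_MAX and MPI_BAND. It is not the actual PETSc code (which used a C++
lambda); the two-integer payload and the function names are invented for
illustration.

    #include <mpi.h>

    /* User-defined reduction treating the buffer as (value, flags) pairs:
       max on even entries, bitwise AND on odd entries. */
    static void MaxAndBand(void *in, void *inout, int *len, MPI_Datatype *dtype)
    {
      const int *a = (const int *)in;
      int       *b = (int *)inout;
      (void)dtype;
      for (int i = 0; i < *len; i += 2) {
        if (a[i] > b[i]) b[i] = a[i];
        b[i + 1] &= a[i + 1];
      }
    }

    /* Variant 1: one collective with a custom operation. */
    static void ReduceWithCustomOp(int vals[2], MPI_Comm comm)
    {
      MPI_Op op;
      MPI_Op_create(MaxAndBand, 1 /* commutative */, &op);
      MPI_Allreduce(MPI_IN_PLACE, vals, 2, MPI_INT, op, comm);
      MPI_Op_free(&op);
    }

    /* Variant 2: two collectives with built-in operations only. */
    static void ReduceSplit(int vals[2], MPI_Comm comm)
    {
      MPI_Allreduce(MPI_IN_PLACE, &vals[0], 1, MPI_INT, MPI_MAX,  comm);
      MPI_Allreduce(MPI_IN_PLACE, &vals[1], 1, MPI_INT, MPI_BAND, comm);
    }

The split variant costs one extra collective but relies only on built-in
operations, which is the workaround referenced in (*) below.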
> >>>>>>>> - look at the hypre issue and whether they should be
> >>>>>>>> deferred to the hypre team.
> >>>>>>>
> >>>>>>> I don’t know if there is something wrong in hypre threading
> >>>>>>> or if it’s just a side effect of threading, but it seems that
> >>>>>>> the number of threads has a drastic effect on the quality of
> >>>>>>> the PC.
> >>>>>>> By default, it looks like there are two threads per process
> >>>>>>> with your Docker image.
> >>>>>>> If I force OMP_NUM_THREADS=1, then I get the same convergence
> >>>>>>> as in the output file.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Pierre
> >>>>>>>
> >>>>>>> (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
> >>>>>>> <https://gitlab.com/petsc/petsc/-/merge_requests/3712>
> >>>>>>>
> >>>>>>>> Thank you for the Docker files, they were really useful.
> >>>>>>>> If you want to avoid oversubscription failures, you can edit
> >>>>>>>> the file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and
> >>>>>>>> append the line:
> >>>>>>>> localhost slots=12
> >>>>>>>> If you want to increase the timeout limit of the PETSc test
> >>>>>>>> suite for each test, you can add the extra flag TIMEOUT=180 to
> >>>>>>>> your command line (the default is 60, in seconds).
> >>>>>>>>
> >>>>>>>> Thanks, I’ll ping you on GitLab when I’ve got something
> >>>>>>>> ready for you to try,
> >>>>>>>> Pierre
> >>>>>>>>
> >>>>>>>> <ompi.cxx>
> >>>>>>>>
> >>>>>>>>> On 12 Mar 2021, at 8:54 PM, Eric Chamberland
> >>>>>>>>> <Eric.Chamberland at giref.ulaval.ca
> >>>>>>>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Pierre,
> >>>>>>>>>
> >>>>>>>>> I now have a docker container reproducing the problems here.
> >>>>>>>>>
> >>>>>>>>> Actually, if I look at
> >>>>>>>>> snes_tutorials-ex12_quad_singular_hpddm it fails like this:
> >>>>>>>>>
> >>>>>>>>> not ok snes_tutorials-ex12_quad_singular_hpddm # Error code:
> 59
> >>>>>>>>> # Initial guess
> >>>>>>>>> # L_2 Error: 0.00803099
> >>>>>>>>> # Initial Residual
> >>>>>>>>> # L_2 Residual: 1.09057
> >>>>>>>>> # Au - b = Au + F(0)
> >>>>>>>>> # Linear L_2 Residual: 1.09057
> >>>>>>>>> # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [3]PETSC ERROR:
> >>>>>>>>>
> ------------------------------------------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Caught signal number 11 SEGV:
> >>>>>>>>> Segmentation Violation, probably memory access out of range
> >>>>>>>>> # [3]PETSC ERROR: Try option -start_in_debugger or
> >>>>>>>>> -on_error_attach_debugger
> >>>>>>>>> # [3]PETSC ERROR: or see
> >>>>>>>>>
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >>>>>>>>> <
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind>
> >>>>>>>>> # [3]PETSC ERROR: or try http://valgrind.org
> >>>>>>>>> <http://valgrind.org/> on GNU/linux and Apple Mac OS X to
> >>>>>>>>> find memory corruption errors
> >>>>>>>>> # [3]PETSC ERROR: likely location of problem given in stack
> >>>>>>>>> below
> >>>>>>>>> # [3]PETSC ERROR: --------------------- Stack Frames
> >>>>>>>>> ------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Note: The EXACT line numbers in the stack
> >>>>>>>>> are not available,
> >>>>>>>>> # [3]PETSC ERROR: INSTEAD the line number of the start of
> >>>>>>>>> the function
> >>>>>>>>> # [3]PETSC ERROR: is given.
> >>>>>>>>> # [3]PETSC ERROR: [3] buildTwo line 987
> >>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>> # [3]PETSC ERROR: [3] next line 1130
> >>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>> # [3]PETSC ERROR: --------------------- Error Message
> >>>>>>>>>
> --------------------------------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Signal received
> >>>>>>>>> # [3]PETSC ERROR: [0]PETSC ERROR:
> >>>>>>>>>
> ------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> also ex12_quad_hpddm_reuse_baij fails, with a lot more "Read
> >>>>>>>>> -1, expected ..." messages, and I don't know where they come from...?
> >>>>>>>>>
> >>>>>>>>> Hypre (like in diff-snes_tutorials-ex56_hypre) is also
> >>>>>>>>> having DIVERGED_INDEFINITE_PC failures...
> >>>>>>>>>
> >>>>>>>>> Please see the 3 attached docker files:
> >>>>>>>>>
> >>>>>>>>> 1) fedora_mkl_and_devtools: the Dockerfile which installs
> >>>>>>>>> Fedora 33 with the GNU compilers, MKL and everything needed to
> >>>>>>>>> develop.
> >>>>>>>>>
> >>>>>>>>> 2) openmpi: the Dockerfile to build OpenMPI
> >>>>>>>>>
> >>>>>>>>> 3) petsc: the last Dockerfile, which builds, installs and tests
> >>>>>>>>> PETSc
> >>>>>>>>>
> >>>>>>>>> I build the 3 like this:
> >>>>>>>>>
> >>>>>>>>> docker build -t fedora_mkl_and_devtools -f
> >>>>>>>>> fedora_mkl_and_devtools .
> >>>>>>>>>
> >>>>>>>>> docker build -t openmpi -f openmpi .
> >>>>>>>>>
> >>>>>>>>> docker build -t petsc -f petsc .
> >>>>>>>>>
> >>>>>>>>> Disclaimer: I am not a Docker expert, so I may do things
> >>>>>>>>> that are not Docker state-of-the-art, but I am open to
> >>>>>>>>> suggestions... ;)
> >>>>>>>>>
> >>>>>>>>> I have just run it on my laptop (it takes long), which does
> >>>>>>>>> not have enough cores, so many more tests failed (I should
> >>>>>>>>> force --oversubscribe but don't know how to). I will relaunch
> >>>>>>>>> on my workstation in a few minutes.
> >>>>>>>>>
> >>>>>>>>> I will now test your branch! (sorry for the delay).
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Eric
> >>>>>>>>>
> >>>>>>>>> On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Pierre,
> >>>>>>>>>>
> >>>>>>>>>> ok, that's interesting!
> >>>>>>>>>>
> >>>>>>>>>> I will try to build a Docker image by tomorrow and give
> >>>>>>>>>> you the exact recipe to reproduce the bugs.
> >>>>>>>>>>
> >>>>>>>>>> Eric
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 11 Mar 2021, at 6:16 AM, Barry Smith
> >>>>>>>>>>>> <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Eric,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sorry about not being more immediate. We still have
> >>>>>>>>>>>> this in our active email so you don't need to submit
> >>>>>>>>>>>> individual issues. We'll try to get to them as soon as
> >>>>>>>>>>>> we can.
> >>>>>>>>>>>
> >>>>>>>>>>> Indeed, I’m still trying to figure this out.
> >>>>>>>>>>> I realized that some of my configure flags were different
> >>>>>>>>>>> than yours, e.g., no --with-memalign.
> >>>>>>>>>>> I’ve also added SuperLU_DIST to my installation.
> >>>>>>>>>>> Still, I can’t reproduce any issue.
> >>>>>>>>>>> I will continue looking into this, it appears I’m seeing
> >>>>>>>>>>> some valgrind errors, but I don’t know if this is some
> >>>>>>>>>>> side effect of OpenMPI not being valgrind-clean (last
> >>>>>>>>>>> time I checked, there was no error with MPICH).
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you for your patience,
> >>>>>>>>>>> Pierre
> >>>>>>>>>>>
> >>>>>>>>>>> /usr/bin/gmake -f gmakefile test test-fail=1
> >>>>>>>>>>> Using MAKEFLAGS: test-fail=1
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>> ok
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
> >>>>>>>>>>> ok snes_tutorials-ex56_hypre
> >>>>>>>>>>> ok diff-snes_tutorials-ex56_hypre
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
> >>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>> ok
> diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>> processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
> >>>>>>>>>>> failed so no diff
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
> >>>>>>>>>>> required for this test
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_tut_3
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_tut_3
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
> >>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran
> >>>>>>>>>>> required for this test
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
> >>>>>>>>>>> ok
> >>>>>>>>>>>
> snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>> ok
> >>>>>>>>>>>
> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>> processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
> >>>>>>>>>>> failed so no diff
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
> >>>>>>>>>>> ok
> >>>>>>>>>>>
> snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>> ok
> >>>>>>>>>>>
> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More
> >>>>>>>>>>> processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
> >>>>>>>>>>> failed so no diff
> >>>>>>>>>>> TEST
> >>>>>>>>>>>
> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran
> >>>>>>>>>>> required for this test
> >>>>>>>>>>>
> >>>>>>>>>>>> Barry
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland
> >>>>>>>>>>>>> <Eric.Chamberland at giref.ulaval.ca
> >>>>>>>>>>>>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Barry,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> to get some follow-up on the --with-openmp=1 failures,
> >>>>>>>>>>>>> shall I open gitlab issues for:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> b) all superlu_dist failures giving different results
> >>>>>>>>>>>>> with initia and "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> c) hpddm failures "free(): invalid next size (fast)"
> >>>>>>>>>>>>> and "Segmentation Violation"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> d) all tao's "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't see how I could do all this debugging by
> >>>>>>>>>>>>> myself...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Eric
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Eric Chamberland, ing., M. Ing
> >>>>>>>>>> Professionnel de recherche
> >>>>>>>>>> GIREF/Université Laval
> >>>>>>>>>> (418) 656-2131 poste 41 22 42
> >>>>>>>>> --
> >>>>>>>>> Eric Chamberland, ing., M. Ing
> >>>>>>>>> Professionnel de recherche
> >>>>>>>>> GIREF/Université Laval
> >>>>>>>>> (418) 656-2131 poste 41 22 42
> >>>>>>>>> <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>> --
> >>>>> Eric Chamberland, ing., M. Ing
> >>>>> Professionnel de recherche
> >>>>> GIREF/Université Laval
> >>>>> (418) 656-2131 poste 41 22 42
> >>>>
> >>> --
> >>> Eric Chamberland, ing., M. Ing
> >>> Professionnel de recherche
> >>> GIREF/Université Laval
> >>> (418) 656-2131 poste 41 22 42
> >>
> > --
> > Eric Chamberland, ing., M. Ing
> > Professionnel de recherche
> > GIREF/Université Laval
> > (418) 656-2131 poste 41 22 42
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>