[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Mon Mar 22 14:07:51 CDT 2021


I added some information here:

https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719

Maybe someone can say more than I can about what PETSc is trying to do in
the two tutorials mentioned there that are timing out...

Thanks,

Eric


On 2021-03-15 11:31 a.m., Eric Chamberland wrote:
>
> Reported timeout bugs to SuperLU_dist too:
>
> https://github.com/xiaoyeli/superlu_dist/issues/69
>
> Eric
>
>
> On 2021-03-14 2:18 p.m., Eric Chamberland wrote:
>>
>> Done:
>>
>> https://github.com/hypre-space/hypre/issues/303
>>
>> I may need some help on the PETSc side to answer their questions...
>>
>> Eric
>>
>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>> Eric
>>>
>>> You should report these HYPRE issues upstream:
>>> https://github.com/hypre-space/hypre/issues
>>>
>>>
>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
>>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>
>>>> For us it clearly creates problems in real computations...
>>>>
>>>> I understand the need to have clean tests for PETSc, but for me, it 
>>>> reveals that hypre isn't usable with more than one thread for now...
>>>>
>>>> Another solution:  force single-threaded configuration for hypre 
>>>> until this is fixed?
>>>>
>>>> Eric
>>>>
>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
>>>>>   Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 3
>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
>>>>> OK, independently of the architecture it seems (Eric's Docker image 
>>>>> with 1 or 2 threads, or my macOS), but the contraction factor is higher:
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>> vs. currently:
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>
>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?
>>>>>
>>>>> Thanks,
>>>>> Pierre
>>>>>
>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>
>>>>>> Hypre uses a multiplicative smoother by default. It also has a 
>>>>>> Chebyshev smoother; that, combined with a Jacobi PC, should be 
>>>>>> thread invariant.
>>>>>> Mark
>>>>>>
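To illustrate the point about thread invariance, here is a toy sketch in plain Python (it is not hypre code). It assumes a hybrid smoother that does Gauss-Seidel inside each thread's block of rows and Jacobi across blocks, which is only my reading of what a multiplicative smoother becomes under threading. The function names and the 4x4 test matrix are made up for the example.

```python
def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every update reads only the old iterate."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def hybrid_gs_sweep(A, b, x, nblocks):
    """One hybrid sweep: Gauss-Seidel inside a block, Jacobi across blocks.

    Each block of rows stands in for one thread (hypothetical toy model).
    """
    n = len(b)
    new = list(x)
    size = -(-n // nblocks)  # ceiling division: rows per block
    for start in range(0, n, size):
        rows = range(start, min(start + size, n))
        for i in rows:
            s = 0.0
            for j in range(n):
                if j == i:
                    continue
                # Same block: use the freshest value; other blocks: old value.
                s += A[i][j] * (new[j] if j in rows else x[j])
            new[i] = (b[i] - s) / A[i][i]
    return new


# 1D Laplacian as a small test problem (made-up data).
A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
b = [1.0, 1.0, 1.0, 1.0]
x0 = [0.0, 0.0, 0.0, 0.0]

print("Jacobi          :", jacobi_sweep(A, b, x0))
for nblocks in (1, 2, 4):
    print(f"hybrid, {nblocks} block(s):", hybrid_gs_sweep(A, b, x0, nblocks))
```

With one block the sweep is plain Gauss-Seidel, with two blocks the iterate changes, and with one row per block it degenerates to Jacobi. Only the Jacobi sweep is independent of how the rows are partitioned.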
>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et> wrote:
>>>>>>
>>>>>>
>>>>>>>     On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <pierre at joliv.et> wrote:
>>>>>>>
>>>>>>>     Hello Eric,
>>>>>>>     I’ve made an “interesting” discovery, so I’ll put the
>>>>>>>     list back in Cc.
>>>>>>>     It appears the following snippet of code which uses
>>>>>>>     Allreduce() + lambda function + MPI_IN_PLACE is:
>>>>>>>     - Valgrind-clean with MPICH;
>>>>>>>     - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>     - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>     I’m not sure who is to blame here; I’ll need to check what
>>>>>>>     the MPI specification requires of implementors and users
>>>>>>>     in that case.
>>>>>>>
>>>>>>>     In the meantime, I’ll do the following:
>>>>>>>     - update config/BuildSystem/config/packages/OpenMPI.py to
>>>>>>>     use OpenMPI 4.1.0, see if any other error appears;
>>>>>>>     - provide a hotfix to bypass the segfaults;
>>>>>>
>>>>>>     I can confirm that splitting the single Allreduce with my own
>>>>>>     MPI_Op into two Allreduce with MAX and BAND fixes the
>>>>>>     segfaults with OpenMPI (*).
>>>>>>
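A minimal sketch of why the split is safe, in plain Python with no MPI involved. It assumes the user-defined operator took the maximum of one field and the bitwise AND of another, which is only what the MAX/BAND split suggests; the function names and the per-rank values are hypothetical.

```python
from functools import reduce
from operator import and_


def fused_op(a, b):
    """Stand-in for a user-defined MPI_Op: max on field 0, AND on field 1."""
    return (max(a[0], b[0]), a[1] & b[1])


def allreduce_fused(contribs):
    """Single reduction with the fused operator."""
    return reduce(fused_op, contribs)


def allreduce_split(contribs):
    """Two independent reductions, mirroring MAX followed by BAND."""
    max_part = reduce(max, (c[0] for c in contribs))
    band_part = reduce(and_, (c[1] for c in contribs))
    return (max_part, band_part)


# One (value, bitmask) pair per simulated rank (made-up data).
contribs = [(3, 0b1110), (7, 0b0111), (5, 0b1111)]
print(allreduce_fused(contribs), allreduce_split(contribs))
```

Because the two fields never interact inside the fused operator, reducing them separately gives the same result, at the cost of one extra collective.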
>>>>>>>     - look at the hypre issue and whether they should be
>>>>>>>     deferred to the hypre team.
>>>>>>
>>>>>>     I don’t know if there is something wrong in hypre threading
>>>>>>     or if it’s just a side effect of threading, but it seems that
>>>>>>     the number of threads has a drastic effect on the quality of
>>>>>>     the PC.
>>>>>>     By default, it looks like there are two threads per process
>>>>>>     with your Docker image.
>>>>>>     If I force OMP_NUM_THREADS=1, then I get the same convergence
>>>>>>     as in the output file.
>>>>>>
>>>>>>     Thanks,
>>>>>>     Pierre
>>>>>>
>>>>>>     (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>
>>>>>>>     Thank you for the Docker files, they were really useful.
>>>>>>>     If you want to avoid oversubscription failures, you can edit
>>>>>>>     the file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and
>>>>>>>     append the line:
>>>>>>>     localhost slots=12
>>>>>>>     If you want to increase the per-test timeout limit of the
>>>>>>>     PETSc test suite, you can add the extra flag TIMEOUT=180 to
>>>>>>>     your command line (default is 60, units are seconds).
>>>>>>>
>>>>>>>     Thanks, I’ll ping you on GitLab when I’ve got something
>>>>>>>     ready for you to try,
>>>>>>>     Pierre
>>>>>>>
>>>>>>>     <ompi.cxx>
>>>>>>>
>>>>>>>>     On 12 Mar 2021, at 8:54 PM, Eric Chamberland
>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>
>>>>>>>>     Hi Pierre,
>>>>>>>>
>>>>>>>>     I now have a docker container reproducing the problems here.
>>>>>>>>
>>>>>>>>     Actually, if I look at
>>>>>>>>     snes_tutorials-ex12_quad_singular_hpddm it fails like this:
>>>>>>>>
>>>>>>>>     not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
>>>>>>>>     #       Initial guess
>>>>>>>>     #       L_2 Error: 0.00803099
>>>>>>>>     #       Initial Residual
>>>>>>>>     #       L_2 Residual: 1.09057
>>>>>>>>     #       Au - b = Au + F(0)
>>>>>>>>     #       Linear L_2 Residual: 1.09057
>>>>>>>>     # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>     # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>     # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>     #       [3]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>>>     #       [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>>>>>     #       [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>>>     #       [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>     #       [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>>>>>     #       [3]PETSC ERROR: likely location of problem given in stack below
>>>>>>>>     #       [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>>>>>>     #       [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>>>>>     #       [3]PETSC ERROR: INSTEAD the line number of the start of the function
>>>>>>>>     #       [3]PETSC ERROR:       is given.
>>>>>>>>     #       [3]PETSC ERROR: [3] buildTwo line 987 /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>     #       [3]PETSC ERROR: [3] next line 1130 /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>     #       [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>>     #       [3]PETSC ERROR: Signal received
>>>>>>>>     #       [3]PETSC ERROR: [0]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>     ex12_quad_hpddm_reuse_baij also fails, with many more "Read
>>>>>>>>     -1, expected ..." messages; I don't know where they come from...
>>>>>>>>
>>>>>>>>     Hypre (like in diff-snes_tutorials-ex56_hypre) is also
>>>>>>>>     having DIVERGED_INDEFINITE_PC failures...
>>>>>>>>
>>>>>>>>     Please see the 3 attached docker files:
>>>>>>>>
>>>>>>>>     1) fedora_mkl_and_devtools: the Dockerfile that installs
>>>>>>>>     Fedora 33 with the GNU compilers, MKL, and everything needed
>>>>>>>>     for development.
>>>>>>>>
>>>>>>>>     2) openmpi: the Dockerfile to build OpenMPI.
>>>>>>>>
>>>>>>>>     3) petsc: the last Dockerfile, which builds, installs, and tests PETSc.
>>>>>>>>
>>>>>>>>     I build the 3 like this:
>>>>>>>>
>>>>>>>>     docker build -t fedora_mkl_and_devtools -f fedora_mkl_and_devtools .
>>>>>>>>
>>>>>>>>     docker build -t openmpi -f openmpi .
>>>>>>>>
>>>>>>>>     docker build -t petsc -f petsc .
>>>>>>>>
>>>>>>>>     Disclaimer: I am not a Docker expert, so I may do things
>>>>>>>>     that are not Docker state-of-the-art, but I am open to
>>>>>>>>     suggestions... ;)
>>>>>>>>
>>>>>>>>     I have just run it on my laptop (it took a long time), which
>>>>>>>>     does not have enough cores, so many more tests failed (I
>>>>>>>>     should force --oversubscribe but don't know how).  I will
>>>>>>>>     relaunch on my workstation in a few minutes.
>>>>>>>>
>>>>>>>>     I will now test your branch! (sorry for the delay).
>>>>>>>>
>>>>>>>>     Thanks,
>>>>>>>>
>>>>>>>>     Eric
>>>>>>>>
>>>>>>>>     On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>
>>>>>>>>>     Hi Pierre,
>>>>>>>>>
>>>>>>>>>     ok, that's interesting!
>>>>>>>>>
>>>>>>>>>     I will try to build a Docker image by tomorrow and give
>>>>>>>>>     you the exact recipe to reproduce the bugs.
>>>>>>>>>
>>>>>>>>>     Eric
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>     On 11 Mar 2021, at 6:16 AM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>       Eric,
>>>>>>>>>>>
>>>>>>>>>>>      Sorry about not being more immediate. We still have
>>>>>>>>>>>     this in our active email so you don't need to submit
>>>>>>>>>>>     individual issues. We'll try to get to them as soon as
>>>>>>>>>>>     we can.
>>>>>>>>>>
>>>>>>>>>>     Indeed, I’m still trying to figure this out.
>>>>>>>>>>     I realized that some of my configure flags were different
>>>>>>>>>>     than yours, e.g., no --with-memalign.
>>>>>>>>>>     I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>     Still, I can’t reproduce any issue.
>>>>>>>>>>     I will continue looking into this. I am seeing some
>>>>>>>>>>     Valgrind errors, but I don’t know if this is a side
>>>>>>>>>>     effect of OpenMPI not being Valgrind-clean (last
>>>>>>>>>>     time I checked, there was no error with MPICH).
>>>>>>>>>>
>>>>>>>>>>     Thank you for your patience,
>>>>>>>>>>     Pierre
>>>>>>>>>>
>>>>>>>>>>     /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>     Using MAKEFLAGS: test-fail=1
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>      ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>      ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>      ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>      ok snes_tutorials-ex56_hypre
>>>>>>>>>>      ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>      ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>      ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More processors requested than permitted
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so no diff
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for this test
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>      ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>      ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>      ok snes_tutorials-ex19_tut_3
>>>>>>>>>>      ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>      ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>      ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required for this test
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>      ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>      ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>      ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>      ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More processors requested than permitted
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so no diff
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>      ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>      ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>      ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>     not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>     #srun: error: Unable to create step for job 1426755: More processors requested than permitted
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no diff
>>>>>>>>>>           TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>      ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required for this test
>>>>>>>>>>
>>>>>>>>>>>      Barry
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>     On Mar 10, 2021, at 11:03 PM, Eric Chamberland
>>>>>>>>>>>>     <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     Barry,
>>>>>>>>>>>>
>>>>>>>>>>>>     to get some follow-up on the --with-openmp=1 failures,
>>>>>>>>>>>>     shall I open GitLab issues for:
>>>>>>>>>>>>
>>>>>>>>>>>>     a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>
>>>>>>>>>>>>     b) all superlu_dist failures giving different results
>>>>>>>>>>>>     with initia and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>
>>>>>>>>>>>>     c) hpddm failures "free(): invalid next size (fast)"
>>>>>>>>>>>>     and "Segmentation Violation"
>>>>>>>>>>>>
>>>>>>>>>>>>     d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>
>>>>>>>>>>>>     I don't see how I could do all this debugging by myself...
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>     Eric
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>     -- 
>>>>>>>>>     Eric Chamberland, ing., M. Ing
>>>>>>>>>     Professionnel de recherche
>>>>>>>>>     GIREF/Université Laval
>>>>>>>>>     (418) 656-2131 poste 41 22 42
>>>>>>>>     <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>
>>>>>>
>>>>>
>>>

-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
