[petsc-dev] Petsc "make test" have more failures for --with-openmp=1
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Tue Mar 30 22:18:41 CDT 2021
Hi Barry,
Here is what I have:
1. The hpddm issues have all been solved (you can no longer see any hpddm
failures here:
https://giref.ulaval.ca/~cmpgiref/petsc-main-debug/2021.03.29.02h00m02s_make_test.log)
2. For Hypre, I think it is indeed not a bug but a feature. As far as I
can tell from what has been said on the hypre discussion list, "It still
depends on the number of threads, that can’t be avoided"
( https://github.com/hypre-space/hypre/issues/303#issuecomment-800442755 ),
and in section 7.3 of
https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing
there is some interesting information, such as:
Figure 7.6 clearly illustrates that convergence degrades with the
addition of threads for hybrid SGS;
....
The 3D sphere problem is the most extreme example because AMG-CG with
hybrid SGS no longer converges with the addition of threading.
but I might have misunderstood, since I am not an expert in that area...
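Just to be concrete about the kind of check I have in mind on our side
(only a sketch: I am assuming the test harness "search=" option works the
way I think it does, and that OMP_NUM_THREADS is honored by hypre's OpenMP
regions):

  # run the same hypre test with 1 and then 2 OpenMP threads and compare
  # the CONVERGED_RTOL / DIVERGED_INDEFINITE_PC lines in the output
  OMP_NUM_THREADS=1 make -f gmakefile test search='snes_tutorials-ex56_hypre'
  OMP_NUM_THREADS=2 make -f gmakefile test search='snes_tutorials-ex56_hypre'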
3. For SuperLU_DIST, I have tried to build SuperLU_DIST outside of PETSc and
run the tests from SuperLU itself: sadly, the bug does not show up (see
https://github.com/xiaoyeli/superlu_dist/issues/69).
I would like to build a standalone superlu_dist reproducer from what is
done in the faulty test, ksp_ksp_tutorials-ex5, which is buggy when called
from PETSc. What bugs me is that many other PETSc tests run fine with
superlu_dist: maybe something is done uniquely in ksp_ksp_tutorials-ex5?
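One thing that could help with the standalone reproducer (again a sketch
only; I am assuming the options of the faulty test are listed, like for the
other examples, in the /*TEST*/ block at the bottom of the source file):

  # look up the nsize and args used by the superlu_dist variants of ex5
  grep -A 10 'superlu_dist' src/ksp/ksp/tutorials/ex5.c
  # then build and run the example directly, outside the test harness
  cd src/ksp/ksp/tutorials && make ex5
  mpiexec -n <nsize from the TEST block> ./ex5 <args from the TEST block>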
So I think it is worth digging into #3: the simple thing I have not yet
done is retrieving the stack when it fails (timeout).
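Here is roughly how I would try to grab that stack (a sketch; I am assuming
the "search=" option and the TIMEOUT= flag Pierre mentioned, and that gdb is
available inside the container):

  # rerun only the faulty tests with a longer timeout, to see whether they
  # are merely slow rather than hung
  make -f gmakefile test search='ksp_ksp_tutorials-ex5_superlu_dist*' TIMEOUT=180
  # if a test really hangs, attach a debugger to one of the stuck ranks
  gdb -p <pid of a hung ex5 process>
  (gdb) thread apply all bt

PETSc's -start_in_debugger option could also be used when running the
example directly.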
And a question: when you say you upgraded to OpenMPI 4.1, do you mean for
one of your automated (Docker?) builds in the GitLab pipelines?
Thanks for following up! :)
Eric
On 2021-03-30 1:47 p.m., Barry Smith wrote:
>
> Eric,
>
> How are things going on this OpenMP front? Any bug fixes from
> hypre or SuperLU_DIST?
>
> BTW: we have upgraded to OpenMPI 4.1 perhaps this resolves some
> issues?
>
> Barry
>
>
>> On Mar 22, 2021, at 2:07 PM, Eric Chamberland
>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>
>> I added some information here:
>>
>> https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719
>>
>> Maybe someone can say more than I can about what PETSc tries to do with
>> the 2 mentioned tutorials that are timing out...
>>
>> Thanks,
>>
>> Eric
>>
>>
>> On 2021-03-15 11:31 a.m., Eric Chamberland wrote:
>>>
>>> Reported timeout bugs to SuperLU_dist too:
>>>
>>> https://github.com/xiaoyeli/superlu_dist/issues/69
>>>
>>> Eric
>>>
>>>
>>> On 2021-03-14 2:18 p.m., Eric Chamberland wrote:
>>>>
>>>> Done:
>>>>
>>>> https://github.com/hypre-space/hypre/issues/303
>>>>
>>>> Maybe I will need some help on the PETSc side to answer their questions...
>>>>
>>>> Eric
>>>>
>>>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>>>> Eric
>>>>>
>>>>> You should report these HYPRE issues upstream
>>>>> https://github.com/hypre-space/hypre/issues
>>>>>
>>>>>
>>>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
>>>>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>
>>>>>> For us it clearly creates problems in real computations...
>>>>>>
>>>>>> I understand the need to have clean tests for PETSc, but to me,
>>>>>> it reveals that hypre isn't usable with more than one thread for
>>>>>> now...
>>>>>>
>>>>>> Another solution: force single-threaded configuration for hypre
>>>>>> until this is fixed?
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
>>>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC
>>>>>>> iterations 3
>>>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
>>>>>>> OK, independently of the architecture it seems (Eric's Docker
>>>>>>> image with 1 or 2 threads, or my macOS), but the contraction factor
>>>>>>> is higher
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>>>> v. currently
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>>>
>>>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make
>>>>>>> test?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pierre
>>>>>>>
>>>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>> Hypre uses a multiplicative smoother by default. It has a
>>>>>>>> Chebyshev smoother. That, with a Jacobi PC, should be thread
>>>>>>>> invariant.
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <pierre at joliv.et> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 13 Mar 2021, at 9:17 AM, Pierre Jolivet
>>>>>>>>> <pierre at joliv.et> wrote:
>>>>>>>>>
>>>>>>>>> Hello Eric,
>>>>>>>>> I’ve made an “interesting” discovery, so I’ll put the list back
>>>>>>>>> in CC.
>>>>>>>>> It appears the following snippet of code which uses
>>>>>>>>> Allreduce() + lambda function + MPI_IN_PLACE is:
>>>>>>>>> - Valgrind-clean with MPICH;
>>>>>>>>> - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>>> - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>>> I’m not sure who is to blame here, I’ll need to look at
>>>>>>>>> the MPI specification for what is required by the
>>>>>>>>> implementors and users in that case.
>>>>>>>>>
>>>>>>>>> In the meantime, I’ll do the following:
>>>>>>>>> - update config/BuildSystem/config/packages/OpenMPI.py to
>>>>>>>>> use OpenMPI 4.1.0, see if any other error appears;
>>>>>>>>> - provide a hotfix to bypass the segfaults;
>>>>>>>>
>>>>>>>> I can confirm that splitting the single Allreduce with my
>>>>>>>> own MPI_Op into two Allreduce calls with MAX and BAND fixes the
>>>>>>>> segfaults with OpenMPI (*).
>>>>>>>>
>>>>>>>>> - look at the hypre issue and whether they should be
>>>>>>>>> deferred to the hypre team.
>>>>>>>>
>>>>>>>> I don’t know if there is something wrong in hypre threading
>>>>>>>> or if it’s just a side effect of threading, but it seems
>>>>>>>> that the number of threads has a drastic effect on the
>>>>>>>> quality of the PC.
>>>>>>>> By default, it looks like there are two threads per process
>>>>>>>> with your Docker image.
>>>>>>>> If I force OMP_NUM_THREADS=1, then I get the same
>>>>>>>> convergence as in the output file.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>>>
>>>>>>>>> Thank you for the Docker files, they were really useful.
>>>>>>>>> If you want to avoid oversubscription failures, you can
>>>>>>>>> edit the file
>>>>>>>>> /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and append
>>>>>>>>> the line:
>>>>>>>>> localhost slots=12
>>>>>>>>> If you want to increase the timeout limit of PETSc test
>>>>>>>>> suite for each test, you can add the extra flag in your
>>>>>>>>> command line TIMEOUT=180 (default is 60, units are seconds).
>>>>>>>>>
>>>>>>>>> Thanks, I’ll ping you on GitLab when I’ve got something
>>>>>>>>> ready for you to try,
>>>>>>>>> Pierre
>>>>>>>>>
>>>>>>>>> <ompi.cxx>
>>>>>>>>>
>>>>>>>>>> On 12 Mar 2021, at 8:54 PM, Eric Chamberland
>>>>>>>>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Pierre,
>>>>>>>>>>
>>>>>>>>>> I now have a docker container reproducing the problems here.
>>>>>>>>>>
>>>>>>>>>> Actually, if I look at
>>>>>>>>>> snes_tutorials-ex12_quad_singular_hpddm it fails like this:
>>>>>>>>>>
>>>>>>>>>> not ok snes_tutorials-ex12_quad_singular_hpddm # Error
>>>>>>>>>> code: 59
>>>>>>>>>> # Initial guess
>>>>>>>>>> # L_2 Error: 0.00803099
>>>>>>>>>> # Initial Residual
>>>>>>>>>> # L_2 Residual: 1.09057
>>>>>>>>>> # Au - b = Au + F(0)
>>>>>>>>>> # Linear L_2 Residual: 1.09057
>>>>>>>>>> # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>>> # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>>> # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>>> # [3]PETSC ERROR:
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> # [3]PETSC ERROR: Caught signal number 11 SEGV:
>>>>>>>>>> Segmentation Violation, probably memory access out of range
>>>>>>>>>> # [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>> # [3]PETSC ERROR: or see
>>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>> # [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>>>>>>>>> Apple Mac OS X to find memory corruption errors
>>>>>>>>>> # [3]PETSC ERROR: likely location of problem given in
>>>>>>>>>> stack below
>>>>>>>>>> # [3]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>> ------------------------------------
>>>>>>>>>> # [3]PETSC ERROR: Note: The EXACT line numbers in the
>>>>>>>>>> stack are not available,
>>>>>>>>>> # [3]PETSC ERROR: INSTEAD the line number of the start of
>>>>>>>>>> the function
>>>>>>>>>> # [3]PETSC ERROR: is given.
>>>>>>>>>> # [3]PETSC ERROR: [3] buildTwo line 987
>>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>> # [3]PETSC ERROR: [3] next line 1130
>>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>> # [3]PETSC ERROR: --------------------- Error Message
>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>> # [3]PETSC ERROR: Signal received
>>>>>>>>>> # [3]PETSC ERROR: [0]PETSC ERROR:
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> ex12_quad_hpddm_reuse_baij also fails, with a lot more
>>>>>>>>>> "Read -1, expected ..." messages, and I don't know where they
>>>>>>>>>> come from...?
>>>>>>>>>>
>>>>>>>>>> Hypre (like in diff-snes_tutorials-ex56_hypre) is also
>>>>>>>>>> having DIVERGED_INDEFINITE_PC failures...
>>>>>>>>>>
>>>>>>>>>> Please see the 3 attached docker files:
>>>>>>>>>>
>>>>>>>>>> 1) fedora_mkl_and_devtools: the Dockerfile which installs
>>>>>>>>>> Fedora 33 with GNU compilers, MKL, and everything needed to
>>>>>>>>>> develop.
>>>>>>>>>>
>>>>>>>>>> 2) openmpi: the Dockerfile to build OpenMPI
>>>>>>>>>>
>>>>>>>>>> 3) petsc: the last Dockerfile, which builds, installs, and
>>>>>>>>>> tests PETSc
>>>>>>>>>>
>>>>>>>>>> I build the 3 like this:
>>>>>>>>>>
>>>>>>>>>> docker build -t fedora_mkl_and_devtools -f
>>>>>>>>>> fedora_mkl_and_devtools .
>>>>>>>>>>
>>>>>>>>>> docker build -t openmpi -f openmpi .
>>>>>>>>>>
>>>>>>>>>> docker build -t petsc -f petsc .
>>>>>>>>>>
>>>>>>>>>> Disclaimer: I am not a docker expert, so I may do things
>>>>>>>>>> that are not docker state-of-the-art, but I am open to
>>>>>>>>>> suggestions... ;)
>>>>>>>>>>
>>>>>>>>>> I have just run it on my laptop (a long run), which does not
>>>>>>>>>> have enough cores, so many more tests failed (I should force
>>>>>>>>>> --oversubscribe but don't know how to). I will relaunch
>>>>>>>>>> on my workstation in a few minutes.
>>>>>>>>>>
>>>>>>>>>> I will now test your branch! (sorry for the delay).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Eric
>>>>>>>>>>
>>>>>>>>>> On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>>
>>>>>>>>>>> ok, that's interesting!
>>>>>>>>>>>
>>>>>>>>>>> I will try to build a docker image by tomorrow and
>>>>>>>>>>> give you the exact recipe to reproduce the bugs.
>>>>>>>>>>>
>>>>>>>>>>> Eric
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On 11 Mar 2021, at 6:16 AM, Barry Smith
>>>>>>>>>>>>> <bsmith at petsc.dev> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Eric,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for not responding more promptly. We still have
>>>>>>>>>>>>> this in our active email, so you don't need to submit
>>>>>>>>>>>>> individual issues. We'll try to get to them as soon as
>>>>>>>>>>>>> we can.
>>>>>>>>>>>>
>>>>>>>>>>>> Indeed, I’m still trying to figure this out.
>>>>>>>>>>>> I realized that some of my configure flags were
>>>>>>>>>>>> different from yours, e.g., no --with-memalign.
>>>>>>>>>>>> I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>>> Still, I can’t reproduce any issue.
>>>>>>>>>>>> I will continue looking into this, it appears I’m
>>>>>>>>>>>> seeing some valgrind errors, but I don’t know if this
>>>>>>>>>>>> is some side effect of OpenMPI not being valgrind-clean
>>>>>>>>>>>> (last time I checked, there was no error with MPICH).
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your patience,
>>>>>>>>>>>> Pierre
>>>>>>>>>>>>
>>>>>>>>>>>> /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>>> Using MAKEFLAGS: test-fail=1
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>>> ok snes_tutorials-ex56_hypre
>>>>>>>>>>>> ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>>> #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>> More processors requested than permitted
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
>>>>>>>>>>>> failed so no diff
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
>>>>>>>>>>>> required for this test
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>>> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>>> ok snes_tutorials-ex19_tut_3
>>>>>>>>>>>> ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP
>>>>>>>>>>>> Fortran required for this test
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>>> ok
>>>>>>>>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>>> #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>> More processors requested than permitted
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
>>>>>>>>>>>> failed so no diff
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>>> ok
>>>>>>>>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>> ok
>>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>>> #srun: error: Unable to create step for job 1426755:
>>>>>>>>>>>> More processors requested than permitted
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
>>>>>>>>>>>> failed so no diff
>>>>>>>>>>>> TEST
>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP
>>>>>>>>>>>> Fortran required for this test
>>>>>>>>>>>>
>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland
>>>>>>>>>>>>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> to get some follow-up on the --with-openmp=1 failures,
>>>>>>>>>>>>>> shall I open GitLab issues for:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> b) all superlu_dist failures giving different results
>>>>>>>>>>>>>> than the initial output, and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> c) hpddm failures "free(): invalid next size (fast)"
>>>>>>>>>>>>>> and "Segmentation Violation"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't see how I could do all this debugging by
>>>>>>>>>>>>>> myself...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Eric
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>>>>>> Professionnel de recherche
>>>>>>>>>>> GIREF/Université Laval
>>>>>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>>>>>> --
>>>>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>>>>> Professionnel de recherche
>>>>>>>>>> GIREF/Université Laval
>>>>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>>>>>> <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Eric Chamberland, ing., M. Ing
>>>>>> Professionnel de recherche
>>>>>> GIREF/Université Laval
>>>>>> (418) 656-2131 poste 41 22 42
>>>>>
>>>> --
>>>> Eric Chamberland, ing., M. Ing
>>>> Professionnel de recherche
>>>> GIREF/Université Laval
>>>> (418) 656-2131 poste 41 22 42
>>> --
>>> Eric Chamberland, ing., M. Ing
>>> Professionnel de recherche
>>> GIREF/Université Laval
>>> (418) 656-2131 poste 41 22 42
>> --
>> Eric Chamberland, ing., M. Ing
>> Professionnel de recherche
>> GIREF/Université Laval
>> (418) 656-2131 poste 41 22 42
>
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42