<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><div></div><div><br class=""><blockquote type="cite" class=""><div class="">On 12 Mar 2021, at 8:54 PM, Eric Chamberland &lt;<a href="mailto:Eric.Chamberland@giref.ulaval.ca" class="">Eric.Chamberland@giref.ulaval.ca</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
<div class=""><p class="">Hi Pierre,</p><p class="">I now have a docker container reproducing the problems here.</p><p class="">Actually, if I look at snes_tutorials-ex12_quad_singular_hpddm
it fails like this:</p><p class="">not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59<br class="">
# Initial guess<br class="">
# L_2 Error: 0.00803099<br class="">
# Initial Residual<br class="">
# L_2 Residual: 1.09057<br class="">
# Au - b = Au + F(0)<br class="">
# Linear L_2 Residual: 1.09057<br class="">
# [d470c54ce086:14127] Read -1, expected 4096, errno = 1<br class="">
# [d470c54ce086:14128] Read -1, expected 4096, errno = 1<br class="">
# [d470c54ce086:14129] Read -1, expected 4096, errno = 1<br class="">
# [3]PETSC ERROR:
------------------------------------------------------------------------<br class="">
# [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
Violation, probably memory access out of range<br class="">
# [3]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger<br class="">
# [3]PETSC ERROR: or see
<a class="moz-txt-link-freetext" href="https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind">https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br class="">
# [3]PETSC ERROR: or try <a class="moz-txt-link-freetext" href="http://valgrind.org/">http://valgrind.org</a> on GNU/linux
and Apple Mac OS X to find memory corruption errors<br class="">
# [3]PETSC ERROR: likely location of problem given in stack
below<br class="">
# [3]PETSC ERROR: --------------------- Stack Frames
------------------------------------<br class="">
# [3]PETSC ERROR: Note: The EXACT line numbers in the stack
are not available,<br class="">
# [3]PETSC ERROR: INSTEAD the line number of the start
of the function<br class="">
# [3]PETSC ERROR: is given.<br class="">
# [3]PETSC ERROR: [3] buildTwo line 987
/opt/petsc-main/include/HPDDM_schwarz.hpp<br class="">
# [3]PETSC ERROR: [3] next line 1130
/opt/petsc-main/include/HPDDM_schwarz.hpp<br class="">
# [3]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------<br class="">
# [3]PETSC ERROR: Signal received<br class="">
# [3]PETSC ERROR: [0]PETSC ERROR:
------------------------------------------------------------------------</p><p class="">Also, ex12_quad_hpddm_reuse_baij fails with many more "Read -1,
expected ..." messages, and I don't know where they come from.</p><p class="">Hypre (like in diff-snes_tutorials-ex56_hypre) is also having
DIVERGED_INDEFINITE_PC failures...</p><p class="">Please see the 3 attached Docker files:</p><p class="">1) fedora_mkl_and_devtools: the Dockerfile that installs Fedora
33 with the GNU compilers, MKL, and everything needed for development.</p><p class="">2) openmpi: the Dockerfile that builds OpenMPI</p><p class="">3) petsc: the last Dockerfile, which builds, installs, and tests PETSc</p><p class="">I build the 3 like this:</p><p class="">docker build -t fedora_mkl_and_devtools -f
fedora_mkl_and_devtools .</p><p class="">docker build -t openmpi -f openmpi .</p><p class="">docker build -t petsc -f petsc .</p><p class="">Disclaimer: I am not a Docker expert, so I may do things that are
not Docker state-of-the-art, but I am open to suggestions... ;)<br class="">
</p><p class="">I have just run it on my laptop (it took a long time), which does not have enough
cores, so many more tests failed (I should force --oversubscribe but
don't know how to). I will relaunch on my workstation in a few
minutes.</p><p class="">I will now test your branch! (sorry for the delay).</p><p class="">Thanks,</p><p class="">Eric<br class="">
</p>
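<p class="">The three build commands above can also be chained in a small script. This is only a sketch: it assumes the three Dockerfiles sit in the current directory under the names listed above and that the images have to be built in that order. The <code>build_all</code> function and the <code>DRY_RUN</code> variable are illustrative names, not part of the attached files.</p>
<pre class="">
```shell
#!/bin/sh
# Sketch only: build the three images in the order listed in this message.
# Assumption: the Dockerfiles fedora_mkl_and_devtools, openmpi and petsc
# are in the current directory.
# build_all and DRY_RUN are hypothetical names used for illustration.
set -eu

build_all() {
    for image in fedora_mkl_and_devtools openmpi petsc; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only print the command that would be executed.
            echo "docker build -t $image -f $image ."
        else
            # Tag each image with the name of its Dockerfile.
            docker build -t "$image" -f "$image" .
        fi
    done
}
```
</pre>
<p class="">After sourcing the file, calling <code>build_all</code> runs the three builds one after the other, while <code>DRY_RUN=1 build_all</code> only prints the commands so the order can be checked first.</p>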
<div class="moz-cite-prefix">On 2021-03-11 9:03 a.m., Eric
Chamberland wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:84132559-9c3e-3ba8-5473-06a0a9b7be2f@giref.ulaval.ca" class="">
<p class="">Hi Pierre,</p><p class="">OK, that's interesting!</p><p class="">I will try to build a Docker image by tomorrow and give you
the exact recipe to reproduce the bugs.</p><p class="">Eric</p><p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 2021-03-11 2:46 a.m., Pierre
Jolivet wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:51B12D01-EDA4-4D95-ABD8-E0D5ECC669E2@joliv.et" class="">
<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 11 Mar 2021, at 6:16 AM, Barry Smith &lt;<a href="mailto:bsmith@petsc.dev" class="" moz-do-not-send="true">bsmith@petsc.dev</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode:
space; line-break: after-white-space;" class="">
<div class=""><br class="">
</div>
Eric,
<div class=""><br class="">
</div>
<div class=""> Sorry about not being more immediate.
We still have this in our active email so you don't
need to submit individual issues. We'll try to get to
them as soon as we can.</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Indeed, I’m still trying to figure this out.</div>
<div class="">I realized that some of my configure flags were different
than yours, e.g., no --with-memalign.</div>
<div class="">I’ve also added SuperLU_DIST to my installation.</div>
<div class="">Still, I can’t reproduce any issue.</div>
<div class="">I will continue looking into this. It appears I’m seeing
some valgrind errors, but I don’t know if this is a side
effect of OpenMPI not being valgrind-clean (last time I
checked, there was no error with MPICH).</div>
<div class=""><br class="">
</div>
<div class="">Thank you for your patience,</div>
<div class="">Pierre</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">/usr/bin/gmake -f gmakefile test test-fail=1</div>
<div class="">Using MAKEFLAGS: test-fail=1</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts</div>
<div class=""> ok snes_tutorials-ex12_quad_hpddm_reuse_baij</div>
<div class=""> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts</div>
<div class=""> ok ksp_ksp_tests-ex33_superlu_dist_2</div>
<div class=""> ok diff-ksp_ksp_tests-ex33_superlu_dist_2</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0</div>
<div class=""> ok
ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1</div>
<div class=""> ok
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex50_tut_2</div>
<div class=""> ok diff-ksp_ksp_tutorials-ex50_tut_2</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts</div>
<div class=""> ok ksp_ksp_tests-ex33_superlu_dist</div>
<div class=""> ok diff-ksp_ksp_tests-ex33_superlu_dist</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts</div>
<div class=""> ok snes_tutorials-ex56_hypre</div>
<div class=""> ok diff-snes_tutorials-ex56_hypre</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex56_2</div>
<div class=""> ok diff-ksp_ksp_tutorials-ex56_2</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts</div>
<div class=""> ok snes_tutorials-ex17_3d_q3_trig_elas</div>
<div class=""> ok diff-snes_tutorials-ex17_3d_q3_trig_elas</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts</div>
<div class=""> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij</div>
<div class=""> ok
diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts</div>
<div class="">not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error
code: 1</div>
<div class="">#<span class="Apple-tab-span" style="white-space:pre"> </span>srun:
error: Unable to create step for job 1426755: More
processors requested than permitted</div>
<div class=""> ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command
failed so no diff</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran
required for this test</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts</div>
<div class=""> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij</div>
<div class=""> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts</div>
<div class=""> ok snes_tutorials-ex19_tut_3</div>
<div class=""> ok diff-snes_tutorials-ex19_tut_3</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts</div>
<div class=""> ok snes_tutorials-ex17_3d_q3_trig_vlap</div>
<div class=""> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP
Fortran required for this test</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts</div>
<div class=""> ok snes_tutorials-ex19_superlu_dist</div>
<div class=""> ok diff-snes_tutorials-ex19_superlu_dist</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts</div>
<div class=""> ok
snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre</div>
<div class=""> ok
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex49_hypre_nullspace</div>
<div class=""> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts</div>
<div class=""> ok snes_tutorials-ex19_superlu_dist_2</div>
<div class=""> ok diff-snes_tutorials-ex19_superlu_dist_2</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts</div>
<div class="">not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error
code: 1</div>
<div class="">#<span class="Apple-tab-span" style="white-space:pre"> </span>srun:
error: Unable to create step for job 1426755: More
processors requested than permitted</div>
<div class=""> ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command
failed so no diff</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts</div>
<div class=""> ok
snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre</div>
<div class=""> ok
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex64_1</div>
<div class=""> ok diff-ksp_ksp_tutorials-ex64_1</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts</div>
<div class="">not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code:
1</div>
<div class="">#<span class="Apple-tab-span" style="white-space:pre"> </span>srun:
error: Unable to create step for job 1426755: More
processors requested than permitted</div>
<div class=""> ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command
failed so no diff</div>
<div class=""> TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts</div>
<div class=""> ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP
Fortran required for this test</div>
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode:
space; line-break: after-white-space;" class="">
<div class=""> Barry</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Mar 10, 2021, at 11:03 PM, Eric
Chamberland &lt;<a href="mailto:Eric.Chamberland@giref.ulaval.ca" class="" moz-do-not-send="true">Eric.Chamberland@giref.ulaval.ca</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class=""><p class="">Barry,</p><p class="">to get some follow-up on the
--with-openmp=1 failures, shall I open
gitlab issues for:</p><p class="">a) all hypre failures giving <span style="white-space: pre-wrap;" class="">DIVERGED_INDEFINITE_PC</span></p><p class=""><span style="white-space: pre-wrap;" class="">b) all superlu_dist failures giving different results with </span><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">initia and "Exceeded timeout limit of 60 s"</span></span></p><p class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">c) hpddm failures "free(): invalid next size (fast)" and "Segmentation Violation"
</span></span></p><p class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">d) all tao's </span></span><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">"Exceeded timeout limit of 60 s"</span></span></span></span></p><p class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">I don't see how I could do all this debugging by myself...</span></span></span></span></p><p class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">Thanks,</span></span></span></span></p><p class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class=""><span style="white-space: pre-wrap;" class="">Eric
</span></span></span></span></p>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</blockquote>
<pre class="moz-signature" cols="72">--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42</pre>
</blockquote>
</div>
<span id="cid:26EB476E-4C18-4B68-9DC9-6FBE92E94935">&lt;fedora_mkl_and_devtools.txt&gt;</span><span id="cid:EC379F4B-01BD-409E-8BBC-6FBA5A49236E">&lt;openmpi.txt&gt;</span><span id="cid:3CA91B7D-219A-4965-9FF3-D836488847A0">&lt;petsc.txt&gt;</span></div></blockquote></div><br class=""></div></body></html>