<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 9/4/20 11:12 AM, Satish Balay wrote:<br>
</div>
<blockquote type="cite"
cite="mid:alpine.LFD.2.23.451.2009041207340.2378@sb">
<pre wrap="">The test harness prints:
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
So perhaps the CI can be changed to ignore the result of 'make alltests' and always run this [and then check the error code]</pre>
</blockquote>
<br>
But this says that even if we have legitimate failures we should rerun
them, and only then worry about whether the error code is a real one.<br>
<blockquote type="cite"
cite="mid:alpine.LFD.2.23.451.2009041207340.2378@sb">
<pre wrap="">
However, I'm not seeing an error return here.
Satish
------
[balay@pj01 petsc.x]$ make test globsearch='*ksp*tests*ex49_*cg*'
Using MAKEFLAGS: -- globsearch=*ksp*tests*ex49_*cg*
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
# 2d1
# < extra text
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_pipecg2.counts</pre>
</blockquote>
<br>
This isn't a good example since it's a diff error. It's not what
Barry is referring to.<br>
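<br>
A minimal sketch of a CI-side check that keeps the two kinds of failure apart, since the harness itself exited 0 in Satish's log even with a failed test. It assumes the output format shown there: lines starting with "not ok diff-" are output mismatches, and (an assumption, since the log shows no such case) any other "not ok" line means the test program itself failed, which is the case Barry's retry would target. The function name classify_failures is hypothetical and not part of the PETSc harness; other markers (TODO, SKIP) are not handled.<br>
<pre wrap="">
```shell
#!/usr/bin/env bash
# Hypothetical helper: read test-harness output on stdin and separate
# output-diff failures ("not ok diff-...") from failures of the test run
# itself (assumed: any other "not ok ..." line). Returns nonzero if anything
# failed, because the harness exit status alone was 0 in the quoted log.
classify_failures() {
  local line diff_fail=0 run_fail=0
  while IFS= read -r line; do
    case "$line" in
      "not ok diff-"*)
        # Program ran, but its output differs from the stored reference.
        diff_fail=$((diff_fail + 1))
        echo "DIFF: ${line#not ok }" ;;
      "not ok "*)
        # Program itself returned an error code; a candidate for retrying.
        run_fail=$((run_fail + 1))
        echo "RUN:  ${line#not ok }" ;;
    esac
  done
  echo "diff failures: $diff_fail, run failures: $run_fail"
  # Function status: 0 only if nothing failed.
  [ $((diff_fail + run_fail)) -eq 0 ]
}
```
</pre>
Used as, e.g., make test globsearch='*ksp*tests*ex49_*cg*' | classify_failures, the pipeline's status is that of classify_failures, so CI fails whenever a "not ok" line appears.<br>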
<br>
Scott<br>
<br>
<blockquote type="cite"
cite="mid:alpine.LFD.2.23.451.2009041207340.2378@sb">
<pre wrap="">
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
# -------------
# Summary
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 7/8 tests (87.5%)
# failed 1/8 tests (12.5%)
# todo 0/8 tests (0.0%)
# skip 0/8 tests (0.0%)
#
# Wall clock time for tests: 1 sec
# Approximate CPU time (not incl. build time): 0.19 sec
#
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time):
# ksp_ksp_tests-ex49_pipecg2: 0.02 sec / 0.19 sec
# ksp_ksp_tests-ex49_cg: 0.00 sec / 0.00 sec
[balay@pj01 petsc.x]$ echo $?
0
[balay@pj01 petsc.x]$ /usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: -- test-fail=1
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
# 2d1
# < extra text
# -------------
# Summary
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 1/2 tests (50.0%)
# failed 1/2 tests (50.0%)
# todo 0/2 tests (0.0%)
# skip 0/2 tests (0.0%)
#
# Wall clock time for tests: 0 sec
# Approximate CPU time (not incl. build time): 0.01 sec
#
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time):
# ksp_ksp_tests-ex49_cg: 0.01 sec / 0.01 sec
[balay@pj01 petsc.x]$ echo $?
0
[balay@pj01 petsc.x]$
On Fri, 4 Sep 2020, Scott Kruger wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
That's a good idea, but I'll have to think about this a bit. It seems
relatively straightforward, but I'd be doing this in bash so I'd like to come
up with an implementation that is not overly complicated. Do you have a job
that has the issue offhand?
Scott
On 9/4/20 10:27 AM, Barry Smith wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Scott,
How difficult would it be for the test harness to rerun a failed test,
instead of erroring out, if the return code has specific values?
I am thinking in particular about GPUs, but the issue is general. If the
GPU doesn't have the resources available, the test errors out, crashing
the entire job in the pipeline and requiring a retry of the job from the
GUI, which wastes everyone's time.
In theory it seems like it should be pretty straightforward but, of
course, unforeseen issues can make it difficult. Just check the
program's error code and, if it is one of certain values, run the
program again, or wait a few seconds and then run it again.
Barry
Issues are still broken, hence this message.
</pre>
</blockquote>
<pre wrap="">
</pre>
</blockquote>
</blockquote>
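A minimal bash sketch of the retry Barry describes, assuming the test command signals a transient resource problem with a known exit code. The helper name run_with_retry and the placeholder code 75 (EX_TEMPFAIL in BSD sysexits.h) are hypothetical; which codes a GPU test actually returns when the device lacks resources would still need to be determined. Any other nonzero code is passed through unchanged, so genuine failures are not masked.<br>
<pre wrap="">
```shell
#!/usr/bin/env bash
# Hypothetical helper: run_with_retry MAX_TRIES DELAY "RETRY_CODES" CMD [ARGS...]
# Reruns CMD while its exit status is one of the space-separated RETRY_CODES,
# sleeping DELAY seconds between attempts. Any other status returns at once.
run_with_retry() {
  local max_tries="$1" delay="$2" retry_codes="$3"
  shift 3
  local try=1 status
  while true; do
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    case " $retry_codes " in
      *" $status "*) ;;        # retryable code: keep going below
      *) return "$status" ;;   # genuine failure: report it unchanged
    esac
    if [ "$try" -ge "$max_tries" ]; then
      echo "giving up after $try attempts (last exit code $status)"
      return "$status"
    fi
    echo "exit code $status is retryable; retrying in ${delay}s (attempt $((try + 1))/$max_tries)"
    sleep "$delay"
    try=$((try + 1))
  done
}
```
</pre>
For example, run_with_retry 3 10 "75" some_test_command would try the (hypothetical) command up to three times, ten seconds apart, and only then report exit code 75 as a failure.<br>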
<br>
<pre class="moz-signature" cols="72">--
Tech-X Corporation <a class="moz-txt-link-abbreviated" href="mailto:kruger@txcorp.com">kruger@txcorp.com</a>
5621 Arapahoe Ave, Suite A Phone: (720) 974-1841
Boulder, CO 80303 Fax: (303) 448-7756</pre>
</body>
</html>