[petsc-dev] testharness rerun test based on error condition; GPU; gitlab issues still broken

Fri Sep 4 12:12:01 CDT 2020

The test harness prints:

# To rerun failed tests: 
#     /usr/bin/gmake -f gmakefile test test-fail=1

So perhaps the CI can be changed to ignore the result of 'make alltests' and always run this [and then check the error code].

However, I'm not seeing a nonzero error return here; 'echo $?' prints 0 in both runs below, even though a test failed.
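
Since the exit status is 0 even when a test fails, one workaround for the CI would be to scan the harness output for the "# FAILED" summary lines instead. A rough bash sketch follows; check_harness_output is a hypothetical helper, not something in the test harness, and it assumes the summary format stays as in the log below.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PETSc test harness.
# Copies the harness output from stdin to stdout unchanged and returns 1
# if any "# FAILED <test>" summary line was seen, 0 otherwise.
# Assumption: failures are always listed as "# FAILED ..." in the summary.
check_harness_output() {
    local failed=0 line
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        case "$line" in
            "# FAILED "*) failed=1 ;;
        esac
    done
    return "$failed"
}
```

The CI step could then be something like: run 'make alltests' and ignore its result, then run '/usr/bin/gmake -f gmakefile test test-fail=1 2>&1 | check_harness_output'. The status of that pipeline is the status of the helper, so it is nonzero only if the rerun still reports a failure.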

Satish
------

[balay at pj01 petsc.x]$ make test globsearch='*ksp*tests*ex49_*cg*'
Using MAKEFLAGS: -- globsearch=*ksp*tests*ex49_*cg*
        TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
 ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
#	2d1
#	< extra text
        TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_pipecg2.counts
 ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
 ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
 ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
 ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
 ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
 ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural

# -------------
#   Summary    
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 7/8 tests (87.5%)
# failed 1/8 tests (12.5%)
# todo 0/8 tests (0.0%)
# skip 0/8 tests (0.0%)
#
# Wall clock time for tests: 1 sec
# Approximate CPU time (not incl. build time): 0.19 sec
#
# To rerun failed tests: 
#     /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time): 
#   ksp_ksp_tests-ex49_pipecg2: 0.02 sec / 0.19 sec
#   ksp_ksp_tests-ex49_cg: 0.00 sec / 0.00 sec
[balay at pj01 petsc.x]$ echo $?
0
[balay at pj01 petsc.x]$ /usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: -- test-fail=1
        TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
 ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
#	2d1
#	< extra text

# -------------
#   Summary    
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 1/2 tests (50.0%)
# failed 1/2 tests (50.0%)
# todo 0/2 tests (0.0%)
# skip 0/2 tests (0.0%)
#
# Wall clock time for tests: 0 sec
# Approximate CPU time (not incl. build time): 0.01 sec
#
# To rerun failed tests: 
#     /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time): 
#   ksp_ksp_tests-ex49_cg: 0.01 sec / 0.01 sec
[balay at pj01 petsc.x]$ echo $?
0
[balay at pj01 petsc.x]$ 

On Fri, 4 Sep 2020, Scott Kruger wrote:

> 
> 
> That's a good idea, but I'll have to think about this a bit.   It seems
> relatively straightforward, but I'd be doing this in bash so I'd like to come
> up with an implementation that is not overly complicated.    Do you have a job
> that has the issue offhand?
> 
> Scott
> 
> 
> On 9/4/20 10:27 AM, Barry Smith wrote:
> >    Scott,
> >
> >     How difficult would it be for the test harness to run a failed test
> >     again if the return code has specific values? Instead of erroring out.
> >
> >     I am thinking in particular about GPUs but it is general. If the GPU
> >     doesn't have the resources available it will error out, thus crashing the
> >     entire job in the pipeline and requiring a retry of the job from the GUI,
> >     wasting everyone's time.
> >
> >     Seems in theory like it should be pretty straightforward but, of course,
> >     unforeseen issues can make it difficult. Just check the program's error
> >     code and if it is one of certain values run the program again, or wait a
> >     few seconds and then run it again.
> >
> >    Barry
> >
> >
> > Issues are still broken hence here.
> 
>
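
Barry's suggestion quoted above (check the program's error code and, for certain values, wait a few seconds and run again) might look roughly like this in bash. This is only a sketch: run_with_retry and the RETRY_* variables are hypothetical names, and the default code 75 (EX_TEMPFAIL from sysexits.h) is a placeholder, not a value PETSc or a GPU runtime is known to return.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PETSc test harness.
# All three settings are assumptions chosen for illustration.
RETRY_CODES="${RETRY_CODES:-75}"   # space-separated exit codes worth retrying
RETRY_MAX="${RETRY_MAX:-3}"        # total number of attempts
RETRY_DELAY="${RETRY_DELAY:-5}"    # seconds to wait between attempts

# Usage: run_with_retry <command> [args...]
# Reruns the command only when it exits with one of RETRY_CODES;
# any other nonzero status is returned immediately.
run_with_retry() {
    local attempt=1 rc code retry
    while :; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        retry=0
        for code in $RETRY_CODES; do
            if [ "$rc" -eq "$code" ]; then
                retry=1
            fi
        done
        if [ "$retry" -eq 0 ] || [ "$attempt" -ge "$RETRY_MAX" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$RETRY_DELAY"
    done
}
```

A test command would then be wrapped as 'run_with_retry ./ex49 <options>'. Limiting the number of attempts keeps a persistently failing test from looping forever, and returning the last status unchanged keeps genuine failures visible.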