[petsc-dev] testharness rerun test based on error condition; GPU; gitlab issues still broken
Satish Balay
balay at mcs.anl.gov
Fri Sep 4 12:12:01 CDT 2020
The test harness prints:
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
So perhaps the CI can be changed to ignore the result of 'make alltests' - and always run this [and then check the error code]
However - I'm not seeing an error return here..
Satish
------
[balay at pj01 petsc.x]$ make test globsearch='*ksp*tests*ex49_*cg*'
Using MAKEFLAGS: -- globsearch=*ksp*tests*ex49_*cg*
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
# 2d1
# < extra text
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_pipecg2.counts
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
# -------------
# Summary
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 7/8 tests (87.5%)
# failed 1/8 tests (12.5%)
# todo 0/8 tests (0.0%)
# skip 0/8 tests (0.0%)
#
# Wall clock time for tests: 1 sec
# Approximate CPU time (not incl. build time): 0.19 sec
#
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time):
# ksp_ksp_tests-ex49_pipecg2: 0.02 sec / 0.19 sec
# ksp_ksp_tests-ex49_cg: 0.00 sec / 0.00 sec
[balay at pj01 petsc.x]$ echo $?
0
[balay at pj01 petsc.x]$ /usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: -- test-fail=1
TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
# 2d1
# < extra text
# -------------
# Summary
# -------------
# FAILED diff-ksp_ksp_tests-ex49_cg
# success 1/2 tests (50.0%)
# failed 1/2 tests (50.0%)
# todo 0/2 tests (0.0%)
# skip 0/2 tests (0.0%)
#
# Wall clock time for tests: 0 sec
# Approximate CPU time (not incl. build time): 0.01 sec
#
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
#
# Timing summary (actual test time / total CPU time):
# ksp_ksp_tests-ex49_cg: 0.01 sec / 0.01 sec
[balay at pj01 petsc.x]$ echo $?
0
[balay at pj01 petsc.x]$
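
Since both runs above exit with status 0, a CI step cannot rely on $? alone. One possible workaround, as a rough sketch only: save the harness output to a log and look for the "# FAILED" summary line shown in the transcript. The function name harness_passed and the log file name test.log are made up for illustration; they are not part of the harness.

```shell
#!/bin/sh
# Hypothetical helper (not part of the test harness).
# Returns 0 if the log has no "# FAILED" summary line, 1 if it has one,
# and 2 if the log cannot be read at all.
harness_passed() {
    [ -r "$1" ] || return 2
    if grep -q '^# FAILED' "$1"; then
        return 1
    fi
    return 0
}

# Possible use in a CI script (assumed file name test.log):
#   /usr/bin/gmake -f gmakefile test test-fail=1 > test.log 2>&1
#   harness_passed test.log || exit 1
```

This keys off the summary text rather than the exit status, so it would need updating if the summary format changes.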
On Fri, 4 Sep 2020, Scott Kruger wrote:
>
>
> That's a good idea, but I'll have to think about this a bit. It seems
> relatively straightforward, but I'd be doing this in bash so I'd like to come
> up with an implementation that is not overly complicated. Do you have a job
> that has the issue offhand?
>
> Scott
>
>
> On 9/4/20 10:27 AM, Barry Smith wrote:
> > Scott,
> >
> > How difficult would it be for the test harness to run a failed test
> > again if the return code has specific values? Instead of erroring out.
> >
> > I am thinking in particular about GPUs but it is general. If the GPU
> > doesn't have the resources available it will error out thus crashing the
> > entire job in the pipeline requiring retrying the job from the GUI.
> > Wasting everyone's time.
> >
> > Seems in theory like it should be pretty straightforward but, of course,
> > unforeseen issues can make it difficult. Just check the program's error
> > code and if it is one of certain values run the program again, or wait a few
> > seconds and run it again.
> >
> > Barry
> >
> >
> > GitLab issues are still broken, hence posting here.
>
>
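
The retry idea quoted above (rerun when the exit code is one of certain values, optionally after a short wait) could look roughly like this in plain sh. This is only a sketch under assumptions: the function name retry_on_codes and its argument order are invented here, and the thread does not say which exit codes indicate an unavailable GPU, so the 77 in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the test harness).
# Usage: retry_on_codes "<space-separated codes>" <max tries> <delay secs> cmd [args...]
# Reruns cmd only while it exits with one of the listed codes.
# Note: plain sh has no local variables, so these names are global.
retry_on_codes() {
    codes=$1
    max_tries=$2
    delay=$3
    shift 3
    try=1
    while :; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        case " $codes " in
            *" $rc "*) ;;           # listed code: eligible for a retry
            *) return "$rc" ;;      # any other failure is final
        esac
        if [ "$try" -ge "$max_tries" ]; then
            return "$rc"            # out of tries: report the last code
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Example (77 is a placeholder code, ./ex49 a placeholder command):
#   retry_on_codes "77" 3 5 ./ex49
```

Limiting the number of tries keeps a persistently unavailable resource from stalling the job indefinitely.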