[petsc-dev] testharness rerun test based on error condition; GPU; gitlab issues still broken
Scott Kruger
kruger at txcorp.com
Fri Sep 4 16:23:26 CDT 2020
On 9/4/20 11:12 AM, Satish Balay wrote:
> The test harness prints:
>
> # To rerun failed tests:
> # /usr/bin/gmake -f gmakefile test test-fail=1
>
> So perhaps the CI can be changed to ignore the result of 'make alltests' and always run this [and then check the error code]
But this says that even if we have legitimate failures we should rerun
them, and only then worry about whether it is a real error code.
>
> However - I'm not seeing error return here..
>
> Satish
> ------
>
> [balay at pj01 petsc.x]$ make test globsearch='*ksp*tests*ex49_*cg*'
> Using MAKEFLAGS: -- globsearch=*ksp*tests*ex49_*cg*
> TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
> ok ksp_ksp_tests-ex49_cg
> not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
> # 2d1
> # < extra text
> TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_pipecg2.counts
This isn't a good example since it's a diff error. It's not what Barry
is referring to.
Scott
> ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
> ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
> ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
> ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
> ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
> ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
>
> # -------------
> # Summary
> # -------------
> # FAILED diff-ksp_ksp_tests-ex49_cg
> # success 7/8 tests (87.5%)
> # failed 1/8 tests (12.5%)
> # todo 0/8 tests (0.0%)
> # skip 0/8 tests (0.0%)
> #
> # Wall clock time for tests: 1 sec
> # Approximate CPU time (not incl. build time): 0.19 sec
> #
> # To rerun failed tests:
> # /usr/bin/gmake -f gmakefile test test-fail=1
> #
> # Timing summary (actual test time / total CPU time):
> # ksp_ksp_tests-ex49_pipecg2: 0.02 sec / 0.19 sec
> # ksp_ksp_tests-ex49_cg: 0.00 sec / 0.00 sec
> [balay at pj01 petsc.x]$ echo $?
> 0
> [balay at pj01 petsc.x]$ /usr/bin/gmake -f gmakefile test test-fail=1
> Using MAKEFLAGS: -- test-fail=1
> TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
> ok ksp_ksp_tests-ex49_cg
> not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
> # 2d1
> # < extra text
>
> # -------------
> # Summary
> # -------------
> # FAILED diff-ksp_ksp_tests-ex49_cg
> # success 1/2 tests (50.0%)
> # failed 1/2 tests (50.0%)
> # todo 0/2 tests (0.0%)
> # skip 0/2 tests (0.0%)
> #
> # Wall clock time for tests: 0 sec
> # Approximate CPU time (not incl. build time): 0.01 sec
> #
> # To rerun failed tests:
> # /usr/bin/gmake -f gmakefile test test-fail=1
> #
> # Timing summary (actual test time / total CPU time):
> # ksp_ksp_tests-ex49_cg: 0.01 sec / 0.01 sec
> [balay at pj01 petsc.x]$ echo $?
> 0
> [balay at pj01 petsc.x]$
>
>
>
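Since `echo $?` prints 0 in both runs above even though the summary reports a
failure, a CI script that wants to react to failures would apparently have to
look at the TAP output rather than the exit status. A minimal bash sketch of
that idea, assuming the "ok" / "not ok" line format shown in the log above;
count_tap_failures and sample_log are hypothetical names and not part of the
test harness:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PETSc harness): count failing TAP
# lines ("not ok ...") in test output read from stdin.
count_tap_failures() {
  # grep -c prints the number of matching lines. It exits with status 1
  # when nothing matches, so "|| true" keeps the function's status at 0.
  grep -c -E '^[[:space:]]*not ok ' || true
}

# Example input: a fragment of the log quoted above.
sample_log='ok ksp_ksp_tests-ex49_cg
not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural'

nfail=$(printf '%s\n' "$sample_log" | count_tap_failures)
echo "failed tests: $nfail"
```

In a real job the input would be the captured output of the make command
instead of sample_log. Note that this counts diff failures too, so on its own
it does not separate them from the resource errors Barry has in mind.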
> On Fri, 4 Sep 2020, Scott Kruger wrote:
>
>>
>> That's a good idea, but I'll have to think about this a bit. It seems
>> relatively straightforward, but I'd be doing this in bash so I'd like to come
>> up with an implementation that is not overly complicated. Do you have a job
>> that has the issue offhand?
>>
>> Scott
>>
>>
>> On 9/4/20 10:27 AM, Barry Smith wrote:
>>> Scott,
>>>
>>> How difficult would it be for the test harness to run a failed test
>>> again if the return code has specific values, instead of erroring out?
>>>
>>> I am thinking in particular about GPUs but it is general. If the GPU
>>> doesn't have the resources available it will error out thus crashing the
>>> entire job in the pipeline requiring retrying the job from the GUI.
>>> Wasting everyone's time.
>>>
>>> Seems in theory like it should be pretty straightforward but, of course,
>>> unforeseen issues can make it difficult. Just check the program's error
>>> code and if it is one of certain values run the program again, or wait a
>>> few seconds and run it again.
>>>
>>> Barry
>>>
>>>
>>> Issues are still broken, hence posting here.
>>
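Barry's suggestion quoted above (check the program's error code and, for
certain values, wait a few seconds and run it again) could look roughly like
this in bash. This is only a sketch under stated assumptions: run_with_retry,
RETRY_CODES, MAX_ATTEMPTS and RETRY_DELAY are hypothetical names, and the codes
75 and 76 are placeholders, not values PETSc or any GPU runtime is known to
return.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: rerun a command when its exit status is one of a set
# of "retryable" codes. All values below are placeholders for illustration.
RETRY_CODES="${RETRY_CODES:-75 76}"   # assumed codes, not PETSc's
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"     # total runs allowed, including the first
RETRY_DELAY="${RETRY_DELAY:-5}"       # seconds to wait before each rerun

run_with_retry() {
  local attempt=1 status code retry
  while true; do
    # Record the exit status without tripping "set -e" in the caller.
    "$@" && status=0 || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Is this status one of the retryable codes?
    retry=0
    for code in $RETRY_CODES; do
      if [ "$status" -eq "$code" ]; then
        retry=1
      fi
    done
    # Give up on ordinary failures or once the attempts are used up.
    if [ "$retry" -eq 0 ] || [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      return "$status"
    fi
    echo "# exit code $status is retryable, rerunning" >&2
    sleep "$RETRY_DELAY"
    attempt=$((attempt + 1))
  done
}

# Demo with a stand-in command that fails twice with code 75, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  [ "$n" -ge 3 ] || return 75
}

RETRY_DELAY=0            # no waiting in the demo
rc=0
run_with_retry flaky || rc=$?
echo "final status: $rc, runs: $(cat "$counter_file")"
```

A status outside RETRY_CODES (such as the diff error code 1 in the log above)
is returned immediately, so legitimate failures would not be rerun.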
--
Tech-X Corporation kruger at txcorp.com
5621 Arapahoe Ave, Suite A Phone: (720) 974-1841
Boulder, CO 80303 Fax: (303) 448-7756