[petsc-dev] testharness rerun test based on error condition; GPU; gitlab issues still broken

Fri Sep 4 16:23:26 CDT 2020

On 9/4/20 11:12 AM, Satish Balay wrote:
> The test harness prints:
>
> # To rerun failed tests:
> #     /usr/bin/gmake -f gmakefile test test-fail=1
>
> So perhaps the CI can be changed to ignore the result of 'make alltests' - and always run this [and then check the error code]

But this says that even if we have legit failures we should rerun them,
and only then worry about whether it is a real error code.
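
A rough sketch of what that CI logic might look like in bash. It assumes
the harness keeps exiting with status 0 even when a test fails (as in the
log quoted below), so it decides pass/fail by counting the TAP "not ok"
lines in the rerun output instead of trusting $?. The function names and
log file names are hypothetical, and I have not checked what test-fail=1
does when the first run has no failures.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper; not part of the PETSc test harness.

# Count TAP "not ok" lines in a saved harness log.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_failures() {
    grep -c '^ *not ok' "$1" || true
}

# Run the full suite, rerun only the failures, and fail the job
# only if the rerun still reports "not ok" lines.
# alltests.log and rerun.log are placeholder file names.
ci_run_tests() {
    make alltests > alltests.log 2>&1 || true
    make -f gmakefile test test-fail=1 > rerun.log 2>&1 || true
    [ "$(count_failures rerun.log)" -eq 0 ]
}
```

The CI job would then call ci_run_tests and use its exit status. Note this
reruns legit failures too, which is the concern above.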
>
> However - I'm not seeing error return here..
>
> Satish
> ------
>
> [balay at pj01 petsc.x]$ make test globsearch='*ksp*tests*ex49_*cg*'
> Using MAKEFLAGS: -- globsearch=*ksp*tests*ex49_*cg*
>          TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
>   ok ksp_ksp_tests-ex49_cg
> not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
> #	2d1
> #	< extra text
>          TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_pipecg2.counts

This isn't a good example since it's a diff error.  It's not what Barry 
is referring to.
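
What Barry is describing is closer to this: rerun a command only when its
exit code is one of a list of "transient" values, and treat anything else
as a real failure. A minimal bash sketch, not the harness implementation.
The function name, the codes 77 and 78, the attempt count, and the delay
are all placeholders I made up; the codes a GPU job actually returns when
resources are unavailable would have to be determined first.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the PETSc test harness.
# RETRY_CODES holds exit codes treated as transient (placeholder values).
RETRY_CODES="${RETRY_CODES:-77 78}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"
RETRY_DELAY="${RETRY_DELAY:-5}"   # seconds to wait before a rerun

run_with_retry() {
    local attempt=1 rc code retryable
    while true; do
        rc=0
        "$@" || rc=$?
        # Success, or out of attempts: hand back whatever we got.
        if [ "$rc" -eq 0 ] || [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            return "$rc"
        fi
        retryable=0
        for code in $RETRY_CODES; do
            if [ "$rc" -eq "$code" ]; then
                retryable=1
            fi
        done
        # A code outside the list is a legitimate failure: no retry.
        if [ "$retryable" -ne 1 ]; then
            return "$rc"
        fi
        sleep "$RETRY_DELAY"
        attempt=$((attempt + 1))
    done
}
```

Usage would be something like "run_with_retry ./ex49 -ksp_type cg", with
the executable and options here only as an illustration. A diff failure
like the one above would not be retried, since it happens after the
program itself has already exited cleanly.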

Scott

>   ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
>   ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-preconditioned
>   ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
>   ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-unpreconditioned
>   ok ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
>   ok diff-ksp_ksp_tests-ex49_pipecg2+ksp_norm_type-natural
>
> # -------------
> #   Summary
> # -------------
> # FAILED diff-ksp_ksp_tests-ex49_cg
> # success 7/8 tests (87.5%)
> # failed 1/8 tests (12.5%)
> # todo 0/8 tests (0.0%)
> # skip 0/8 tests (0.0%)
> #
> # Wall clock time for tests: 1 sec
> # Approximate CPU time (not incl. build time): 0.19 sec
> #
> # To rerun failed tests:
> #     /usr/bin/gmake -f gmakefile test test-fail=1
> #
> # Timing summary (actual test time / total CPU time):
> #   ksp_ksp_tests-ex49_pipecg2: 0.02 sec / 0.19 sec
> #   ksp_ksp_tests-ex49_cg: 0.00 sec / 0.00 sec
> [balay at pj01 petsc.x]$ echo $?
> 0
> [balay at pj01 petsc.x]$ /usr/bin/gmake -f gmakefile test test-fail=1
> Using MAKEFLAGS: -- test-fail=1
>          TEST arch-complex/tests/counts/ksp_ksp_tests-ex49_cg.counts
>   ok ksp_ksp_tests-ex49_cg
> not ok diff-ksp_ksp_tests-ex49_cg # Error code: 1
> #	2d1
> #	< extra text
>
> # -------------
> #   Summary
> # -------------
> # FAILED diff-ksp_ksp_tests-ex49_cg
> # success 1/2 tests (50.0%)
> # failed 1/2 tests (50.0%)
> # todo 0/2 tests (0.0%)
> # skip 0/2 tests (0.0%)
> #
> # Wall clock time for tests: 0 sec
> # Approximate CPU time (not incl. build time): 0.01 sec
> #
> # To rerun failed tests:
> #     /usr/bin/gmake -f gmakefile test test-fail=1
> #
> # Timing summary (actual test time / total CPU time):
> #   ksp_ksp_tests-ex49_cg: 0.01 sec / 0.01 sec
> [balay at pj01 petsc.x]$ echo $?
> 0
> [balay at pj01 petsc.x]$
>
>
>
> On Fri, 4 Sep 2020, Scott Kruger wrote:
>
>>
>> That's a good idea, but I'll have to think about this a bit.   It seems
>> relatively straightforward, but I'd be doing this in bash so I'd like to come
>> up with an implementation that is not overly complicated.    Do you have a job
>> that has the issue offhand?
>>
>> Scott
>>
>>
>> On 9/4/20 10:27 AM, Barry Smith wrote:
>>>     Scott,
>>>
>>>      How difficult would it be for the test harness to run a failed test
>>>      again if the return code has specific values? Instead of erroring out.
>>>
>>>      I am thinking in particular about GPUs but it is general. If the GPU
>>>      doesn't have the resources available it will error out thus crashing the
>>>      entire job in the pipeline requiring retrying the job from the GUI.
>>>      Wasting everyone's time.
>>>
>>>      Seems in theory like it should be pretty straightforward but, of course,
>>>      unforeseen issues can make it difficult. Just check the program's error
>>>      code and if it is one of certain values run the program again, or wait a
>>>      few seconds and run it again.
>>>
>>>     Barry
>>>
>>>
>>> Issues are still broken hence here.
>>

-- 
Tech-X Corporation               kruger at txcorp.com
5621 Arapahoe Ave, Suite A       Phone: (720) 974-1841
Boulder, CO 80303                Fax:   (303) 448-7756
