[petsc-dev] broken nightlybuilds (next vs next-tmp)

Sat Nov 11 13:33:21 CST 2017

Satish Balay <balay at mcs.anl.gov> writes:

> On Sat, 11 Nov 2017, Jed Brown wrote:
>
>> > If you are running 'make alltests' on your laptop - then you don't need to test on es - before merge to maint.
>> 
>> alltests takes hours and doesn't catch weird configurations -- you need
>> different PETSC_ARCH for that.  It is normal to at least compile and run
>> a couple local tests.
>
> It's better than having next broken for days and having the next model
> stay broken. BTW - it takes less than an hour on my laptop.
> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/11/next.html
> And plenty of the nightlybuild machines are running it in less than an hour
> [some under 30min].
>
> There seems to be a binary switch here for you and Matt.  It's either
> local tests find all issues or next tests find all issues.
>
> My claim here is - local tests should find some issues [major ones that
> might break all builds] - and let next find the remaining
> [machine/compiler/configuration] specific issues.

The proposal I'm objecting to, and that has prevented me from writing an
important letter of recommendation this morning during some precious
time while Joule is sleeping, was to eliminate 'next'.  We wouldn't have
needed to exchange dozens of emails if you just wanted to make a better
system for catching the easy stuff before it gets to 'next'.

I think a half hour is way too long.  The context switch is a problem.
If I work on something for PETSc during a particular hour of my day, I
want to be able to finish it.  If I need to start 30 minutes of testing,
I will likely be occupied with other things by the time it is done and
won't be able to check up on it for several hours when I have 100 emails
and a baby to feed, etc.  So realistically, I don't actually get to it
until the next day.  But this inability to finish what might be a
five-minute task makes development less fun and puts big incentives on
not bothering with minor contributions/fixes.

It also encourages writing a five-minute email describing a solution
now, which "unexpectedly" evolves into a dozen emails and an hour of
time, instead of writing the 10-minute fix now.  I skip the fix because I
know it would need some slivers of time in the future: to push (which
will likely fail because someone else pushed, so I blast away the merge,
pull, merge again, compile again as a sanity check but don't run
alltests because I hope not too much has changed, then push) and to send
the email reply saying that the fix has been merged.

Also, if my computer is busy running tests, I can't move on to the next
thing that needs to be done without having multiple PETSc clones.  Since
I can't use an existing PETSC_ARCH in a different PETSC_DIR, maintaining
lots of clones means lots of reconfiguring and rebuilding without the
benefits of ccache.  All this sucks time.

I'm sure I'm not the only one in a similar situation.  We need an
effective set of tests that runs in less than five minutes so that we
can fix problems and move on rather than having lots of open threads
hanging around.