[petsc-dev] broken nightlybuilds (next vs next-tmp)

Sat Nov 11 12:47:52 CST 2017

Satish Balay <balay at mcs.anl.gov> writes:

> On Sat, 11 Nov 2017, Jed Brown wrote:
>
>> Satish Balay <balay at mcs.anl.gov> writes:
>> 
>> > On Sat, 11 Nov 2017, Jed Brown wrote:
>> >
>> >> > I don't think we have the resources to run full tests on every branch one
>> >> > at a time. Do we?
>> >> 
>> >> No,
>> >
>> > Well the hope is - after the migration to new test suite is complete
>> > the cost of a full test run is lower. And we could somehow do fewer
>> > tests to capture most issues.
>> >
>> >> and after each merge of a branch to 'master', the prospective merge
>> >> of other branches would need to be retested.  But the idea that the
>> >> automated test suite is infallible is also flawed.
>> >
>> > Well aren't we relying on 'automated testing' with the current next
>> > model?
>> 
>> Some of us also run 'next' in daily work and fix issues as they appear
>> in that context.  There is also significant convenience in there being
>> one place we can go to reproduce all issues.
>
> If you need next for some other purpose than graduating branches to
> master - then you can create a workflow [with next or a different
> branch] for that purpose.
>
> The way I see it - a broken next [where folks can't easily figure out
> who or which commit is responsible for the breakages] - doesn't help
> much..

The fundamental problem here is that we aren't accurate enough at
placing blame and getting the appropriate person to fix it.  It doesn't
help that we are a distributed team and have plenty of our own
obligations.  I can't fix something while I'm teaching class or meeting
with students, for example.  But we should all be able to get to it
within a day, either to withdraw the branch from 'next' or to actually
fix it.
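
One way to make blame more mechanical, sketched here purely as an
illustration: if the merges into 'next' are kept in order and a test can
be rerun on any prefix of them, a binary search finds the first merge at
which the test starts failing. The function name `first_bad_merge`, the
`passes` callback, and the branch names are hypothetical. The search
assumes that the base without any merges passes and that once the test
fails it keeps failing for every later merge (the same assumption
`git bisect` makes).

```python
from typing import Callable, Optional, Sequence


def first_bad_merge(
    merges: Sequence[str],
    passes: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the first merge whose inclusion makes the test fail.

    merges -- branch names in the order they were merged (hypothetical data)
    passes -- reruns the test with only the given prefix of merges applied
    Assumes the empty prefix passes and that failure is monotone.
    """
    if not merges or passes(merges):
        return None  # nothing merged, or nothing broken: nobody to blame
    lo, hi = 0, len(merges)  # invariant: prefix of length lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(merges[:mid]):
            lo = mid
        else:
            hi = mid
    return merges[hi - 1]


if __name__ == "__main__":
    # Hypothetical branch names; the lambda stands in for a real test run.
    merged = ["alice/feature-a", "bob/feature-b", "carol/feature-c", "dave/feature-d"]
    print(first_bad_merge(merged, lambda prefix: "carol/feature-c" not in prefix))
```

With four merges this takes three test runs (one on the full list, two
to bisect) rather than one per merge, but it only helps when the failure
is caused by a single merge rather than by an interaction that depends
on merge order.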

I think a lot of our noise in 'next' is "stupid shit", like compilation
failing on some architecture.  Automating a very limited test suite
running on PRs within minutes should help a lot to deal with that.  More
subtle interaction problems can and should continue to be dealt with via
'next'.

> The other option I've been contemplating [wrt next-tmp] is - test all
> feature branches for barry one day - and all from matt - the next day
> etc. Then I don't have to figure out which commit broke things. It
> would be one of the branches of that author - so they can deal with
> figuring out what broke.
>
> [and then throw this away - and test some of the branches the next day]
>
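
The per-author rotation proposed above could be sketched roughly as
follows. This is only an illustration of the scheduling idea, not an
existing script: the function name `branches_for_day`, the dictionary
layout, and the branch names are all hypothetical, and authors are
simply cycled in alphabetical order.

```python
from typing import Dict, List


def branches_for_day(branches_by_author: Dict[str, List[str]], day: int) -> List[str]:
    """Pick one author's pending branches for a throwaway test branch.

    branches_by_author -- pending feature branches keyed by author (hypothetical)
    day                -- running day counter; authors rotate round-robin
    """
    # Skip authors with nothing pending so no test day is wasted on them.
    authors = sorted(a for a, pending in branches_by_author.items() if pending)
    if not authors:
        return []
    author = authors[day % len(authors)]
    return list(branches_by_author[author])


if __name__ == "__main__":
    pending = {
        "barry": ["barry/feature-x", "barry/feature-y"],
        "matt": ["matt/feature-z"],
        "jed": [],
    }
    for day in range(3):
        print(day, branches_for_day(pending, day))
```

Any breakage in that day's run is then attributable to a single author,
at the price of each branch waiting up to one full rotation for its
turn.
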
>> When testing separate branches, it isn't enough to merely test their
>> head, we would have to test the result of a candidate merge. 
>
> Yes - that's what I've been saying - test 'master + feature-1'
>
>> If there
>> are any conflicts, that merge needs to be done manually, but where would
>> we put it for automated testing?
>
> If there are conflicts - there won't be a merge or a test until the
> author resolves the conflict. [via a merge or rebase]. This can go on
> until the feature branch gets merged to master [or maint]
>
>>  Make a new branch 'candidate-merge-jed/foo-to-master' and push
>> that, then look for results several hours or a day later?  With
>> 'next', we make merges to one place and don't need a different
>> workflow for no-conflict versus conflicted merges.
>
> Yes there has to be some mechanism to say the branch
> 'candidate-merge-jed/foo-to-master' is ready for master. Most folks
> appear to use PRs for this workflow. But we don't. And Barry doesn't
> like it. So instead of PR - branches can be placed in some
> placeholder.
>
> For ex: I'm currently getting branch list from 'next' - selecting a
> few of them for next-tmp tests.
>
> Wrt merge conflicts - an author is primarily responsible to resolve
> them wrt master. If master changes with new feature - which causes
> conflict - another merge conflict resolution is required - it should
> be the same amount of work as doing this with next.
>
> Note: with next - any merge resolution that gets done has to be
> repeated when the branch gets merged to master. [Since git doesn't
> keep track of this by default] the second merge to master is not always
> the same as the first one in next. This causes merge conflict when
> master gets merged to next [yeah I've seen this a few times]

If the same person does the merge, the same resolution (assuming it
still works) comes automatically.  The point is that having a bunch of
candidate merge branches that need to be ported around to different
machines will be a far bigger mess than we have now, and still doesn't
fix the problems.
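
To put a rough number on the retesting cost raised earlier in the thread
(after each merge to 'master', the prospective merges of the remaining
branches need to be retested), here is a small back-of-the-envelope
sketch. The function name `full_test_runs` is hypothetical, and the
model is deliberately simplistic: branches reach 'master' one at a time,
every remaining candidate merge gets a full test run in each round, and
nothing ever fails.

```python
def full_test_runs(pending_branches: int) -> int:
    """Count full test runs under the simplistic model described above.

    Round 1 tests all n candidate merges, then one branch is merged;
    round 2 retests the remaining n - 1; and so on down to 1.
    The total is n + (n - 1) + ... + 1 = n * (n + 1) / 2.
    """
    if pending_branches < 0:
        raise ValueError("pending_branches must be non-negative")
    return sum(range(1, pending_branches + 1))


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, full_test_runs(n))
```

The count grows quadratically in the number of pending branches, which
is why retesting every candidate merge after every merge to 'master'
gets expensive quickly.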