[petsc-dev] [petsc-users] new book introducing PETSc for PDEs

Sun Nov 1 13:03:11 CST 2020

BTW: wrt fork (clone) vs subtree  (I'll have to read https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec again) - the way I understand it is:

One can do push/pull/merge with subtree that are equivalent to push/pull/merge with a clone.

i.e; with a local clone one does: [branch is implicit here so lets stick with default master branch]

git clone repo-URL
cd repo
<edit>
git commit
git pull
git push

with subtree one does:

cd petsc
git subtree add --prefix=location repo-url [branch]
<edit>
git commit
git subtree pull --prefix=location repo-url [branch]
git subtree push --prefix=location repo-url [branch]

i.e the complexity of branches/merge issues don't go away - it just avoids the extra repo [and avoids making 2 commits in sync in 2 different repos]

Notes:
- wrt petsc4py we don't have a 'subtree pull' in the workflow
- and 'subtree push' is currently broken - so petsc4py workflow is currently broken.

Also note: with the previous petsc4py workflow we had:
 - make MR to petsc4py
 - use this commit with --download-petsc4py in petsc [in MR branch]
 - do not merge petsc-MR until petsc4py-MR is merged [this way - any revisions in petsc4py-MR - that will change the hash can be updated in petsc-MR]

With the current discussion - the workflow for examples is more complicated - i.e the changes petsc-developers do and that get accepted upstream can be different [perhaps temporarily.] - so there is extra complexity here where local changes (fork or subrepo) are not in sync with upstream changes. One way to sync is:
- make local changes (and use them with petsc CI)
- make separate MR to upstream (revise and update the MR - as needed for acceptance). Note: I'm not sure if there is a workflow to test these revisions in petsc cI
- once merged to upstream example repo - sync [subtree, or fork] - either a merge - or revert local changes, pull upstream changes.

So a simpler model is [as Ed referred to]: use upstream only, and the CI use (allow_failure:true) - and then Ed can fix the issues as he sees fit [and there will be CI errors until the fix is in]. And the petsc-MR is not held back (even with fundamental failures)

Satish

On Sun, 1 Nov 2020, Satish Balay wrote:

> I should add - we should:
> 
> - ignore git subtree or a fork
> - assume one (or more) petsc developers have write access to create branches in upstream repo
> - figure out workflow for the requirements [and current constraints wrt book].
> 
> - a fork primarily helps with write access [wrt branches, when no write access is available for upstream repo] - there is a mirror sync cost
> - a subtree [assuming its not buggy] - helps with a single commit workflow [vs multiple commits in different repos - and keeping them in sync]
> - [a default is neither - using upstream repo directly]
> 
> And in all cases there is similar synchronization cost between upstream repo and fork/subtree [this depends on the requirements - and current constranits]
> 
> Satish
> 
> On Sun, 1 Nov 2020, Satish Balay via petsc-dev wrote:
> 
> > On Sun, 1 Nov 2020, Barry Smith wrote:
> > 
> > > 
> > > 
> > > > On Nov 1, 2020, at 10:30 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> > > > 
> > > > Just a note: subtree workflow is currently broken with petsc4py
> > > > 
> > > > We've imported petsc4py using subtree - but now we are unable to export back changes in petsc repo to the standalone petsc4py repo due to git subtree errors.
> > > > 
> > > > I spent some time on workarounds - but didn't go anywhere.
> > > > 
> > > > [likely this is a non-issue with this examples repo workflow - as there is no direct push back to upstream repo]
> > > > 
> > > > And with subtree - not all developers need to know about it. Only the person who is syncing the 2 repos.. [if they know how to sync/resolve conflicts]
> > > 
> > >   Since each person is reponsable for their branch getting merged, doesn't this mean anyone who changes something in PETSc that requires a change in p4pdes has to fix it via gittree process ( that is have have to sync the 2 repose?) and hence now how to sync 2 repos via gittree.
> > 
> > 
> > For one its not clear what problem is being addressed here [as mentioned in my other e-mail.]
> > 
> > >From Ed's e-mail - my understanding is - whatever changes petsc developers might have - might not be acceptable.
> > 
> > So I think first we need to figure out what is acceptable and not - and then figure out the workflow.
> > 
> > " Since each person is reponsable for their branch getting merged, "
> > 
> > So its not clear to me if this fits with what Ed mentioned. [wrt his goals of keeping the examples in sync with the book]
> > 
> > BTW I should note: the current process [which I don't understand completely] looks more complicated than what we had with petsc4py - which you were against at - and raised this at  every opportunity - until petsc4py was imported into petsc (using subtree)
> > 
> > Satish
> > 
> > > 
> > >   Until I see the entire gittree workflow I will not understand.
> > > 
> > >   Barry
> > > 
> > > > 
> > > > Satish
> > > > 
> > > > On Sun, 1 Nov 2020, Jed Brown wrote:
> > > > 
> > > >> Barry Smith <bsmith at petsc.dev> writes:
> > > >> 
> > > >>>>> git subtree does require use of new commands every time you mess with it (say every three months) that we do not know and since each of 
> > > >>>>> us will do this infrequently it is likely we will not remember them (I won't)  while my approach does not require remembering new commands. 
> > > >>>>> My approach only requires branching, push, and pulling on the fork and updating EdsRepo.py which is things all the developers know about and 
> > > >>>>> do regularly since we update other external packages.
> > > >>> 
> > > >>>>> Circular dependencies (PETSc CI depends on Ed's p4pdes depends on PETSc) are significantly more labor-intensive and it would need to be done on each change, versus once per release cycle.
> > > >>> 
> > > >>>    I think we are not communicating on exactly the  same wave length. 
> > > >>> 
> > > >>>    I am not advocating we test on Ed's I am advocating we test on our fork of Ed's and how often our fork gets passed to Ed with MR is a choice all of us make together, now we could do it once at release time with all the fixes we made, every month or each time. Totally up to Ed how often he wants to get them, and he is free to ignore them for weeks (up to the next release if he likes) since our testing will continue because we use our fork. How often we pass them on to Ed is not related to how we maintain our CI.
> > > >>> 
> > > >>>    There is no circular dependency on my approach with Ed's p4pdes.
> > > >> 
> > > >> The circular dependency is on our fork of p4pdes.
> > > >> 
> > > >>>    I vote to fix things in our fork or gittree thing continuously since it makes it easier to fix things rather than wait to the release when we try to find and fix everything and it also helps tell us if we introduced a real bug into PETSc and fix PETSc immediately instead of waiting up to 6 months, just like we now we test immediately with Petsc4py and we should do with SLEPc.  How often we give the updates to Ed is a completely different issue. 
> > > >>> 
> > > >>>   So again back to my original statement ,it comes down to if the subtree or the fork approach is easier for all the PETSc developers who do not currently know gittree and would need to learn it with your approach. I don't know which is easier learning to use gittree which has its own gotcha's or using mine which we all know but may require an extra step (not involving Ed, just updating the p4pdes.py commit each time we change something in the fork.)
> > > >>> 
> > > >>>  I think using --download-p4pdes on a couple of systems in the CI is enough, I don't think we need to put it all CI pipelines (I would like slepc in all pipelines).but we could put in all pipelines if we want.
> > > >>> 
> > > >>>  For completeness I show the exact the work flow for my suggestion
> > > >>> 
> > > >>>   pipelines --download-p4pdes and runs its tests
> > > >>>   if it breaks the developer uses --download-p4pdes  on their system 
> > > >>>     they fix the problem either by fixing PETSc or what is downloaded from the p4pde fork
> > > >>>        if the fix is in the p4pdes fork they make a branch in the p4pdes fork, which they already have since they used --download-p4pdes and thus 
> > > >>>                              have the fork on their system
> > > >>>            they put the fix in new branch in the p4pdes fork and push it
> > > >>>            they edit p4pdes.py and put a new commit in it pointing to their branch in the fork
> > > >>>   run the pipeline again
> > > >>>       if fails with p4pdes they do the above again
> > > >>>       else the PETSc branch gets accepted and merged to master
> > > >>>          depending on Ed's choice we make an MR for p4pdes depending on the agreed upon cycle. If Ed puts the fix into his master then we just update
> > > >>>               our fork with his latest master with a simple merge of his.  This will then become the new one we test against. If Ed doesn't respond to the MR all is fine we 
> > > >>>              just continue on our fork. If he puts other things in his branch but not our MR we just merge that into our fork and so are still testing with his latest master.
> > > >> 
> > > >> As you've enumerated here, this requires three MRs per change:
> > > >> 
> > > >>  1. in PETSc with the actual change and to point at a p4pdes commit
> > > >>  2. in p4pdes (Ed's or our fork) to implement needed changes (note: this can't merge until #1 merges)
> > > >>  3. in PETSc to point at the merge commit of p4pdes (after #2 merges)
> > > >> 
> > > >> With subtree, there is only one MR and it's in the PETSc repository.
> > > >> 
> > > >> Recall that we exchanged hundreds of emails to subtree in BuildSystem long ago, then quite a lot more to subtree in petsc4py recently, and now we're doing it again.
> > > >> 
> > > >> The arguments against subtreeing p4pdes are
> > > >> 
> > > >>  1. We want an extra hurdle so developers think twice before changing those interfaces
> > > >>  2. We want to drive traffic to Ed's repo
> > > >>  3. The repository is so big we don't want every `git clone gitlab:petsc/petsc` to include it
> > > >> 
> > > >> This repo is small so I'm most sympathetic to #2, but nobody has made these arguments yet.
> > > >> 
> > > >>>  This happens to be the exact same thing I do now with slepc and any other git based package that I use now (except for some I need to make my own fork of the external package because we don't have one always available eventually likely we will likely put more forks into gitlab/petsc so each person doesn't constantly need to make fresh forks of external packages)
> > > >>> 
> > > >>>  If subtree requires fewer steps than this and has no new oddball git subtree commands then we should definitely use subtree, if it requires other steps involving subtree we need see those commands written in a workflow like I have written above and decide if it is still simpler or not.
> > > >>> 
> > > >>>  I am not rejecting subtree, we just need to explicitly see its complete work flow and hence the differences to decide, that is what I asking for.
> > > >>> 
> > > >>>  Barry
> > > >>> 
> > > >>>  It seems to me that with subtree we also need to maintain a fork of Ed's stuff or else that will have a circular dependency. But perhaps I do not understand it.
> > > >> 
> > > >> Such a fork would only be used once per release to submit PRs to Ed.
> > > >> 
> > > > 
> > > 
> > 
> 
>