[petsc-dev] Thoughts on pushing current CI infrastructure to the next level
Balay, Satish
balay at mcs.anl.gov
Thu Apr 25 12:41:30 CDT 2019
On Thu, 25 Apr 2019, Karl Rupp via petsc-dev wrote:
> Dear PETSc developers,
>
> the current Jenkins server went live last summer. Since then, the stability of
> master and next has indeed improved. Who would have thought three years ago
> that `next` would be almost as stable as `master`?
>
> However, over the weeks and months some weaknesses of our current continuous
> integration infrastructure became apparent:
>
> 1.) Still no Jenkins tests on Windows, because the remote execution of a Java
> application has some issues with Cygwin (which we require for PETSc).
[from discussions with Jed] - it appears that GitLab CI does not use
Java. Also, it lists 'ssh' among its 'executors' - so that might work
similarly to our current Windows setup.
>
> 2.) Jenkins workers every once in a while hang on the target machine (this has
> been independently observed in a different setting by Jed as well).
>
Yes - this is bad. So one criterion when choosing among alternatives: how
does one debug problems with the CI tool?
> 3.) Nonscalability of the current setup: The Jenkins server clones a separate
> copy of the repository for each pull request and each test arch. Each clone of
> the PETSc repository is 300 MB, so if we aim at 40 different arches (i.e. the
> current coverage of the nightly tests) to test for each pull request, 300 MB *
> 40 = 12 GB of memory is required *for each pull request* on the Jenkins
> master.
If there is a way to manage how clones are created and reused in the
Jenkins process, I'm guessing this requirement can go down considerably
with local git clones [which would use hard links to save space].
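For instance [just a sketch - the paths, arch names and mirror layout
below are hypothetical, not our actual setup]: keep one local mirror per
worker and clone each per-arch workspace from it, so git hard-links the
object store instead of copying 300 MB for every arch:

import subprocess

MIRROR = "/scratch/ci/petsc-mirror.git"          # hypothetical bare mirror on the worker
ARCHES = ["arch-linux-gcc", "arch-linux-clang"]  # hypothetical arch names

def update_mirror():
    # one network fetch per CI run; everything else stays on local disk
    subprocess.check_call(["git", "--git-dir", MIRROR, "fetch", "--all", "--prune"])

def make_workspace(arch, branch):
    workdir = "/scratch/ci/%s" % arch
    # a clone from a local path hard-links the object store, so each
    # extra workspace only costs the checked-out source tree
    subprocess.check_call(["git", "clone", "--local", "--branch", branch,
                           MIRROR, workdir])
    return workdir

update_mirror()
for arch in ARCHES:
    make_workspace(arch, "master")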
>
> 4.) Pull requests from external repositories in Bitbucket are currently tested
> by Jenkins, but the results are not visible on the pull requests page. This
> might be a Bitbucket issue rather than a Jenkins issue; and yet, it impedes
> our work flow.
I suspect this is because we don't have write access to the forks.
Previously all forks gave write access to the petsc group. I don't know
if that is set up somewhere - and whether it can be modified [to give
Jenkins write access to the forks].
Satish
> 5.) Adding additional workers requires significant configuration effort on the
> Jenkins master and is far from hassle-free. For example, it is currently
> impractical to add my office machine to the pool of workers, even though this
> machine is 99% idle.
>
> With some effort we can certainly address 1.) and to some extent 3.), probably
> 4.) as well, but I don't know how to solve 2.) and 5.) with Jenkins. Given
> that a significant effort is required for 1.), 3.) and 4.) anyway, I'm
> starting to get more and more comfortable with the idea of rolling our own CI
> infrastructure (which has been suggested in some of Barry's snarky remarks
> already ;-) ). Small Python scripts for executing the tests and pushing
> results to Bitbucket as well as a central result storage can replicate our
> existing setup with a few lines of code, while being much more flexible.
>
> What do other PETSc developers think about CI infrastructure? Maybe
> suggestions other than Jenkins?
We would also have to think in terms of multiple levels of CI/testing.
For example: we currently have some setup with Travis CI on GitHub, and
Pipelines on Bitbucket. However, we are not using them in the pull
request workflow.
And then the ECP CI resources - if we are to utilize them - would also
not be usable in the PR workflow, but they might be useful for
performance regression testing and large-scale testing [if we can set
up the test suite for it]. And this appears to be via GitLab CI.
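And re: Karl's "small Python scripts ... pushing results to Bitbucket" -
the pushing part is indeed only a few lines. A rough sketch [assuming
the Bitbucket Cloud 2.0 commit build-status endpoint; the repo slug,
status key, log URL and credentials here are placeholders]:

import requests  # third-party package

API = "https://api.bitbucket.org/2.0/repositories/petsc/petsc/commit/%s/statuses/build"

def push_status(commit, arch, passed, log_url, auth):
    # one build-status record per (commit, test arch); shows up on the PR page
    payload = {
        "key": "ci-%s" % arch,
        "state": "SUCCESSFUL" if passed else "FAILED",
        "name": "PETSc tests (%s)" % arch,
        "url": log_url,  # link back to wherever we store the logs
        "description": "test harness results",
    }
    r = requests.post(API % commit, auth=auth, json=payload)
    r.raise_for_status()

# usage [placeholder values]:
# push_status("abc123", "arch-linux-gcc", True,
#             "https://ci.example.org/logs/abc123/arch-linux-gcc",
#             ("ci-bot", "app-password"))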
Satish