[petsc-dev] Thoughts on pushing current CI infrastructure to the next level
Balay, Satish
balay at mcs.anl.gov
Thu Apr 25 12:41:30 CDT 2019
On Thu, 25 Apr 2019, Karl Rupp via petsc-dev wrote:
> Dear PETSc developers,
>
> the current Jenkins server went live last summer. Since then, the stability of
> master and next has indeed improved. Who would have thought three years ago
> that `next` would be almost as stable as `master`?
>
> However, over the weeks and months some weaknesses of our current continuous
> integration infrastructure became apparent:
>
> 1.) Still no Jenkins tests on Windows, because the remote execution of a Java
> application has some issues with Cygwin (which we require for PETSc).
[from discussions with Jed] - it appears that GitLab CI does not use
Java. Also, it lists 'ssh' among its 'executors' - so that might work
similarly to our current Windows setup.
>
> 2.) Jenkins workers every once in a while hang on the target machine (this has
> been independently observed in a different setting by Jed as well).
>
Yes - this is bad. So one criterion when choosing among alternatives: how
does one debug problems with the CI tool?
> 3.) Nonscalability of the current setup: The Jenkins server clones a separate
> copy of the repository for each pull request and each test arch. Each clone of
> the PETSc repository is 300 MB, so if we aim at 40 different arches (i.e. the
> current coverage of the nightly tests) to test for each pull request, 300 MB *
> 40 = 12 GB of memory is required *for each pull request* on the Jenkins
> master.
If there is a way to manage how clones are created and reused in the
Jenkins process, I'm guessing this requirement can go down considerably
with local git clones [which would use hard links to save space].
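For instance [just a sketch - the paths, arch names and mirror layout
below are hypothetical, not our actual setup]: keep one local mirror per
worker and clone each per-arch workspace from it, so git hard-links the
object store instead of copying 300 MB for every arch:

import subprocess

MIRROR = "/scratch/ci/petsc-mirror.git"          # hypothetical bare mirror on the worker
ARCHES = ["arch-linux-gcc", "arch-linux-clang"]  # hypothetical arch names

def update_mirror():
    # one network fetch per CI run; everything else stays on local disk
    subprocess.check_call(["git", "--git-dir", MIRROR, "fetch", "--all", "--prune"])

def make_workspace(arch, branch):
    workdir = "/scratch/ci/%s" % arch
    # a clone from a local path hard-links the object store, so each
    # extra workspace only costs the checked-out source tree
    subprocess.check_call(["git", "clone", "--local", "--branch", branch,
                           MIRROR, workdir])
    return workdir

update_mirror()
for arch in ARCHES:
    make_workspace(arch, "master")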
>
> 4.) Pull requests from external repositories in Bitbucket are currently tested
> by Jenkins, but the results are not visible on the pull requests page. This
> might be a Bitbucket issue rather than a Jenkins issue; and yet, it impedes
> our work flow.
I suspect this is because we don't have write access to the forks.
Previously all forks gave write access to the petsc group. I don't know
if that is set up somewhere - and whether it can be modified [to give
Jenkins write access to the forks].
Satish
> 5.) Adding additional workers requires significant configuration effort on the
> Jenkins master and is far from hassle-free. For example, it is currently
> impractical to add my office machine to the pool of workers, even though this
> machine is 99% idle.
>
> With some effort we can certainly address 1.) and to some extent 3.), probably
> 4.) as well, but I don't know how to solve 2.) and 5.) with Jenkins. Given
> that a significant effort is required for 1.), 3.) and 4.) anyway, I'm
> starting to get more and more comfortable with the idea of rolling our own CI
> infrastructure (which has been suggested in some of Barry's snarky remarks
> already ;-) ). Small Python scripts for executing the tests and pushing
> results to Bitbucket as well as a central result storage can replicate our
> existing setup with a few lines of code, while being much more flexible.
>
> What do other PETSc developers think about CI infrastructure? Maybe
> suggestions other than Jenkins?
We would also have to think in terms of multiple levels of CI/testing.
For example: we currently have some setup with Travis CI on GitHub, and
Pipelines on Bitbucket. However, we are not using them in the pull
request workflow.
And then the ECP CI resources - if we are to utilize them - would also
not be usable in the PR workflow, but they might be useful for
performance regression testing and large-scale testing [if we can set
up the test suite for it]. And this appears to be via GitLab CI.
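And re: Karl's "small Python scripts ... pushing results to Bitbucket" -
the pushing part is indeed only a few lines. A rough sketch [assuming
the Bitbucket Cloud 2.0 commit build-status endpoint; the repo slug,
status key, log URL and credentials here are placeholders]:

import requests  # third-party package

API = "https://api.bitbucket.org/2.0/repositories/petsc/petsc/commit/%s/statuses/build"

def push_status(commit, arch, passed, log_url, auth):
    # one build-status record per (commit, test arch); shows up on the PR page
    payload = {
        "key": "ci-%s" % arch,
        "state": "SUCCESSFUL" if passed else "FAILED",
        "name": "PETSc tests (%s)" % arch,
        "url": log_url,  # link back to wherever we store the logs
        "description": "test harness results",
    }
    r = requests.post(API % commit, auth=auth, json=payload)
    r.raise_for_status()

# usage [placeholder values]:
# push_status("abc123", "arch-linux-gcc", True,
#             "https://ci.example.org/logs/abc123/arch-linux-gcc",
#             ("ci-bot", "app-password"))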
Satish