[petsc-dev] Improving nightly builds for maint

Jed Brown jed at jedbrown.org
Sat Oct 21 11:32:41 CDT 2017


Satish Balay <balay at mcs.anl.gov> writes:

> On Sat, 21 Oct 2017, Lisandro Dalcin wrote:
>
>> Satish set TIMEOUT to more than 2 hours. If a test ever fails because
>> of a deadlock, the build worker will be stuck for 2 hours. Of course,
>> we will likely notice, but still...
>
> As mentioned, timeout doesn't really work with valgrind builds [perhaps
> also with openmpi builds, and with any mpi impl that doesn't kill child
> processes when the mpiexec proc is killed, etc.] - so a short timeout is
> just printing incorrect verbose messages [i.e. a kill message is printed,
> but the job isn't getting killed]. A long/infinite timeout is just
> the representation of current runtime behavior.
>
> In the future - if all tests are converted to test harness - a few
> long jobs won't be a big issue wrt throughput. [as multiple jobs get
> run simultaneously]

Depends if you're using the machine for other things.  I think having
that long-running job would tend to oversubscribe MPI and slow down
throughput.  In any case, we probably shouldn't run this 3D convergence
study under any testing system, even without Valgrind.

> Sure, it's best to not have long-running test jobs - if possible.
>
> Satish
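
For reference, the failure mode Satish describes (the timeout kills the
launcher but the ranks keep running) is often handled by putting the job in
its own process group and signalling the whole group. Below is a minimal
sketch, assuming a POSIX system; run_with_timeout is a hypothetical helper,
not something in the PETSc test scripts. It will not catch ranks that an MPI
launcher starts through a separate daemon or ssh, or children that move
themselves into a new session.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a new session; on timeout, kill its whole process group.

    Hypothetical helper for illustration. Returns the exit code, or None
    if the job had to be killed.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its process-group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Signal every process still in the group, not just the launcher.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the launcher so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Stand-in for a hung test: sleeps far longer than the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, 1.0))
```

Whether this helps under valgrind or Open MPI depends on how those launchers
spawn their children, so it would need checking on the actual build workers.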

