[petsc-dev] upcoming release and testing [v2]

Satish Balay balay at mcs.anl.gov
Fri Apr 6 16:48:27 CDT 2018


The tests are run in parallel [by the test harness] - but most of the individual tests are themselves sequential.

When I tried to reproduce, I ran sequentially [as in 'make -f gmakefile' without -j] - and some of them hung.
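For reference, a minimal sketch of the invocations and the check being discussed; the 'test' target and the -j level are assumptions based on the standard PETSc gmakefile test harness, not taken from the logs above:

  # run the test harness with parallel job launch (assumed -j level)
  make -f gmakefile test -j4

  # run the harness sequentially, as in the reproduction attempt above
  make -f gmakefile test

  # check whether a test is stuck on the GPU (nvidia-smi lists processes using the device)
  nvidia-smi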

Satish

On Fri, 6 Apr 2018, Karl Rupp wrote:

> Hi again,
> 
> the reason for the higher number of timeouts is likely the larger number of GPU
> tests: GPU builds that formerly only used CUSP now run against the CUDA backend,
> which has more tests. Also, the CUDA backend uses CUBLAS and CUSPARSE, whereas
> CUSP used its own kernels. As far as I know, CUBLAS and CUSPARSE initialization
> is fairly slow on the M2090.
> 
> Best regards,
> Karli
> 
> 
> On 04/06/2018 09:13 PM, Karl Rupp wrote:
> > Hi,
> > 
> >> The CUDA tests are hanging/timing-out more often now. For eg:
> >> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/06/examples_next_arch-cuda-double_es.log 
> >>
> >>
> >> And I did see some builds where they didn't get killed due to timeout. For
> >> eg:
> >> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/05/examples_next_arch-cuda-double_es.log 
> >>
> >>
> >> This is on M2090.  I can see them getting stuck on es.mcs [when I run
> >> manually - and check with nvidia-smi]
> >>
> >> When I run these tests manually on GTX1050 (frog.mcs) - they zip through..
> >> Any idea why they get stuck on M2090? [more frequently than random hangs..]
> > 
> > no, I don't know why this is the case. All my local tests finish quickly,
> > too. I noticed last summer that there is higher startup overhead on the
> > M2090 than on more recent GPUs, but that was in the seconds regime, not in
> > minutes.
> > 
> > Are the tests run in parallel? If so, then maybe the parallel initialization
> > of GPUs is slowing things down.
> > 
> > Best regards,
> > Karli
> 

