[petsc-dev] upcoming release and testing [v2]
Karl Rupp
rupp at iue.tuwien.ac.at
Fri Apr 6 15:26:40 CDT 2018
Hi again,
the reason for the higher number of timeouts is most likely the larger
number of GPU tests: GPU builds that formerly only used CUSP now run
against the CUDA backend, which has more tests. Also, the CUDA backend
uses CUBLAS and CUSPARSE, whereas CUSP used its own kernels, and as far
as I know, CUBLAS and CUSPARSE initialization is fairly slow on the M2090.
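
A minimal sketch for measuring that handle-creation cost in isolation on
the M2090 (this is an illustrative example, not something from the test
harness; it assumes a standard CUDA toolkit, and the file name and compile
line are made up):

/* time_init.c: time cuBLAS/cuSPARSE handle creation, which is suspected
 * to dominate the startup cost on the M2090.
 * Compile with e.g.:  nvcc time_init.c -lcublas -lcusparse
 */
#include <stdio.h>
#include <time.h>
#include <cublas_v2.h>
#include <cusparse.h>

static double now(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
  cublasHandle_t   blas;
  cusparseHandle_t sparse;
  double t;

  t = now();
  cublasCreate(&blas);      /* first CUDA call: also creates the CUDA context */
  printf("cublasCreate:   %.3f s\n", now() - t);

  t = now();
  cusparseCreate(&sparse);  /* context already exists; mostly cuSPARSE setup */
  printf("cusparseCreate: %.3f s\n", now() - t);

  cusparseDestroy(sparse);
  cublasDestroy(blas);
  return 0;
}

If the first call alone already takes tens of seconds on the M2090, that
would account for a good part of the timeouts.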
Best regards,
Karli
On 04/06/2018 09:13 PM, Karl Rupp wrote:
> Hi,
>
>> The CUDA tests are hanging/timing out more often now, e.g.:
>> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/06/examples_next_arch-cuda-double_es.log
>>
>>
>> And I did see some builds where they didn't get killed by the timeout,
>> e.g.:
>> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/05/examples_next_arch-cuda-double_es.log
>>
>>
>> This is on the M2090. I can see them getting stuck on es.mcs [when I run
>> them manually and check with nvidia-smi].
>>
>> When I run these tests manually on a GTX1050 (frog.mcs), they zip
>> through.
>> Any idea why they get stuck on the M2090? [more frequently than random
>> hangs]
>
> no, I don't know why this is the case. All my local tests finish
> quickly, too. I noticed last summer that there is higher startup
> overhead on the M2090 than on more recent GPUs, but that was in the
> seconds regime, not in minutes.
>
> Are the tests run in parallel? If so, then maybe the parallel
> initialization of GPUs is slowing things down (a small sketch for
> checking this follows below).
>
> Best regards,
> Karli
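
To test the parallel-initialization hypothesis from the quoted message, a
minimal sketch (again illustrative only; file name, device number, and
compile/run lines are assumptions, not from the thread): every MPI rank
creates a context on the same device and reports how long that took.

/* gpu_init.c: time concurrent CUDA context creation across MPI ranks.
 * Compile with e.g.:  mpicc gpu_init.c -lcudart   (CUDA include/lib paths added)
 * Run with e.g.:      mpiexec -n 4 ./a.out
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
  int    rank;
  double t;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);  /* start all ranks together */
  t = MPI_Wtime();
  cudaSetDevice(0);             /* all ranks share device 0, as on the test box */
  cudaFree(0);                  /* forces context creation */
  t = MPI_Wtime() - t;

  printf("rank %d: context creation took %.3f s\n", rank, t);

  MPI_Finalize();
  return 0;
}

If the per-rank times grow with the number of ranks, context creation is
being serialized on the device, which would explain why parallel test runs
hit the timeout while single runs zip through.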