<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div dir="ltr">
<div>All of the failed tests just said "application called MPI_Abort" and had no stack trace. They are not CUDA tests. I updated SF to avoid CUDA-related initialization when it is not needed (roughly the idea sketched after the test output below). Let's see the new test results.
<pre style="color:rgb(0,0,0);white-space:pre-wrap">not ok dm_impls_stag_tests-ex13_none_none_none_3d_par_stag_stencil_width-1
# application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1</pre>
</div>
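<div><br>
</div>
<div>The guard I have in mind is roughly the following sketch; SFEnsureCUDADevice and sf_cuda_initialized are made-up names for illustration only, not the actual PetscSF code:
<pre style="color:rgb(0,0,0);white-space:pre-wrap">#include &lt;petscsys.h&gt;
#include &lt;cuda_runtime.h&gt;

static PetscBool sf_cuda_initialized = PETSC_FALSE;

/* Hypothetical sketch: intended to be called only when a device buffer is
   actually encountered, so pure-CPU runs never touch the CUDA runtime and
   therefore never trigger CUDA initialization. */
static PetscErrorCode SFEnsureCUDADevice(MPI_Comm comm)
{
  PetscErrorCode ierr;
  cudaError_t    cerr;
  PetscMPIInt    rank;
  int            devCount;

  PetscFunctionBegin;
  if (sf_cuda_initialized) PetscFunctionReturn(0);
  cerr = cudaGetDeviceCount(&amp;devCount);
  if (cerr != cudaSuccess) SETERRQ(comm,PETSC_ERR_LIB,"cudaGetDeviceCount() failed");
  ierr = MPI_Comm_rank(comm,&amp;rank);CHKERRQ(ierr);
  /* same rank-mod-device policy as the PetscCUDAInitialize excerpt quoted below */
  cerr = cudaSetDevice(rank % devCount);
  if (cerr != cudaSuccess) SETERRQ(comm,PETSC_ERR_LIB,"cudaSetDevice() failed");
  sf_cuda_initialized = PETSC_TRUE;
  PetscFunctionReturn(0);
}</pre>
</div>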
<div><br>
</div>
<div>
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Sep 19, 2019 at 3:57 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Failed? That by itself means nothing; send a link or cut and paste the error.<br>
<br>
It could be that, since we have multiple separate tests running at the same time, they overload the GPU or cause some inconsistent behavior that doesn't appear every time the tests are run.<br>
<br>
Barry<br>
<br>
Maybe we need to serialize all the tests that use the GPUs. Right now we just trust gnumake for the parallelism; maybe you could somehow add dependencies to get gnu make to achieve this?<br>
<br>
<br>
<br>
<br>
> On Sep 19, 2019, at 3:53 PM, Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:<br>
> <br>
> On Thu, Sep 19, 2019 at 3:24 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> <br>
> <br>
> > On Sep 19, 2019, at 2:50 PM, Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:<br>
> > <br>
> > I saw your update. In PetscCUDAInitialize we have<br>
> > <br>
> > /* First get the device count */<br>
> > err = cudaGetDeviceCount(&devCount);<br>
> > <br>
> > /* next determine the rank and then set the device via a mod */<br>
> > ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);<br>
> > device = rank % devCount;<br>
> > }<br>
> > err = cudaSetDevice(device);<br>
> > <br>
> > If we rely on the first CUDA call to do the initialization, how can CUDA know about this MPI information?<br>
> <br>
> It doesn't, so it does whatever it does (which may be dumb).<br>
> <br>
> Are you proposing something?<br>
> <br>
> No. My test failed in CI with -cuda_initialize 0 on frog, but I could not reproduce it. I'm investigating.
<br>
> <br>
> Barry<br>
> <br>
> > <br>
> > --Junchao Zhang<br>
> > <br>
> > <br>
> > <br>
> > On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> > <br>
> > Fixed the docs. Thanks for pointing out the lack of clarity.<br>
> > <br>
> > <br>
> > > On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:<br>
> > > <br>
> > > Barry,<br>
> > > <br>
> > > I saw you added these in init.c<br>
> > > <br>
> > > + -cuda_initialize - do the initialization in PetscInitialize()<br>
> > > <br>
> > > Notes:<br>
> > > Initializing cuBLAS takes about 1/2 second, therefore it is done by default in PetscInitialize() before logging begins<br>
> > > <br>
> > > But I did not get the other case: with -cuda_initialize 0, when will CUDA be initialized?<br>
> > > --Junchao Zhang<br>
> > <br>
<br>
</blockquote>
</div>
</body>
</html>