[petsc-dev] PetscCUDAInitialize

Smith, Barry F. bsmith at mcs.anl.gov
Thu Sep 19 15:57:51 CDT 2019


 Failed?  That alone tells us nothing; please send a link or cut and paste the error.

 It could be that since we have multiple separate tests running at the same time they overload the GPU or cause some inconsistent behavior that doesn't appear every time the tests are run.

   Barry

Maybe we need to sequentialize all the tests that use the GPUs. We just trust gnumake for the parallelism; maybe you could somehow add dependencies to get gnumake to achieve this?


 

> On Sep 19, 2019, at 3:53 PM, Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> 
> On Thu, Sep 19, 2019 at 3:24 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
> 
> > On Sep 19, 2019, at 2:50 PM, Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> > 
> > I saw your update. In PetscCUDAInitialize we have
> > 
> >       /* First get the device count */
> >       err   = cudaGetDeviceCount(&devCount);
> > 
> >       /* next determine the rank and then set the device via a mod */
> >       ierr   = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
> >       device = rank % devCount;
> >     }
> >     err = cudaSetDevice(device);
> > 
> > If we rely on the first CUDA call to do initialization, how could CUDA know about this MPI stuff?
> 
>   It doesn't, so it does whatever it does (which may be dumb).
> 
>   Are you proposing something?
> 
> No. My test failed in CI with -cuda_initialize 0 on frog, but I could not reproduce it. I'm investigating.
> 
>   Barry
> 
> > 
> > --Junchao Zhang
> > 
> > 
> > 
> > On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > 
> >   Fixed the docs. Thanks for pointing out the lack of clarity.
> > 
> > 
> > > On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> > > 
> > > Barry,
> > > 
> > > I saw you added these in init.c
> > > 
> > > 
> > > +  -cuda_initialize - do the initialization in PetscInitialize()
> > > 
> > > Notes:
> > > 
> > >    Initializing cuBLAS takes about 1/2 second, therefore it is done by default in PetscInitialize() before logging begins
> > > 
> > > But I did not get what happens otherwise: with -cuda_initialize 0, when will CUDA be initialized?
> > > --Junchao Zhang
> > 
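
The device selection quoted in the thread (device = rank % devCount) can be looked at in isolation. The following is only a minimal sketch, not the PETSc implementation: select_device_round_robin is a hypothetical helper name, and returning -1 when no device is reported is an assumption made for illustration, since the quoted snippet does not show how that case is handled.

```c
#include <assert.h>

/* Hypothetical helper (not part of PETSc): map an MPI rank to a CUDA
   device index round-robin, mirroring device = rank % devCount from the
   quoted snippet. Returns -1 when no device is available (an assumption
   made for this sketch). */
static int select_device_round_robin(int rank, int dev_count)
{
  assert(rank >= 0);              /* MPI ranks are never negative */
  if (dev_count <= 0) return -1;  /* no usable GPU: caller must handle this */
  return rank % dev_count;        /* ranks wrap around the available devices */
}
```

In real code the result would be passed to cudaSetDevice(). Note that each separately launched test starts at rank 0, so under this mapping concurrent single-rank tests would all select device 0, which fits the concern about tests overloading the GPU when they run at the same time.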