[petsc-users] Question on usage of PetscMalloc(Re)SetCUDAHost

Barry Smith bsmith at petsc.dev
Tue Aug 25 18:59:35 CDT 2020


PetscMallocSetCUDAHost() switches from using the regular malloc on the CPU to using cudaMallocHost(); it also switches the free. This means that between PetscMallocSetCUDAHost() and PetscMallocResetCUDAHost() all mallocs are done with the cudaHost version, and so are all frees.

If any memory that was allocated before the call to PetscMallocSetCUDAHost() (so it was allocated with a regular malloc) is freed inside the block, it will be freed with the incorrect cudaFreeHost() and will crash. This makes these routines very fragile.
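For example (a minimal sketch, not your code; the buffer name and size are made up), this fails inside PetscFree() because the buffer came from the regular malloc but is released while the cudaHost free is active:

    PetscScalar    *buf;
    PetscErrorCode ierr;

    ierr = PetscMalloc1(100, &buf);CHKERRQ(ierr);   /* regular malloc() on the CPU                               */

    ierr = PetscMallocSetCUDAHost();CHKERRQ(ierr);  /* mallocs/frees now go through cudaMallocHost/cudaFreeHost  */
    ierr = PetscFree(buf);CHKERRQ(ierr);            /* wrong free for this buffer -> cudaErrorInvalidValue       */
    ierr = PetscMallocResetCUDAHost();CHKERRQ(ierr);

This is most likely what your stack trace shows: MatSetValues_SeqAIJ() reallocates the matrix's internal row storage, which was allocated with the regular malloc before your PetscMallocSetCUDAHost() call, and the old storage then gets handed to the cudaHost free.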

I don't understand the purpose of PetscMallocSetCUDAHost(); possibly it is intended to be used with NVIDIA unified memory so that the same addresses can be used on the GPU. PETSc does not use or need unified memory in its programming model for GPUs.

As far as I am aware, you don't have any reason to use these routines.
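If the goal is just to fill the matrix on the CPU and have PETSc use it on the GPU, the usual pattern needs no special host allocator. A minimal sketch (sizes and values are made up; MATAIJCUSPARSE could equally come from -mat_type aijcusparse with MatSetFromOptions()):

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       row     = 0, cols[2] = {0, 1};
      PetscScalar    vals[2] = {1.0, 2.0};
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 4, 4);CHKERRQ(ierr);
      ierr = MatSetType(A, MATAIJCUSPARSE);CHKERRQ(ierr);                          /* GPU matrix type             */
      ierr = MatSetUp(A);CHKERRQ(ierr);
      ierr = MatSetValues(A, 1, &row, 2, cols, vals, INSERT_VALUES);CHKERRQ(ierr); /* plain host arrays are fine  */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

PETSc copies the assembled values to the GPU the first time the matrix is used there, so your own PetscMalloc1'd index and value buffers can stay ordinary host memory.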

  Barry



> On Aug 25, 2020, at 6:46 PM, Sajid Ali <sajidsyed2021 at u.northwestern.edu> wrote:
> 
> Hi PETSc-developers,
> 
> Is it valid to allocate matrix values on the host for later use on a GPU by embedding all allocation logic (i.e., the code block that calls PetscMalloc1 for values and indices and sets them using MatSetValues) within a section marked by PetscMalloc(Re)SetCUDAHost?
> 
> My understanding was that PetscMallocSetCUDAHost would make mallocs happen on the host, but I'm getting the error shown below (for some strange reason it happens at the 5th column of the 0th row, if that helps, both when setting one value at a time and when setting the whole 0th row together):
> 
> [sajid at xrmlite cuda]$ mpirun -np 1 ~/packages/pirt/src/pirt -inputfile shepplogan.h5
> PIRT -- Parallel Iterative Reconstruction Tomography
> Reading in real data from shepplogan.h5
> After loading data, nTau:100, nTheta:50
> After detector geometry context initialization
> Initialized PIRT
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.2-947-gc2372adeb2  GIT Date: 2020-08-25 21:07:25 +0000
> [0]PETSC ERROR: /home/sajid/packages/pirt/src/pirt on a arch-linux-c-debug named xrmlite by sajid Tue Aug 25 18:30:55 2020
> [0]PETSC ERROR: Configure options --with-hdf5=1 --with-cuda=1
> [0]PETSC ERROR: #1 PetscCUDAHostFree() line 14 in /home/sajid/packages/petsc/src/sys/memory/cuda/mcudahost.cu
> [0]PETSC ERROR: #2 PetscFreeA() line 475 in /home/sajid/packages/petsc/src/sys/memory/mal.c
> [0]PETSC ERROR: #3 MatSeqXAIJFreeAIJ() line 135 in /home/sajid/packages/petsc/include/../src/mat/impls/aij/seq/aij.h
> [0]PETSC ERROR: #4 MatSetValues_SeqAIJ() line 498 in /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c
> [0]PETSC ERROR: #5 MatSetValues() line 1392 in /home/sajid/packages/petsc/src/mat/interface/matrix.c
> [0]PETSC ERROR: #6 setMatrixElements() line 248 in /home/sajid/packages/pirt/src/geom.cxx
> [0]PETSC ERROR: #7 construct_matrix() line 91 in /home/sajid/packages/pirt/src/matrix.cu
> [0]PETSC ERROR: #8 main() line 20 in /home/sajid/packages/pirt/src/pirt.cxx
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -inputfile shepplogan.h5
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> with errorcode 20076.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [sajid at xrmlite cuda]$
> PetscCUDAHostFree is called within the PetscMalloc(Re)SetCUDAHost block as described earlier, which should have created valid memory on the host.
> 
> Could someone explain whether this is the correct approach to take and what the above error means?
> 
> (PS: I've run the KSP tutorial ex2 with -vec_type cuda -mat_type aijcusparse to test the installation, and everything works as expected.)
> 
> Thank You,
> Sajid Ali | PhD Candidate
> Applied Physics
> Northwestern University
> s-sajid-ali.github.io
