[petsc-users] HashMap Error when populating AIJCUSPARSE matrix

Barry Smith bsmith at petsc.dev
Thu Jan 18 15:12:08 CST 2024


   It appears to be crashing in kh_resize() in khash.h on a memory allocation failure when it tries to get additional memory for storing the matrix.

   This code path seems to use only CPU memory, so it should also fail in a similar way with 'aij'.

   But the matrix is not large, so I don't think it should be running out of memory. I cannot reproduce the crash with the same parameters on my non-CUDA machine, so debugging will be tricky.
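
   A minimal sketch of how one might check whether 'aij' fails in the same way: repeat the assembly from the original script several times and count the failures. The helper names (tridiagonal_csr, assemble_once, count_failures) are made up for illustration and are not part of petsc4py; the matrix is built exactly as in the original script (createAIJ followed by setType) so that the same code path is exercised.

```python
def tridiagonal_csr(n):
    """Return (indptr, indices, data) for the n x n tridiagonal [-1, 2, -1] matrix."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:  # skip the entries that fall outside the matrix
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def assemble_once(mat_type, n):
    """Assemble the test matrix once with the given type ('aij' or 'aijcusparse')."""
    # Imported lazily so the helpers in this file can be used without PETSc.
    import numpy as np
    from petsc4py import PETSc

    indptr, indices, data = tridiagonal_csr(n)
    nnz = np.diff(np.asarray(indptr)).astype(np.int32)  # nonzeros per row

    # Same order of calls as the original script: create, then set the type.
    A = PETSc.Mat().createAIJ(size=[n, n], nnz=nnz, comm=PETSc.COMM_WORLD)
    A.setType(mat_type)
    A.setValuesCSR(
        np.asarray(indptr, dtype=PETSc.IntType),
        np.asarray(indices, dtype=PETSc.IntType),
        np.asarray(data, dtype=PETSc.ScalarType),
    )
    A.assemble()
    A.destroy()


def count_failures(build, trials):
    """Call build() `trials` times and return how many calls raised an exception."""
    failures = 0
    for trial in range(trials):
        try:
            build()
        except Exception as exc:  # PETSc errors surface as Python exceptions
            failures += 1
            print(f"trial {trial}: {exc}")
    return failures
```

   For example, count_failures(lambda: assemble_once('aij', int(5e5)), trials=20) runs the CPU case 20 times; the same call with 'aijcusparse' gives a failure count to compare against. The number of trials is arbitrary.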

   Barry






> On Jan 18, 2024, at 3:35 PM, Barry Smith <bsmith at petsc.dev> wrote:
> 
> 
>    Do you ever get a problem with 'aij'? Can you run in a loop with 'aij' to confirm that it doesn't fail?
> 
>    
> 
>    Barry
> 
> 
>> On Jan 17, 2024, at 4:51 PM, Yesypenko, Anna <anna at oden.utexas.edu> wrote:
>> 
>> Dear PETSc users/developers,
>> 
>> I'm experiencing a bug when using petsc4py with GPU support. It may be a mistake in how I set up an AIJCUSPARSE matrix.
>> For larger matrices, I sometimes encounter an error when assigning matrix values; the error is thrown in PetscHMapIJVQuerySet().
>> Here is a minimal snippet that populates a sparse tridiagonal matrix.
>> 
>> ```
>> from petsc4py import PETSc
>> from scipy.sparse import diags
>> import numpy as np
>> 
>> n = int(5e5)
>> 
>> # Preallocation: three nonzeros per row, two in the first and last rows.
>> nnz = 3 * np.ones(n, dtype=np.int32)
>> nnz[0] = nnz[-1] = 2
>> A = PETSc.Mat(comm=PETSc.COMM_WORLD)
>> A.createAIJ(size=[n,n],comm=PETSc.COMM_WORLD,nnz=nnz)
>> A.setType('aijcusparse')
>> # Tridiagonal [-1, 2, -1] matrix in CSR form.
>> tmp = diags([-1,2,-1],[-1,0,+1],shape=(n,n)).tocsr()
>> A.setValuesCSR(tmp.indptr,tmp.indices,tmp.data)  # the error is thrown on this line
>> A.assemble()
>> ```
>> 
>> The error trace is below:
>> ```
>> File "petsc4py/PETSc/Mat.pyx", line 2603, in petsc4py.PETSc.Mat.setValuesCSR
>>   File "petsc4py/PETSc/petscmat.pxi", line 1039, in petsc4py.PETSc.matsetvalues_csr
>>   File "petsc4py/PETSc/petscmat.pxi", line 1032, in petsc4py.PETSc.matsetvalues_ijv
>> petsc4py.PETSc.Error: error code 76
>> [0] MatSetValues() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:1497
>> [0] MatSetValues_Seq_Hash() at /work/06368/annayesy/ls6/petsc/include/../src/mat/impls/aij/seq/seqhashmatsetvalues.h:52
>> [0] PetscHMapIJVQuerySet() at /work/06368/annayesy/ls6/petsc/include/petsc/private/hashmapijv.h:10
>> [0] Error in external library
>> [0] [khash] Assertion: `ret >= 0' failed.
>> ```
>> 
>> If I run the same script a handful of times, it will run without errors eventually.
>> Does anyone have insight on why it is behaving this way? I'm running on a node with 3x NVIDIA A100 PCIE 40GB.
>> 
>> Thank you!
>> Anna
> 
