[petsc-users] HashMap Error when populating AIJCUSPARSE matrix

Yesypenko, Anna anna at oden.utexas.edu
Thu Jan 18 15:18:48 CST 2024


Hi Matt, Barry,

Apologies for the extra dependency on scipy. I can also replicate the error by calling setValue(i, j, v) in a loop.
In roughly half of 10 runs, the following script fails with an error in hashmapijv, the same error as in my original post.
The other runs complete without error.

Barry is right that it's CUDA-specific. The script runs fine on the CPU.
Do you have any suggestions or example scripts for assigning entries to an AIJCUSPARSE matrix?

Here is a minimal snippet that doesn't depend on scipy.
```
from petsc4py import PETSc
import numpy as np

n = int(5e5)

# Expected nonzeros per row of the tridiagonal matrix: 3 inside, 2 at the ends.
nnz = 3 * np.ones(n, dtype=np.int32)
nnz[0] = nnz[-1] = 2

A = PETSc.Mat(comm=PETSc.COMM_WORLD)
A.createAIJ(size=[n, n], comm=PETSc.COMM_WORLD, nnz=nnz)
A.setType('aijcusparse')

# First and last rows.
A.setValue(0, 0, 2)
A.setValue(0, 1, -1)
A.setValue(n - 1, n - 2, -1)
A.setValue(n - 1, n - 1, 2)

# Interior rows; the hash error is raised somewhere in this loop.
for index in range(1, n - 1):
    A.setValue(index, index - 1, -1)
    A.setValue(index, index, 2)
    A.setValue(index, index + 1, -1)
A.assemble()
```
If it means anything to you, when the hash error occurs, it is for index 67283 after filling 201851 nonzero values.
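
As a rough sanity check on those two numbers, the sketch below counts how many entries the script above has inserted by the time the loop reaches a given row. It then compares that count with the point at which a khash table with 2^18 buckets would need to grow. The load factor of 0.77 and the rounding rule are assumptions taken from upstream khash defaults; whether the khash copy bundled with PETSc uses the same values, and whether a failed resize is what trips the assertion, is not confirmed anywhere in this thread.

```python
def entries_before_row(index):
    """Entries inserted before the loop starts on row `index`:
    4 boundary entries, then 3 per interior row (the loop starts at row 1)."""
    return 4 + 3 * (index - 1)


# ASSUMPTION: upstream khash default load factor; not verified against PETSc.
KHASH_LOAD_FACTOR = 0.77


def khash_upper_bound(n_buckets):
    """Occupancy at which a khash table with `n_buckets` buckets would resize,
    under the assumed load factor and round-to-nearest rule."""
    return int(n_buckets * KHASH_LOAD_FACTOR + 0.5)


filled_before = entries_before_row(67283)
threshold = khash_upper_bound(2 ** 18)
print("entries before row 67283:", filled_before)
print("assumed resize threshold for 2**18 buckets:", threshold)
```

If the assumptions hold, the reported count sits one entry past the number inserted before row 67283 and coincides with the assumed resize threshold, which would point at the table-growth step rather than at any particular matrix entry.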

Thank you for your help and suggestions!
Anna

________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Thursday, January 18, 2024 2:35 PM
To: Yesypenko, Anna <anna at oden.utexas.edu>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] HashMap Error when populating AIJCUSPARSE matrix


   Do you ever get a problem with 'aij'? Can you run in a loop with 'aij' to confirm it doesn't fail then?



   Barry


On Jan 17, 2024, at 4:51 PM, Yesypenko, Anna <anna at oden.utexas.edu> wrote:

Dear PETSc users/developers,

I'm experiencing a bug when using petsc4py with GPU support. It may be my mistake in how I set up an AIJCUSPARSE matrix.
For larger matrices, I sometimes encounter an error when assigning matrix values; the error is thrown in PetscHMapIJVQuerySet().
Here is a minimal snippet that populates a sparse tridiagonal matrix.

```
from petsc4py import PETSc
from scipy.sparse import diags
import numpy as np

n = int(5e5)

nnz = 3 * np.ones(n, dtype=np.int32)
nnz[0] = nnz[-1] = 2
A = PETSc.Mat(comm=PETSc.COMM_WORLD)
A.createAIJ(size=[n, n], comm=PETSc.COMM_WORLD, nnz=nnz)
A.setType('aijcusparse')
tmp = diags([-1, 2, -1], [-1, 0, +1], shape=(n, n)).tocsr()
# The error is thrown on the next line.
A.setValuesCSR(tmp.indptr, tmp.indices, tmp.data)
A.assemble()
```
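
For readers unfamiliar with the CSR layout passed to setValuesCSR, here is a small sketch that builds the same three arrays (row pointers, column indices, values) for a tridiagonal matrix in plain Python, without scipy. The helper name `tridiag_csr` is hypothetical and only illustrates the layout; it is not part of petsc4py or scipy.

```python
def tridiag_csr(n, lower=-1.0, diag=2.0, upper=-1.0):
    """Return (indptr, indices, data) for an n-by-n tridiagonal matrix in CSR form.

    Hypothetical helper for illustration; assumes n >= 1.
    """
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:                 # sub-diagonal entry (absent in the first row)
            indices.append(i - 1)
            data.append(lower)
        indices.append(i)         # diagonal entry
        data.append(diag)
        if i < n - 1:             # super-diagonal entry (absent in the last row)
            indices.append(i + 1)
            data.append(upper)
        indptr.append(len(indices))  # row i ends here
    return indptr, indices, data


indptr, indices, data = tridiag_csr(4)
print(indptr)
print(indices)
print(data)
```

The differences between consecutive `indptr` entries give the number of nonzeros per row, which matches the `nnz` array in the snippet above: 2 for the first and last rows and 3 elsewhere.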

The error trace is below:
```
File "petsc4py/PETSc/Mat.pyx", line 2603, in petsc4py.PETSc.Mat.setValuesCSR
  File "petsc4py/PETSc/petscmat.pxi", line 1039, in petsc4py.PETSc.matsetvalues_csr
  File "petsc4py/PETSc/petscmat.pxi", line 1032, in petsc4py.PETSc.matsetvalues_ijv
petsc4py.PETSc.Error: error code 76
[0] MatSetValues() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:1497
[0] MatSetValues_Seq_Hash() at /work/06368/annayesy/ls6/petsc/include/../src/mat/impls/aij/seq/seqhashmatsetvalues.h:52
[0] PetscHMapIJVQuerySet() at /work/06368/annayesy/ls6/petsc/include/petsc/private/hashmapijv.h:10
[0] Error in external library
[0] [khash] Assertion: `ret >= 0' failed.
```

If I run the same script a handful of times, it will run without errors eventually.
Does anyone have insight on why it is behaving this way? I'm running on a node with 3x NVIDIA A100 PCIE 40GB.
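
Since the failure is intermittent, one way to quantify it is to run the script repeatedly in fresh processes and count the nonzero exit codes. The sketch below assumes the snippet has been saved to a file; the name `repro.py` is hypothetical. It relies only on the fact that an uncaught Python exception makes the interpreter exit with a nonzero status.

```python
import subprocess
import sys


def count_failures(cmd, runs=10):
    """Run `cmd` in a fresh process `runs` times; return how many runs
    exited with a nonzero status."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Hypothetical usage, assuming the snippet above is saved as repro.py:
# failures = count_failures([sys.executable, "repro.py"], runs=10)
# print(f"{failures} of 10 runs failed")
```

Running the same harness against a copy of the script that uses a different matrix type would give a like-for-like comparison of failure rates.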

Thank you!
Anna
