[petsc-users] HashMap Error when populating AIJCUSPARSE matrix

Yesypenko, Anna anna at oden.utexas.edu
Thu Jan 18 17:09:44 CST 2024


Hi Barry,

I'm using PETSc version 3.20.3. The TACC system is Lonestar6.

Best,
Anna
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Thursday, January 18, 2024 4:43 PM
To: Yesypenko, Anna <anna at oden.utexas.edu>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Victor Eijkhout <eijkhout at tacc.utexas.edu>
Subject: Re: [petsc-users] HashMap Error when populating AIJCUSPARSE matrix


   Ok, I ran both versions of the Python code on an ANL machine with CUDA, and they worked fine over many runs. I even increased the problem size without producing any problems.

   Anna,

   What version of PETSc are you using?

   Victor,

   Does anyone at ANL have access to this TACC system to try to reproduce?


  Barry



On Jan 18, 2024, at 4:38 PM, Barry Smith <bsmith at petsc.dev> wrote:


   It is using the hash map system for inserting values, which only inserts on the CPU, not on the GPU. So I don't see that it would be moving any data to the GPU until the matrix assembly (A.assemble()) is done, which it never gets to. Hence I have trouble understanding why the GPU has anything to do with the crash.
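
   To illustrate the point about hash-based insertion, here is a plain-Python analogy. It is not PETSc's actual implementation, and the class name `HashAssembledMatrix` and its methods are hypothetical. Values are accumulated in a host-side dictionary keyed by (row, column), and the compressed arrays are built only at assembly, which is the earliest point at which a copy to the GPU would be needed.

```python
class HashAssembledMatrix:
    """Toy host-side matrix: hash-map insertion, CSR built at assembly."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.entries = {}  # (row, col) -> value, lives on the host only
        self.csr = None    # filled in by assemble()

    def set_value(self, row, col, value):
        # Insert-mode semantics: a later value overwrites an earlier one.
        self.entries[(row, col)] = value

    def assemble(self):
        # Sort by (row, col), then turn per-row counts into row pointers.
        indptr = [0] * (self.n_rows + 1)
        indices, data = [], []
        for (row, col), value in sorted(self.entries.items()):
            indptr[row + 1] += 1
            indices.append(col)
            data.append(value)
        for row in range(self.n_rows):
            indptr[row + 1] += indptr[row]
        self.csr = (indptr, indices, data)
        return self.csr


def tridiagonal(n):
    """Fill the toy matrix with the -1, 2, -1 stencil used in this thread."""
    mat = HashAssembledMatrix(n)
    for i in range(n):
        if i > 0:
            mat.set_value(i, i - 1, -1.0)
        mat.set_value(i, i, 2.0)
        if i < n - 1:
            mat.set_value(i, i + 1, -1.0)
    return mat
```

   In this toy model nothing beyond the dictionary exists until `assemble()` is called, which mirrors the reasoning above: a failure during insertion happens before any device-side structure would be built.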

   I guess I need to try to reproduce it on a GPU system.

   Barry




On Jan 18, 2024, at 4:28 PM, Matthew Knepley <knepley at gmail.com> wrote:

On Thu, Jan 18, 2024 at 4:18 PM Yesypenko, Anna <anna at oden.utexas.edu> wrote:
Hi Matt, Barry,

Apologies for the extra dependency on scipy. I can replicate the error by calling setValue(i, j, v) in a loop as well.
In roughly half of 10 runs, the following script fails with an error in hashmapijv, the same as in my original post.
The other times it runs without error.

Barry is right that it's CUDA specific. The script runs fine on the CPU.
Do you have any suggestions or example scripts on assigning entries to an AIJCUSPARSE matrix?

Oh, you definitely do not want to be doing this. I believe you would rather

1) Make the CPU matrix and then convert to AIJCUSPARSE. This is efficient.

2) Produce the values on the GPU and call

  https://petsc.org/main/manualpages/Mat/MatSetPreallocationCOO/
  https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/

  This is what most people do who are forming matrices directly on the GPU.

What you are currently doing is incredibly inefficient, and I think it accounts for your running out of memory.
It talks back and forth between the CPU and GPU.
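
A minimal sketch of both options for the tridiagonal matrix in this thread. It assumes a single process, a CUDA-enabled PETSc, and a petsc4py build that exposes `Mat.convert`, `Mat.setPreallocationCOO` and `Mat.setValuesCOO`. The helper names `tridiagonal_coo`, `build_on_cpu_then_convert` and `build_with_coo` are made up for illustration. The COO arrays are built on the host here; producing them on the GPU is not shown.

```python
def tridiagonal_coo(n):
    """Return COO triplets (rows, cols, vals) of the n x n -1, 2, -1 matrix."""
    rows, cols, vals = [], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:  # skip the entries that fall outside the matrix
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def build_on_cpu_then_convert(n):
    """Option 1: assemble a host 'aij' matrix, then convert it."""
    from petsc4py import PETSc  # imported lazily; assumes a CUDA-enabled PETSc

    rows, cols, vals = tridiagonal_coo(n)
    A = PETSc.Mat().createAIJ(size=[n, n], nnz=3, comm=PETSc.COMM_SELF)
    for i, j, v in zip(rows, cols, vals):
        A.setValue(i, j, v)
    A.assemble()
    return A.convert('aijcusparse')


def build_with_coo(n):
    """Option 2: pass the whole sparsity pattern first, then all the values."""
    import numpy as np
    from petsc4py import PETSc

    rows, cols, vals = tridiagonal_coo(n)
    A = PETSc.Mat().create(comm=PETSc.COMM_SELF)
    A.setSizes([n, n])
    A.setType('aijcusparse')
    A.setPreallocationCOO(np.asarray(rows, dtype=PETSc.IntType),
                          np.asarray(cols, dtype=PETSc.IntType))
    A.setValuesCOO(np.asarray(vals, dtype=PETSc.ScalarType))
    return A
```

Either function avoids calling setValue on a matrix that is already of type 'aijcusparse'. The second one hands PETSc the full pattern in a single call, following the two manual pages linked above.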

  Thanks,

     Matt

Here is a minimal snippet that doesn't depend on scipy.
```
from petsc4py import PETSc
import numpy as np

n = int(5e5)

# Preallocate 3 nonzeros per row (2 in the first and last rows).
nnz = 3 * np.ones(n, dtype=np.int32)
nnz[0] = nnz[-1] = 2
A = PETSc.Mat(comm=PETSc.COMM_WORLD)
A.createAIJ(size=[n, n], comm=PETSc.COMM_WORLD, nnz=nnz)
A.setType('aijcusparse')

# First and last rows.
A.setValue(0, 0, 2)
A.setValue(0, 1, -1)
A.setValue(n - 1, n - 2, -1)
A.setValue(n - 1, n - 1, 2)

# Interior rows.
for index in range(1, n - 1):
    A.setValue(index, index - 1, -1)
    A.setValue(index, index, 2)
    A.setValue(index, index + 1, -1)
A.assemble()
```
If it means anything to you, when the hash error occurs, it is for index 67283 after filling 201851 nonzero values.

Thank you for your help and suggestions!
Anna

________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Thursday, January 18, 2024 2:35 PM
To: Yesypenko, Anna <anna at oden.utexas.edu>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] HashMap Error when populating AIJCUSPARSE matrix


   Do you ever get a problem with 'aij'? Can you run it in a loop with 'aij' to confirm that it doesn't fail?
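
   One way to run this check, as a rough sketch: repeat the fill for a given matrix type and count how often it raises. The helper names `count_failures` and `fill_tridiagonal` and the default trial count are hypothetical; the PETSc calls mirror the ones in the scripts in this thread.

```python
def count_failures(build, trials=10):
    """Call build() `trials` times; return how many calls raised an error."""
    failures = 0
    for _ in range(trials):
        try:
            build()
        except Exception as exc:  # PETSc.Error derives from RuntimeError
            failures += 1
            print(f"run failed: {exc}")
    return failures


def fill_tridiagonal(mat_type, n=500_000):
    """Populate the -1, 2, -1 matrix with setValue, as in the failing script."""
    from petsc4py import PETSc  # imported lazily so the harness stands alone

    A = PETSc.Mat().createAIJ(size=[n, n], nnz=3, comm=PETSc.COMM_WORLD)
    A.setType(mat_type)
    for i in range(n):
        if i > 0:
            A.setValue(i, i - 1, -1)
        A.setValue(i, i, 2)
        if i < n - 1:
            A.setValue(i, i + 1, -1)
    A.assemble()
    A.destroy()
```

   Calling `count_failures(lambda: fill_tridiagonal('aij'))` and then the same with `'aijcusparse'` would give the comparison. Repeating the fill inside one process may not behave the same as launching the script several times, so a shell loop over separate runs is worth trying as well.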



   Barry


On Jan 17, 2024, at 4:51 PM, Yesypenko, Anna <anna at oden.utexas.edu> wrote:

Dear PETSc users/developers,

I'm experiencing a bug when using petsc4py with GPU support. It may be a mistake in how I set up an AIJCUSPARSE matrix.
For larger matrices, I sometimes encounter an error when assigning matrix values; the error is thrown in PetscHMapIJVQuerySet().
Here is a minimal snippet that populates a sparse tridiagonal matrix.

```
from petsc4py import PETSc
from scipy.sparse import diags
import numpy as np

n = int(5e5)

# Preallocate 3 nonzeros per row (2 in the first and last rows).
nnz = 3 * np.ones(n, dtype=np.int32)
nnz[0] = nnz[-1] = 2
A = PETSc.Mat(comm=PETSc.COMM_WORLD)
A.createAIJ(size=[n, n], comm=PETSc.COMM_WORLD, nnz=nnz)
A.setType('aijcusparse')
tmp = diags([-1, 2, -1], [-1, 0, +1], shape=(n, n)).tocsr()
A.setValuesCSR(tmp.indptr, tmp.indices, tmp.data)  # this is the line where the error is thrown
A.assemble()
```

The error trace is below:
```
File "petsc4py/PETSc/Mat.pyx", line 2603, in petsc4py.PETSc.Mat.setValuesCSR
  File "petsc4py/PETSc/petscmat.pxi", line 1039, in petsc4py.PETSc.matsetvalues_csr
  File "petsc4py/PETSc/petscmat.pxi", line 1032, in petsc4py.PETSc.matsetvalues_ijv
petsc4py.PETSc.Error: error code 76
[0] MatSetValues() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:1497
[0] MatSetValues_Seq_Hash() at /work/06368/annayesy/ls6/petsc/include/../src/mat/impls/aij/seq/seqhashmatsetvalues.h:52
[0] PetscHMapIJVQuerySet() at /work/06368/annayesy/ls6/petsc/include/petsc/private/hashmapijv.h:10
[0] Error in external library
[0] [khash] Assertion: `ret >= 0' failed.
```

If I run the same script a handful of times, it will run without errors eventually.
Does anyone have insight on why it is behaving this way? I'm running on a node with 3x NVIDIA A100 PCIE 40GB.

Thank you!
Anna



--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

