[petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
Mark Adams
mfadams at lbl.gov
Fri Jun 24 07:58:35 CDT 2022
And before we move to the MR, I think Matt found a clear problem:
* PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
returns "localA seqaij"
* And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA,
&amgx->values)); returns:
"it seems to detect that the pointer is a device mapped pointer but that it
is invalid"
Matt, lets just comment out the REUSE line and add another INITIAL line
(destroying the old Mat of course), and lets press on.
We can keep the debugging code for now.
We (PETSc) can work on this independently,
Thanks,
Mark
On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <mfadams at lbl.gov> wrote:
> I am not seeing this response, I see my "hstack" comment last.
> https://gitlab.com/petsc/petsc/-/merge_requests/4323
>
> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>> I have responded in the MR, which has all the context and the code.
>> Please move this conversation from petsc-dev to the MR. Note you can use
>> the little cartoon cloud symbol (upper write of the sub window with my
>> text) to reply to my post and keep everything in a thread for clarity.
>>
>> We are confused because it seems you are trying a variety of things and
>> we don't know how the different things you tried resulted in the multiple
>> errors you reported.
>>
>>
>>
>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <mmartineau at nvidia.com>
>> wrote:
>>
>> I checked in the changes and some debugging statements.
>>
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA,
>> &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>
>> Then the call returns false. If we instead call PetscObjectTypeCompareAny
>> on Pmat then it returns true. If you print the type of the matrices:
>>
>>
>> localA seqaij
>>
>> Pmat mpiaijcusparse
>>
>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then it
>> errors (presumably because of the type mismatch).
>>
>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to
>> AmgX it seems to detect that the pointer is a device mapped pointer but
>> that it is invalid.
>>
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems
>> to return invalid pointer, but I’ll investigate more
>>
>> This doesn’t reproduce if we call:
>>
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer
>> appears to be valid and we converge
>>
>> Essentially all I want to achieve is that when we are parallel, we fetch
>> the local part of A and the device pointer to the matrix values from that
>> structure so that we can pass to AmgX. Preferring whichever API calls are
>> the most efficient.
>>
>>
>> *From:* Stefano Zampini <stefano.zampini at gmail.com>
>> *Sent:* 23 June 2022 20:55
>> *To:* Mark Adams <mfadams at lbl.gov>
>> *Cc:* Barry Smith <bsmith at petsc.dev>; For users of the development
>> version of PETSc <petsc-dev at mcs.anl.gov>; Matthew Martineau <
>> mmartineau at nvidia.com>
>> *Subject:* Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>>
>> *External email: Use caution opening links or attachments*
>>
>> The logic is wrong. It should check for MATSEQAIJCUSPARSE.
>>
>> On Thu, Jun 23, 2022, 21:36 Mark Adams <mfadams at lbl.gov> wrote:
>>
>>
>>
>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>> It looks like the current code copies the nonzero values to the CPU
>> from the MPI matrix (with the calls
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav));
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));, then copies them
>> into the CPU memory of the Seq matrix. When the matrix entries are next
>> accessed on the GPU it should automatically copy them down to the GPU. So
>> the code looks ok even for GPUs. We'll need to see the full error message
>> with what the "invalid pointer" is.
>>
>>
>> I showed Matt how to peek into offloadmask and he found that it is a host
>> state, but this is not the issue. The access method should do the copy to
>> the device.
>>
>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT"
>> in the comparison below).
>>
>> Matt, is the issue that you are calling *MatSeqAIJCUSPARSEGetArrayRead* and
>> getting a host pointer?
>>
>> I think the state of amgx->localA after the call to
>> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH" because this copied the data
>> to the device so they are both valid and you should have device data.
>>
>> 211 PetscBool is_dev_ptrs;
>> 212 PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA,
>> &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>> 213
>> 214 if (is_dev_ptrs) {
>>
>> *216 PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA,
>> &amgx->values));*217 } else {
>> 219 PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>> 220 }
>>
>>
>>
>> Barry
>>
>>
>> Yes this routine is terribly inefficient for GPU matrices, it needs to
>> be specialized to not use the GPU memory but that is a separate issue from
>> there being bugs in the current code.
>>
>> The code also seems to implicitly assume the parallel matrix has the
>> same nonzero pattern with a reuse. This should be checked with each use by
>> stashing the nonzero state of the matrix into the sequential matrix and
>> making sure the parallel matrix has that same stashed value each time.
>> Currently if one changes the nonzero matrix of the parallel matrix one is
>> likely to get random confusing crashes due to memory corruption. But likely
>> not the problem here.
>>
>>
>> On Jun 23, 2022, at 2:23 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass
>> in the code below, where the local matrix is created (INITIAL), but in the
>> next pass, when "REUSE" is used, he sees an invalid pointer.
>> Matt found that it does have offloadmask == CPU.
>> Maybe it is missing logic to put the output in same state as the input?
>>
>> Any ideas on this or should I just dig into it?
>>
>> Thanks,
>>
>> bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>>
>> 199 if (amgx->nranks > 1) {
>>
>> 200 if (partial_setup_allowed) {
>>
>> 202 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path seems doesn't work by the time we reach AmgX API
>>
>> 203 } else {
>>
>> 205 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>>
>> 206 }
>>
>> 207 } else {
>>
>> 208 amgx->localA = Pmat;
>>
>> 209 }
>>
>> 210
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20220624/5d65978e/attachment-0001.html>
More information about the petsc-dev
mailing list