[petsc-dev] MATOP_MAT_MULT

Stefano Zampini stefano.zampini at gmail.com
Fri May 8 10:14:21 CDT 2020


Jose

Just send me an MWE and I’ll fix the case for you.

Thanks
Stefano

> On May 8, 2020, at 6:13 PM, Jose E. Roman <jroman at dsic.upv.es> wrote:
> 
> Stefano, I have tried to make my code work with your code on GPU (branch jose/bv-matmult-fallback), but I have errors.
> 
> This is what I do on CPU:
> 
> if (create) {
>  ierr = MatCreateDense(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr); /* pass a pointer to avoid allocation of storage */
>  ierr = MatDensePlaceArray(bv->Aget,NULL);CHKERRQ(ierr);  /* replace with a null pointer, the value after BVRestoreMat */
> }
> ierr = MatDensePlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);  /* set the actual pointer */
> 
> The analogue on GPU:
> 
> if (create) {
>  ierr = MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr); /* pass a pointer to avoid allocation of storage */
>  ierr = MatDenseCUDAPlaceArray(bv->Aget,NULL);CHKERRQ(ierr);  /* replace with a null pointer, the value after BVRestoreMat */
> }
> ierr = MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);  /* set the actual pointer */
> 
> But it does not work:
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Operation done in wrong order
> [0]PETSC ERROR: MatDenseCUDAResetArray() must be called first
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40  GIT Date: 2020-05-08 11:20:42 +0300
> [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman Fri May  8 16:54:23 2020
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-cuda
> [0]PETSC ERROR: #1 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 183 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray() line 1930 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c
> [0]PETSC ERROR: #3 BVGetMat_Svec_CUDA() line 749 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #4 BVGetMat() line 1455 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c
> [0]PETSC ERROR: #5 BVMatMult_Svec_CUDA() line 556 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #6 BVMatMult() line 597 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c
> [0]PETSC ERROR: #7 EPSSolve_LOBPCG() line 150 in /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c
> [0]PETSC ERROR: #8 EPSSolve() line 149 in /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c
> [0]PETSC ERROR: #9 main() line 167 in ex19.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -check_pointer_intensity 0
> [0]PETSC ERROR: -eps_type lobpcg
> [0]PETSC ERROR: -error_output_stdout
> [0]PETSC ERROR: -malloc
> [0]PETSC ERROR: -malloc_debug
> [0]PETSC ERROR: -malloc_dump
> [0]PETSC ERROR: -mat_type aijcusparse
> [0]PETSC ERROR: -use_gpu_aware_mpi 0
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> 
> I tried a simplified version, where I let MatCreateDenseCUDA() allocate the array:
> 
> if (create) {
>  ierr = MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,NULL,&bv->Aget);CHKERRQ(ierr);
> }
> ierr = MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);
> 
> Now the error I get:
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40  GIT Date: 2020-05-08 11:20:42 +0300
> [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman Fri May  8 16:54:51 2020
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-cuda
> [0]PETSC ERROR: #1 MatSeqDenseCUDACopyToGPU() line 165 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 184 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #3 MatDenseCUDAPlaceArray() line 1930 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c
> [0]PETSC ERROR: #4 BVGetMat_Svec_CUDA() line 749 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #5 BVGetMat() line 1455 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c
> [0]PETSC ERROR: #6 BVMatMult_Svec_CUDA() line 556 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #7 BVMatMult() line 597 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c
> [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 150 in /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c
> [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c
> [0]PETSC ERROR: #10 main() line 167 in ex19.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -check_pointer_intensity 0
> [0]PETSC ERROR: -eps_type lobpcg
> [0]PETSC ERROR: -error_output_stdout
> [0]PETSC ERROR: -malloc
> [0]PETSC ERROR: -malloc_debug
> [0]PETSC ERROR: -malloc_dump
> [0]PETSC ERROR: -mat_type aijcusparse
> [0]PETSC ERROR: -use_gpu_aware_mpi 0
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> 
> The pointer vv is obtained with VecCUDAGetArray(). Any idea?
> Jose
> 
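A minimal sketch in plain C of the ordering rule that the first error message points at. It is a toy model, not PETSc code: ToyDense, toy_place_array() and toy_reset_array() are hypothetical names, and the assumption is that a placement saves the previous array and a second placement is refused until the first one has been reset. Under that assumption, the GPU sequence above (place NULL inside the create branch, then place the real pointer) is two placements in a row.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dense matrix that can borrow a user array.
   Hypothetical names; this only models the place/reset ordering. */
typedef struct {
  double *array;    /* storage currently in use */
  double *unplaced; /* storage saved by a placement, restored by a reset */
  int     placed;   /* nonzero while a user array is in place */
} ToyDense;

enum { TOY_OK = 0, TOY_ERR_ORDER = 1 };

/* Swap in a user array; refuse if an earlier placement was never reset. */
static int toy_place_array(ToyDense *m, double *user)
{
  if (m->placed) return TOY_ERR_ORDER; /* "reset must be called first" */
  m->unplaced = m->array;
  m->array    = user;
  m->placed   = 1;
  return TOY_OK;
}

/* Restore the array that was in use before the last placement. */
static int toy_reset_array(ToyDense *m)
{
  if (m->placed) {
    m->array    = m->unplaced;
    m->unplaced = NULL;
    m->placed   = 0;
  }
  return TOY_OK;
}

/* Two placements in a row: returns the status of the second one. */
static int toy_place_twice(void)
{
  double   storage[4] = {0};
  ToyDense m = { storage, NULL, 0 };
  if (toy_place_array(&m, NULL)) return -1;
  return toy_place_array(&m, storage + 2);
}

/* Same sequence with a reset in between: returns the status of the second placement. */
static int toy_place_reset_place(void)
{
  double   storage[4] = {0};
  ToyDense m = { storage, NULL, 0 };
  if (toy_place_array(&m, NULL)) return -1;
  toy_reset_array(&m);
  return toy_place_array(&m, storage + 2);
}
```

In this model toy_place_twice() reports the ordering error and toy_place_reset_place() succeeds. Whether the real CUDA implementation tracks the state this way is an assumption here; the model says nothing about the second error (cudaErrorInvalidValue).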
> 
>> On 7 May 2020, at 20:16, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>> 
>> Jose
>> 
>> I have just pushed some code to support MPI DENSE CUDA matrices and MatMatMult operations (basic loop over columns, without copy into vectors).
>> I have rebased against the latest master
>> Let me know if it works for you. I will strip out the relevant commits and make a new MR
>> 
>> Pierre, I have added a test for sbaij in parallel and it works nicely (automatically doing the loop over dense columns). Let me know if it works for you now
>> 
>> Thanks
>> 
>> On Thu, 7 May 2020 at 00:17, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>> 
>>> 
>>> 
>>>> On 6 May 2020, at 20:00, Pierre Jolivet <pierre.jolivet at enseeiht.fr> wrote:
>>>> 
>>>> Stefano,
>>>> Is this working for nsize > 1? https://gitlab.com/petsc/petsc/-/blob/7e88e4dd44e2a5120b858cf9f19502ac359985be/src/mat/tests/ex70.c#L295
>>>> I am now getting (in another example):
>>>> [0]PETSC ERROR: Call MatProductSymbolic() first
>>>> Instead of the previous:
>>>> [0]PETSC ERROR: MatProductSetFromOptions_AB for A mpisbaij and B mpidense is not supported
>>>> 
>> 
>> Pierre,
>> 
>> Not sure what is going on if you do not tell me what to run. My branch stefanozampini/feature-add-hpackages is off master and has been recently rebased (includes the fixes I have made in maint too)
>> BTW, I found your message below Jose’s answer and I never got your original message. Did you forget to send it to petsc-dev?
>> 
>> 
>> 
>>>> (But my branch is lagging behind maint, so maybe I’m missing some other fixes, take this with a grain of salt).
>>>> Thanks,
>>>> Pierre
>>>> 
>>>>> On 6 May 2020, at 4:52 PM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>>>>> 
>>>>> I have working support for MATSHELL here https://gitlab.com/petsc/petsc/-/commit/146e7f1ccf5f267b36079cac494077a23e8bbc45
>>>>> Tested here https://gitlab.com/petsc/petsc/-/commit/c4fcaa45a01cc783c629913983b204a1cbcb3939
>>>>> 
>>>>> Jose and Pierre, this code is supposed to work with CUDA, but I haven't tested it yet
>>>>> Can you tell me if this fixes the issues for you, so that you do not have to loop over the columns of the dense matrix yourself?
>>>>> 
>>>>> On Wed, 6 May 2020 at 10:09, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>>>>> Hong
>>>>> 
>>>>> If the product is not supported, the type of C will never be set anyway, so you cannot call MatHasOperation after MatProductSetFromOptions.
>>>>> The purpose of MatProductSetFromOptions is to populate the function pointers for symbolic and numeric phases. If not found, they should be set to null instead of erroring as it is now.
>>>>> What I propose is to have MatProductHasOperation (not MatHasOperation): this function will be identical to MatHasOperation, with the only difference that it does not call PetscValidType on the input mat.
>>>>> 
>>>>> Meanwhile, I’m coding a basic MatMat (and MatTransposeMat) driver to loop over dense columns and apply MatMult (or MatMultTranspose) without memory movement.
>>>>> This will be valid for all B matrices being of type dense (and its derivations), with C of type dense too. This in principle will fix Jose and Pierre’s issues (they can correct me if I’m wrong)
>>>>> 
>>>>> However, we should definitely have a way for the user to enquire if a given operation is supported or not. 
>>>>> 
>>>>> Thanks
>>>>> Stefano
>>>>> 
>>>>>> On May 6, 2020, at 12:03 AM, Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>>>>>> 
>>>>>> Stefano:
>>>>>> Now, we need to address this bug report: enable MatHasOperation(C,MATOP_MAT_MULT,&flg) for matrix products, e.g., C=A*B, which is related to your issue https://gitlab.com/petsc/petsc/-/issues/608.
>>>>>> 
>>>>>> In petsc-3.13:
>>>>>> 1) MATOP_MAT_MULT, ..., MATOP_MATMAT_MULT are removed from the MATOP table (they are still listed in petscmat.h -- an oversight, I'll remove them).
>>>>>> MATOP_MAT_MULT_SYMBOLIC/NUMERIC ... are still in the table.
>>>>>> 2) MatHasOperation(C,...) must be called for the matrix product C, not matrix A or B (slepc needs to fix this after this reported bug is fixed).
>>>>>> 
>>>>>> Like MatSetOption(), MatHasOperation() must be called AFTER MatSetType(). You moved MatSetType() from MatProductSetFromOptions() back to MatProductSymbolic() in your latest patch, thus the user has to call MatHasOperation() after MatProductSymbolic():
>>>>>> 
>>>>>> MatProductCreate(A,B,NULL,&C);
>>>>>> MatProductSetType(C,...);
>>>>>> ...
>>>>>> MatProductSetFromOptions();   //if the product is not supported for the given mat types, currently petsc crashes here, which we can replace with an error output
>>>>>> 
>>>>>> MatProductSymbolic(); -> calls MatSetType()
>>>>>> MatHasOperation(C,MATOP_MAT_MULT,&flg)
>>>>>> 
>>>>>> Question: how to call MatHasOperation(C,..) when MatProductSymbolic() is not supported?
>>>>>> 
>>>>>> My fix to this bug:
>>>>>> Restore the MatSetType() call in MatProductSetFromOptions(). Then the user calls:
>>>>>> 
>>>>>> MatProductCreate(A,B,NULL,&C);
>>>>>> MatProductSetType(C,...);
>>>>>> ...
>>>>>> MatProductSetFromOptions(C);  //if the product is not supported for the given mat types, C->ops->productsymbolic=NULL;
>>>>>> MatHasOperation(C,MATOP_PRODUCTSYMBOLIC,&flg);
>>>>>> if (flg) { 
>>>>>>  MatProductSymbolic(C);
>>>>>>  ...
>>>>>> } else {
>>>>>>  MatDestroy(&C);
>>>>>>  ...
>>>>>> }
>>>>>> 
>>>>>> Either you take care of this bug report, or let me know your thoughts about how to fix this bug.
>>>>>> Hong
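A minimal sketch in plain C of the "leave the function pointer NULL and let the caller ask" pattern proposed above. It is a toy model, not PETSc code: ToyProduct and the toy_* functions are hypothetical names, and the rule "supported when B is dense" is an arbitrary choice for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy product object with a single optional operation. */
typedef struct ToyProduct ToyProduct;
struct ToyProduct {
  const char *atype, *btype;            /* operand type names */
  int (*productsymbolic)(ToyProduct *); /* NULL when the product is unsupported */
  int symbolic_done;                    /* set by the symbolic phase */
};

/* Stand-in for a symbolic phase. */
static int toy_symbolic_dense(ToyProduct *p)
{
  p->symbolic_done = 1;
  return 0;
}

/* Populate the function pointer when the type pair is supported;
   otherwise leave it NULL rather than raising an error. */
static int toy_set_from_options(ToyProduct *p)
{
  p->productsymbolic = NULL;
  if (strcmp(p->btype, "dense") == 0) p->productsymbolic = toy_symbolic_dense;
  return 0;
}

/* Capability query: only inspects the function pointer. */
static int toy_has_symbolic(const ToyProduct *p)
{
  return p->productsymbolic != NULL;
}

/* Caller-side flow: query first, run the symbolic phase only if available.
   Returns 1 if the symbolic phase ran, 0 if the product was unsupported. */
static int toy_try_product(const char *atype, const char *btype)
{
  ToyProduct p = { atype, btype, NULL, 0 };
  toy_set_from_options(&p);
  if (!toy_has_symbolic(&p)) return 0; /* caller would destroy C and fall back */
  p.productsymbolic(&p);
  return p.symbolic_done;
}
```

With this layout an unsupported combination becomes a condition the caller can test and recover from (for example by looping over columns itself) instead of a crash inside the setup call.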
>>>>>> 
>>>>>> From: Zhang, Hong <hzhang at mcs.anl.gov>
>>>>>> Sent: Saturday, April 25, 2020 2:40 PM
>>>>>> To: Pierre Jolivet <pierre.jolivet at enseeiht.fr>
>>>>>> Cc: Jose E. Roman <jroman at dsic.upv.es>; Stefano Zampini <stefano.zampini at gmail.com>; petsc-dev <petsc-dev at mcs.anl.gov>; Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT
>>>>>> 
>>>>>> Pierre,
>>>>>> When we do 
>>>>>> MatProductCreate: C = A*B; //C owns A and B, thus B->refct =2
>>>>>> MatProductCreateWithMat: B = A*C; //If I let B own A and C, then C->refct=2
>>>>>> Then
>>>>>> MatDestroy(&B) and MatDestroy(&C) only reduce their refct from 2 to 1, hence a memory leak.
>>>>>> My solution is adding
>>>>>> {
>>>>>>          matreference;  /* do not add refct when using MatProductCreateWithMat() to avoid recursive references */
>>>>>> } Mat_Product
>>>>>> This flag prevents MatProductCreateWithMat() from increasing reference counts, i.e., B does not own A and C, which avoids reverse ownership. I am not sure this is a reasonable solution. Let me know if you have a better one.
>>>>>> See ex109.c and ex195.c for tests.
>>>>>> Hong
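A minimal sketch in plain C of the reference-count cycle described above and of the non-owning reference used to break it. It is a toy model, not PETSc code: ToyMat and the toy_* functions are hypothetical, each product tracks a single operand, and "freed" is a flag standing in for actual deallocation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyMat ToyMat;
struct ToyMat {
  int     refct;   /* reference count */
  ToyMat *operand; /* operand this product refers to, or NULL */
  int     owns;    /* nonzero if the reference to operand was counted */
  int     freed;   /* set when refct reaches zero */
};

static void toy_init(ToyMat *m)
{
  m->refct = 1; m->operand = NULL; m->owns = 0; m->freed = 0;
}

/* Record op as an operand of prod; count the reference only if take_ref is set. */
static void toy_set_operand(ToyMat *prod, ToyMat *op, int take_ref)
{
  prod->operand = op;
  prod->owns    = take_ref;
  if (take_ref) op->refct++;
}

/* Drop one reference; on reaching zero, release the counted operand too. */
static void toy_destroy(ToyMat *m)
{
  if (--m->refct > 0) return;
  m->freed = 1;
  if (m->operand && m->owns) toy_destroy(m->operand);
}

/* Build C = A*B, then B = A*C, destroy both, and count what is still alive. */
static int toy_leaked_after_destroy(int take_ref_in_second_product)
{
  ToyMat B, C;
  toy_init(&B);
  toy_init(&C);
  toy_set_operand(&C, &B, 1);                          /* C owns B */
  toy_set_operand(&B, &C, take_ref_in_second_product); /* B refers back to C */
  toy_destroy(&B);
  toy_destroy(&C);
  return (B.freed ? 0 : 1) + (C.freed ? 0 : 1);
}
```

When both references are counted, each destroy call only lowers a count from 2 to 1 and both objects survive; when the second reference is not counted, both are released.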
>>>>>> 
>>>>>> From: Pierre Jolivet <pierre.jolivet at enseeiht.fr>
>>>>>> Sent: Saturday, April 25, 2020 11:45 AM
>>>>>> To: Zhang, Hong <hzhang at mcs.anl.gov>
>>>>>> Cc: Jose E. Roman <jroman at dsic.upv.es>; Stefano Zampini <stefano.zampini at gmail.com>; petsc-dev <petsc-dev at mcs.anl.gov>; Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT
>>>>>> 
>>>>>> Hong,
>>>>>> José didn’t report this (though he may have run into the same issue); I did.
>>>>>> I’ll try the branch and get back to you on the GitLab MR.
>>>>>> 
>>>>>> Thanks,
>>>>>> Pierre
>>>>>> 
>>>>>>> On 25 Apr 2020, at 6:17 PM, Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>>>>>>> 
>>>>>>> Jose,
>>>>>>> 
>>>>>>>>> I also now just tested some previously PETSC_VERSION_LT(3,13,0) running code with C=A*B, Dense=Nest*Dense, all previously allocated prior to a call to MatMatMult and scall = MAT_REUSE_MATRIX.
>>>>>>>>> Sadly, it’s now broken. It is my fault for not having a test for this in https://gitlab.com/petsc/petsc/-/merge_requests/2069, sorry about that.
>>>>>>>>> [0]PETSC ERROR: Call MatProductSymbolic() first
>>>>>>>>> [0]PETSC ERROR: #1 MatProductNumeric() line 730 in /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matproduct.c
>>>>>>>>> [0]PETSC ERROR: #2 MatMatMult() line 9335 in /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matrix.c
>>>>>>>>> 
>>>>>>>>> Here is a reproducer (that will work OK with 3.12.4).
>>>>>>>>> diff --git a/src/mat/tests/ex195.c b/src/mat/tests/ex195.c
>>>>>>>>> index c72662bc3c..811de669c5 100644
>>>>>>>>> --- a/src/mat/tests/ex195.c
>>>>>>>>> +++ b/src/mat/tests/ex195.c
>>>>>>>>> @@ -73,2 +73,3 @@ int main(int argc,char **args)
>>>>>>>>>  ierr = MatMatMult(nest,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);
>>>>>>>>> +  ierr = MatMatMult(nest,C,MAT_REUSE_MATRIX,PETSC_DEFAULT,&B);CHKERRQ(ierr);
>>>>>>>>>  ierr = MatMatMultEqual(nest,B,C,10,&equal);CHKERRQ(ierr);
>>>>>>>>> 
>>>>>>>>> $ make -f gmakefile test searchin=mat_tests-ex195
>>>>>>>>> 
>>>>>>>>> I believe this is very close to the topic at hand and issue #608, so maybe you could fix this as well in the same upcoming MR? Just let me know, I can have a crack at it otherwise.
>>>>>>> 
>>>>>>> This is a bug. I fixed it in the branch hzhang/fix-matproduct-reuse/maint. Can you test it?
>>>>>>> Hong
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Stefano
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Stefano
> 