[petsc-dev] problem with MatSeqAIJCUSPARSEILUAnalysisAndCopyToGPU
Zhang, Hong
hongzhang at anl.gov
Tue Dec 22 18:58:30 CST 2020
On Dec 22, 2020, at 3:38 PM, Mark Adams <mfadams at lbl.gov> wrote:
I am doing a serial (single-MPI-rank) LU solve of a smallish matrix (2D, Q3, 8K equations) on a Summit node (42 P9 cores, 6 V100 GPUs), comparing cuSparse and Kokkos Kernels. The cuSparse performance is terrible.
I solve the same TS problem serially on each global process, running with either NP=1 or (all) 7 MPI ranks/cores per GPU:
MatLUFactorNum time (seconds), using all 6 GPUs:

  NP/GPU   cuSparse   Kokkos Kernels
  1        0.12       0.075
  7        0.55       0.072    (some noise here)
So cuSparse is about 2x slower on one process and about 8x slower when using all the cores, which I assume is due to memory contention.
I found that the problem is in MatSeqAIJCUSPARSEBuildILULower[Upper]TriMatrix. Most of this excess time is in:
cerr = cudaMallocHost((void**) &AALo, nzLower*sizeof(PetscScalar));CHKERRCUDA(cerr);
and
cerr = cudaFreeHost(AALo);CHKERRCUDA(cerr);
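To see the raw cost outside of PETSc, here is a minimal standalone sketch (my own test code, not PETSc source) that just times cudaMallocHost/cudaFreeHost for a buffer of roughly nzLower doubles; the 140K size and 12 repetitions are taken from the numbers reported below.

/* Sketch: time pinned-host allocation/free for an nzLower-sized buffer.
   Compile with nvcc; sizes are assumptions based on the report in this thread. */
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>

int main(void)
{
  const size_t nzLower = 140000;              /* approximate nzLower from the report */
  const size_t bytes   = nzLower*sizeof(double);
  const int    reps    = 12;                  /* matches the 12 calls in the log stage */
  double      *AALo;
  struct timespec t0, t1;

  for (int i = 0; i < reps; i++) {
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (cudaMallocHost((void**)&AALo, bytes) != cudaSuccess) { fprintf(stderr, "cudaMallocHost failed\n"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double talloc = (t1.tv_sec - t0.tv_sec) + 1e-9*(t1.tv_nsec - t0.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (cudaFreeHost(AALo) != cudaSuccess) { fprintf(stderr, "cudaFreeHost failed\n"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double tfree = (t1.tv_sec - t0.tv_sec) + 1e-9*(t1.tv_nsec - t0.tv_nsec);

    printf("rep %2d: cudaMallocHost %.3f ms, cudaFreeHost %.3f ms\n", i, 1e3*talloc, 1e3*tfree);
  }
  return 0;
}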
nzLower is about 140K. Here is my timer data, from a logging stage that follows a "warm up" stage:
Inner-MatSeqAIJCUSPARSEBuildILULowerTriMatrix 12 1.0 2.3514e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 23 0 0 0 0 0 0 12 1.34e+01 0 0.00e+00 0
MatSeqAIJCUSPARSEBuildILULowerTriMatrix: cudaMallocHost 12 1.0 1.5448e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 15 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatSeqAIJCUSPARSEBuildILULowerTriMatrix: cudaFreeHost 12 1.0 8.3908e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 8 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
Allocation/free of pinned memory is slow, usually on the order of several milliseconds. So these numbers look normal. Is there any opportunity to reuse the pinned memory in these functions?
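As an illustration only, here is a minimal sketch of one way the pinned staging buffer could be cached and reused across factorizations, reallocating only when a larger size is needed. The struct and function names (TriFactorCache, GetPinnedBuffer, DestroyPinnedBuffer, cachedAALo) are hypothetical, not the actual MatSeqAIJCUSPARSE data structures.

/* Sketch of a reuse strategy for the pinned host buffer (hypothetical names). */
#include <stddef.h>
#include <cuda_runtime.h>

typedef struct {
  double *cachedAALo;   /* pinned host staging buffer, reused across factorizations */
  size_t  cachedBytes;  /* current capacity of cachedAALo in bytes */
} TriFactorCache;

/* Return a pinned buffer of at least nbytes, reallocating only when it must grow. */
static cudaError_t GetPinnedBuffer(TriFactorCache *c, size_t nbytes, double **buf)
{
  cudaError_t cerr = cudaSuccess;
  if (nbytes > c->cachedBytes) {
    if (c->cachedAALo) { cerr = cudaFreeHost(c->cachedAALo); if (cerr) return cerr; }
    cerr = cudaMallocHost((void**)&c->cachedAALo, nbytes); if (cerr) return cerr;
    c->cachedBytes = nbytes;
  }
  *buf = c->cachedAALo;
  return cerr;
}

/* Free the cached buffer once, e.g. when the matrix is destroyed. */
static cudaError_t DestroyPinnedBuffer(TriFactorCache *c)
{
  cudaError_t cerr = cudaSuccess;
  if (c->cachedAALo) {
    cerr = cudaFreeHost(c->cachedAALo);
    c->cachedAALo = NULL;
    c->cachedBytes = 0;
  }
  return cerr;
}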
Hong (Mr.)
This 0.23 sec happens in the Upper version also, for a total of ~0.46 sec, which pretty much matches the difference with Kokkos Kernels.
Any ideas?
Thanks,
Mark