[petsc-dev] CUDA GAMG coarse grid solver
Mark Adams
mfadams at lbl.gov
Sun Jul 21 20:12:48 CDT 2019
Barry,

That option comes back as unused:

Option left: name:-mg_coarse_mat_solver_type value: cusparse

I tried this too:

Option left: name:-mg_coarse_sub_mat_solver_type value: cusparse

Here is the view. The cusparse solver did not make it into the factored matrix type.
PC Object: 24 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
    GAMG specific options
      Threshold for dropping small values in graph on each level = 0.05 0.025 0.0125
      Threshold scaling factor for each level not specified = 0.5
      AGG specific options
        Symmetric graph false
        Number of levels to square graph 10
        Number smoothing steps 1
      Complexity: grid = 1.14213
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 24 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 24 MPI processes
      type: bjacobi
        number of blocks = 24
        Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                *type: seqaij*
                rows=6, cols=6
                package used to perform factorization: petsc
                total: nonzeros=36, allocated nonzeros=36
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 2 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          *type: seqaijcusparse*
          rows=6, cols=6
          total: nonzeros=36, allocated nonzeros=36
          total number of mallocs used during MatSetValues calls =0
            using I-node routines: found 2 nodes, limit used is 5
      linear system matrix = precond matrix:
      Mat Object: 24 MPI processes
        *type: mpiaijcusparse*
        rows=6, cols=6, bs=6
        total: nonzeros=36, allocated nonzeros=36
        total number of mallocs used during MatSetValues calls =0
          using scalable MatPtAP() implementation
          using I-node (on process 0) routines: found 2 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
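
(If the factorization solver type has to be requested through the factor options, then going by the mg_coarse_sub_ prefix in the view above and the generic -pc_factor_mat_solver_type option name, my guess at the fully prefixed spelling would be

-mg_coarse_sub_pc_factor_mat_solver_type cusparse

but that is only a guess.)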
On Sun, Jul 21, 2019 at 3:58 PM Mark Adams <mfadams at lbl.gov> wrote:
> Barry, I do NOT see communication. This is what made me think it was not
> running on the GPU. I added print statements and found that
> MatSolverTypeRegister_CUSPARSE IS called but (what it registers)
> MatGetFactor_seqaijcusparse_cusparse does NOT get called.
>
> I have a job waiting on the queue. I'll send ksp_view when it runs. I will
> try -mg_coarse_mat_solver_type cusparse. That is probably the problem.
> Maybe I should set the coarse grid solver in a more robust way in GAMG,
> like use the matrix somehow? I currently use PCSetType(pc, PCLU).
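>
> Something like this is what I have in mind (just a sketch, not what GAMG
> does today; SetCoarseSolverFromMat, pc2, and Amat are placeholder names for
> the coarse-level PC and matrix):
>
>   static PetscErrorCode SetCoarseSolverFromMat(PC pc2, Mat Amat)
>   {
>     PetscErrorCode ierr;
>     PetscBool      iscusp;
>
>     PetscFunctionBegin;
>     ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>     /* if the level matrix is a cusparse matrix, request the cusparse factorization */
>     ierr = PetscObjectTypeCompare((PetscObject)Amat, MATSEQAIJCUSPARSE, &iscusp);CHKERRQ(ierr);
>     if (iscusp) {
>       ierr = PCFactorSetMatSolverType(pc2, MATSOLVERCUSPARSE);CHKERRQ(ierr);
>     }
>     PetscFunctionReturn(0);
>   }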
>
> I can't get an interactive shell now to run DDT, but I can try stepping
> through from MatGetFactor to see what it's doing.
>
> Thanks,
> Mark
>
> On Sun, Jul 21, 2019 at 11:14 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>>
>>
>> > On Jul 21, 2019, at 8:55 AM, Mark Adams via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>> >
>> > I am running ex56 with -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse
>> > and I see no GPU communication in MatSolve (the serial LU coarse grid solver).
>>
>> Do you mean to say, you DO see communication?
>>
>> What does -ksp_view show you? It should show the factor type in the
>> information about the coarse grid solve.
>>
>> You might need something like -mg_coarse_mat_solver_type cusparse,
>> because it may default to the PETSc one. (It may be possible to have it
>> default to cusparse when that solver exists and the matrix is of type
>> MATSEQAIJCUSPARSE.)
>>
>> The determination inside MatGetFactor() is a bit involved, including
>> pasting strings together and doing string compares, so it could be finding
>> a CPU factorization.
>>
>> I would run on one MPI rank in the debugger, put a break point in
>> MatGetFactor(), and track along to see what it picks and why. You could do
>> this debugging without GAMG first, with just -pc_type lu.
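>>
>>   For example (my reconstruction of the reduced run, reusing the ex56
>>   options from above; launcher and paths will differ on your system):
>>
>>     mpiexec -n 1 ./ex56 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse -pc_type lu -ksp_view
>>
>>   and in the debugger something like:
>>
>>     (gdb) break MatGetFactor
>>     (gdb) run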
>>
>> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>>
>> For parallel runs this won't work, since the GPU code has only
>> sequential direct solvers, so it must be using BJACOBI in that case?
>>
>> Barry
>>
>>
>>
>>
>>
>> > I am thinking the dispatch of the CUDA version of this got dropped somehow.
>> >
>> > I see that this is getting called:
>> >
>> > PETSC_EXTERN PetscErrorCode MatSolverTypeRegister_CUSPARSE(void)
>> > {
>> >   PetscErrorCode ierr;
>> >
>> >   PetscFunctionBegin;
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_LU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_CHOLESKY,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ILU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ICC,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   PetscFunctionReturn(0);
>> > }
>> >
>> > but MatGetFactor_seqaijcusparse_cusparse is not getting called.
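>> >
>> > (For context, my understanding of the dispatch: MatSolverTypeRegister()
>> > files the get-factor function under the (solver type, matrix type, factor
>> > type) triple, so it should be reached by a call along the lines of the
>> > following, where A is a MATSEQAIJCUSPARSE matrix and F is just a
>> > placeholder name:
>> >
>> >   Mat F;
>> >   ierr = MatGetFactor(A, MATSOLVERCUSPARSE, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
>> >
>> > so unless the solver type requested from MatGetFactor() is "cusparse",
>> > this registration is not the one that gets picked.)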
>> >
>> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>> >
>> > Any ideas?
>> >
>> > Thanks,
>> > Mark
>> >
>> >
>>
>>