[petsc-dev] Controlling matrix type on different levels of multigrid hierarchy? (Motivation is GPUs)

Smith, Barry F. bsmith at mcs.anl.gov
Wed Jun 12 15:22:58 CDT 2019



> On Jun 12, 2019, at 2:55 PM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Wed, Jun 12, 2019 at 3:41 PM Smith, Barry F. via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> 
>  You'll never benefit from having coarser levels on the CPU but I guess you need a general mechanism to try and see that for yourself.
> 
>   I think it should be a property of the DM and never let the PCMG see it. So something like DMSetUseGPUWhen(dm, gpuvectype, gpumattype, localsize), with the command line option -dm_use_gpu_when [gpumattype,gpuvectype],[localsize], where localsize defaults to whatever turns out to be generally the best in our experiments (which may depend on the machine). Then instead of using dm->mattype and dm->vectype directly in the code we would always inquire with 
> 
>   DMGetMatType(dm, *type) (same for vectors) 
>     if (!dm->usegpu)                       *type = dm->mattype;
>     else if (localsize > dm->gpulocalsize) *type = dm->gpumattype;
>     else                                   *type = dm->mattype;
> 
>   Note that the current model will also continue to work, so someone can still do -dm_vec_type cuda 
> 
> This is much much worse than the previous mechanism. You have some formula for making a decision, which is not obvious
> to the user, and means that you cannot make different size decisions for different solves.

   You give a prefix for each different DM you are simulating with.

If that doesn't work for some reason (you want multiple different solvers using the same DM) then I would suggest SNES/KSP/PCPinToCPUWhen(YYY,size). Then each time the solver object gets a Vec/Mat from the DM it can pin it based on size. It could pin in two possible ways: it could pin the DM it is getting the Vec/Mat from with DMPinToCPUWhen(), or it could pin each Vec/Mat that it gets. 

If you hate the pinning approach then the solver can directly use DMSetMatType() on the DM for that particular level to select (or not select) a GPU vector/matrix type before creating the Vec/Mat.
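A rough sketch of that non-pinning route, assuming a PCMG whose levels carry DMs (e.g. coming from a DMDA); the level cutoff used here in place of a local-size test, and the helper name, are purely illustrative:

  #include <petscksp.h>

  /* Sketch only: choose GPU Mat/Vec types on the finer levels of an already
     set up PCMG and plain CPU types on the coarser ones, before the level
     operators are created.  In PCMG level 0 is the coarsest level. */
  PetscErrorCode SetMGLevelTypes(PC pc,PetscInt cutoff)
  {
    PetscInt       nlevels,l;
    KSP            smoother;
    DM             dm;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PCMGGetLevels(pc,&nlevels);CHKERRQ(ierr);
    for (l=0; l<nlevels; l++) {
      ierr = PCMGGetSmoother(pc,l,&smoother);CHKERRQ(ierr);
      ierr = KSPGetDM(smoother,&dm);CHKERRQ(ierr);
      if (l >= cutoff) {            /* finer levels: GPU types */
        ierr = DMSetMatType(dm,MATAIJCUSPARSE);CHKERRQ(ierr);
        ierr = DMSetVecType(dm,VECCUDA);CHKERRQ(ierr);
      } else {                      /* coarser levels: stay on the CPU */
        ierr = DMSetMatType(dm,MATAIJ);CHKERRQ(ierr);
        ierr = DMSetVecType(dm,VECSTANDARD);CHKERRQ(ierr);
      }
    }
    PetscFunctionReturn(0);
  }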

I don't mind experimenting with both the pin and non-pin approaches; I clearly have tons of experience so I know exactly what I am talking about, not :-)

>  
>   An alternative model would be to use the GPU vector and matrix types everywhere and simply pin the coarser matrices and vecs to the CPU, so have
> DMPinToCPUWhen(dm,localsize) and then
> 
> DMCreateMatrix_YYY(dm,&mat) would look like
>    MatCreate(comm,mat);
>    MatSetType(*mat,dm->mattype);
>    if (localsize < dm->localsize) MatPinToCPU(*mat,PETSC_TRUE);
> 
> I actually like the second model better. Note that pinning means that no space is used on the GPU for the mat/vec unless it is unpinned. But the mat/vec can be unpinned at any time. With the first model you are stuck with whatever you chose initially: stuck on the CPU. In fact you could pin at, say, localsize 1000, run for a while, and then the code could decide to pin or unpin more to find an "optimal" value. Thus I really like pinning.
> 
> But how will you control it in these dynamically created objects without giving prefixes, and if you give prefixes how is
> this any different than just directly telling them where to go (except that it is more complicated)?

  I glossed over the difficulties of pinning or unpinning a set of vecs/matrices after they have been created and running for a while. But since a DM knows all the children it created, we could have DMPinToCPU() (note: no When) loop over all its children Vecs and Mats and pin or unpin them at any time. Or DMPinToCPUChildren(). We could also do this for solvers, SNES/KSP/PCPinToCPUChildren(), to magically change the coarse grid from the CPU to the GPU (and thus run faster :-)). You could pull out a particular level with PCMGGet.. and pin or unpin that level. Lots of dynamic possibilities that you totally lose if you hardwire any of the Vecs/Mats to non-GPU versions.
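A purely illustrative sketch of pulling out one level and pinning its operator, using the MatPinToCPU() name from this thread (later PETSc releases rename it MatBindToCPU()); the helper name and threshold parameter are made up:

  #include <petscksp.h>

  /* Sketch only: pin (or unpin) the operator of one PCMG level to the CPU,
     deciding per rank from the local matrix size. */
  PetscErrorCode PinMGLevelOperator(PC pc,PetscInt level,PetscInt threshold)
  {
    KSP            smoother;
    Mat            A;
    PetscInt       nlocal;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PCMGGetSmoother(pc,level,&smoother);CHKERRQ(ierr);
    ierr = KSPGetOperators(smoother,&A,NULL);CHKERRQ(ierr);
    ierr = MatGetLocalSize(A,&nlocal,NULL);CHKERRQ(ierr);
    /* pinning only forbids GPU storage; it can be reversed at any time and
       the decision may legitimately differ from rank to rank */
    ierr = MatPinToCPU(A,nlocal < threshold ? PETSC_TRUE : PETSC_FALSE);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }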

  Barry

> 
>   Matt
>  
> Also we don't want to hardwire codes/command lines to particular matrix types but allow any GPU that is available, so have things like MATGPU that defaults to whatever is available. So I can do -dm_mat_type gpu -dm_use_gpu_when 500 and it uses CUDA if that is available or ViennaCL if that is what is available.
> 
>   Better ideas, even more general? 
> 
>   Note that dm->localsize is a property of the dm and its partition; if it is different on different processes that is fine: for the same vector you can have a GPU Vec on some processes and not on others, or have it pinned on some and not on others.
> 
>   If you play with either of these models please do it off of my branch, not master.
> 
> > On Jun 12, 2019, at 11:38 AM, Mills, Richard Tran via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> > 
> > Colleagues,
> > 
> > I think we ought to have a way to control which levels of a PETSc multigrid solve happen on the GPU vs. the CPU, as I'd like to keep coarse levels on the CPU, but run the calculations for finer levels on the GPU. Currently, for a code that is using a DM to manage its grid, one can use GPUs inside the application of PCMG by putting something like
> > 
> >   -dm_mat_type aijcusparse -dm_vec_type cuda
> > 
> > on the command line. What I'd like to be able to do is to also control which levels get plain AIJ matrices and which get a GPU type, maybe via something like
> > 
> >   -mg_levels_N_dm_mat_type aijcusparse -mg_levels_N_dm_vec_type cuda
> > 
> > for level N. (Being able to specify a range of levels would be even nicer, but let's start simple.)
> > 
> > Maybe doing the above is as simple as making sure that DMSetFromOptions() gets called for the DM for each level. But I think I may be not understanding some sort of additional complications. Can someone who knows the PCMG framework better chime in? Or do others have ideas for a more elegant way of giving this sort of control to the user?
> > 
> > Best regards,
> > Richard
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/


