[petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
Vanella, Marcos (Fed)
marcos.vanella at nist.gov
Tue Jun 27 14:03:58 CDT 2023
Sorry, meant 100K to 200K cells.
Also, check the release page of SuiteSparse. The multi-GPU version of CHOLMOD might be coming soon:
https://people.engr.tamu.edu/davis/SuiteSparse/index.html
________________________________
From: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
Sent: Tuesday, June 27, 2023 2:56 PM
To: Matthew Knepley <knepley at gmail.com>
Cc: Mark Adams <mfadams at lbl.gov>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
Thank you Matt. I'll try the flags you recommend for monitoring. Correct, I'm trying to see if GPU would provide an advantage for this particular Poisson solution we do in our code.
Our grids are staggered, with the Poisson unknown at cell centers. All my tests for single-mesh runs with 100K to 200K meshes show MKL PARDISO as the fastest option for these sizes when treating the mesh as unstructured (an implementation separate from the PETSc option). We have the option of Fishpack (fast trigonometric solvers), but that is not as general (it requires solution on the whole mesh plus special treatment of immersed geometry). The single-mesh solver is used as a black box within a fixed-point domain decomposition iteration in multi-mesh cases. The approximation error in this method is confined to the mesh boundaries.
The other option I have tried with MKL is to build the global matrix across all meshes and use the MKL cluster sparse solver. The problem becomes one of memory for meshes that go over a couple million unknowns, due to the storage of the exact Cholesky factorization matrix. I'm thinking the other possibility using PETSc is to build the global matrix in parallel (as done for the MKL global solver) and try a GPU-accelerated Krylov solver + multigrid preconditioner. If this can bring the time to solution down to what we get for the previous scheme and keep memory use under control, it would be a good option for CPU+GPU systems. The thing is, we need to bring the residual of the equation to ~10^-10 or less to avoid instability, so it might still be costly.
I'll keep you updated. Thanks,
Marcos
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, June 27, 2023 2:08 PM
To: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
Cc: Mark Adams <mfadams at lbl.gov>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) <marcos.vanella at nist.gov> wrote:
Hi Mark and Matt, I tried swapping the preconditioner to CHOLMOD and also to hypre's BoomerAMG. They work just fine for my case. I also got my hands on a machine with NVIDIA GPUs in one of our AI clusters. I compiled PETSc to make use of CUDA and CUDA-enabled Open MPI (with gcc).
I'm running the previous tests and also want to check some of the CUDA-enabled solvers. I was able to submit a case for the default Krylov solver with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case ran to completion.
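For reference, the full invocation would look something like this; the executable name ./my_solver is a placeholder, not a binary from this thread:

```shell
# Single-rank run with CUDA-backed vectors/matrices and a cuSPARSE
# Cholesky factorization; -ksp_converged_reason reports why KSP stopped.
./my_solver -vec_type seqcuda -mat_type seqaijcusparse \
    -pc_type cholesky -pc_factor_mat_solver_type cusparse \
    -ksp_converged_reason
```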
I guess my question now is how do I monitor (if there is a way) that the GPU is being used in the calculation, and any other stats?
You should get that automatically with
-log_view
If you want finer-grained profiling of the kernels, you can use
-log_view_gpu_time
but it can slow things down.
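Putting the two together, a profiled GPU run could look like the following sketch (executable name is a placeholder); adding -ksp_monitor also prints the residual norm at each Krylov iteration:

```shell
# -log_view prints an end-of-run performance summary, including GPU flop
# counts and CPU<->GPU copy statistics; -log_view_gpu_time adds per-kernel
# GPU timings (slower, so use it only when profiling).
./my_solver -vec_type seqcuda -mat_type seqaijcusparse \
    -ksp_monitor -log_view -log_view_gpu_time
```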
Also, which other solver combination using GPU would you recommend for me to try? Can we compile PETSc with the cuda enabled version for CHOLMOD and HYPRE?
Hypre has GPU support, but CHOLMOD does not. There are no rules of thumb right now for GPUs. It depends on what card you have, what version of the driver, what version of the libraries, etc. It is very fragile. Hopefully this period ends soon, but I am not optimistic. Unless you are very confident that GPUs will help,
I would not recommend spending the time.
Thanks,
Matt
Thank you for your help!
Marcos
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
Cc: Mark Adams <mfadams at lbl.gov>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre, can I just use the --download-hypre configure line?
Yes,
Thanks,
Matt
That is what I did with suitesparse, very nice.
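A configure line along these lines pulls in both packages (this is a sketch; compilers, MPI wrappers, and extra options will vary by system and are not the exact command from the thread):

```shell
# Have PETSc's configure download and build hypre and SuiteSparse (CHOLMOD).
# --with-cuda enables the CUDA back end if a CUDA toolkit is available.
./configure --download-hypre --download-suitesparse --with-cuda \
    --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
make all
```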
________________________________
From: Mark Adams <mfadams at lbl.gov>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
I'm not sure what MG is doing with an "unstructured" problem. I assume you are not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre
As Matt said, MG should be faster. How many iterations was it taking?
Try a 100^3 mesh and check that the iteration count does not change much, if at all.
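One way to run that check, assuming the application exposes a grid-size option such as -n (that option name is hypothetical, not a PETSc flag):

```shell
# Compare iteration counts as the grid is refined: for a scalable multigrid
# preconditioner the count reported by -ksp_converged_reason should stay
# roughly constant going from 50^3 to 100^3.
for n in 50 100; do
  ./my_solver -n $n -ksp_type cg -pc_type gamg -ksp_converged_reason
done
```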
Mark
On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:
Hi, I was wondering if anyone has experience on what combinations are more efficient to solve a Poisson problem derived from a 7 point stencil on a single mesh (serial).
I've been doing some tests of multigrid and cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing.
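Both setups can be selected purely at runtime, e.g. (executable name is a placeholder):

```shell
# Geometric multigrid preconditioner (needs a mesh hierarchy, e.g. via DMDA):
./my_solver -pc_type mg -ksp_monitor
# Direct sparse Cholesky factorization via CHOLMOD (SuiteSparse):
./my_solver -pc_type cholesky -pc_factor_mat_solver_type cholmod
```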
I'm new to PETSc so any suggestions are most welcome and appreciated,
Marcos
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/