[petsc-users] GAMG and Hypre preconditioner

Tue Jun 27 13:00:08 CDT 2023

Hi Jed

Thanks for your reply. I have sent the log files to petsc-maint at mcs.anl.gov.

Zisheng
________________________________
From: Jed Brown <jed at jedbrown.org>
Sent: Tuesday, June 27, 2023 1:02 PM
To: Zisheng Ye <zisheng.ye at ansys.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] GAMG and Hypre preconditioner


Zisheng Ye via petsc-users <petsc-users at mcs.anl.gov> writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG and Hypre preconditioners. We have encountered several issues on which we would like your suggestions.
>
> First, we have a couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered error is related to SpGEMM in CUDA when the matrix is large, as shown below:
>
> cudaMalloc((void **)&buffer2, bufferSize2) error( cudaErrorMemoryAllocation): out of memory
>
> For the CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" to avoid these problems. However, there seem to be no equivalent options for the Kokkos backend. Is there a good practice for avoiding this error with both backends, and can it be avoided with the Kokkos backend at all?

Junchao will know more about Kokkos Kernels (KK) tuning, but the faster GPU matrix-matrix algorithms use extra memory. We should be able to make the host option available with Kokkos.

>   2.  We have tested the combination of Hypre with Kokkos as the backend. The two do not appear to be compatible: KSPSolve takes more iterations to finish, and the residual norm in our post-solve check is much larger than the one obtained with the CUDA backend. This happens for matrices with block size larger than 1. Is there an explanation for this behavior?
>
> Second, we have a couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using OpenMPI because we couldn't get Intel MPI to work as a GPU-aware MPI. Is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general, GPU-aware MPI has been incredibly flaky on all HPC systems despite being introduced ten years ago.

>   2.  With OpenMPI we currently see a slowdown as the number of MPI ranks increases, as shown in the figure below. Is this normal?

Could you share -log_view output from a couple of representative runs? You could send those here or to petsc-maint at mcs.anl.gov. We need to see which kind of work is not scaling in order to identify the cause.
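
As a rough illustration of collecting the -log_view output requested above, the following Python sketch builds one launch command per MPI rank count, each writing its log to a separate file via the "-log_view :filename" form. The executable name "./my_solver", the "mpiexec" launcher, the log file naming scheme, and the solver options are all placeholders, not details from this thread; the script only prints the commands and does not run them.

```python
import shlex


def log_view_command(executable, nranks, solver_options, launcher="mpiexec"):
    """Build the argv list for one run that writes -log_view output to its own file."""
    # Hypothetical naming scheme: one log file per rank count.
    log_file = f"log_np{nranks}.txt"
    return [
        launcher, "-n", str(nranks),
        executable,
        *solver_options,
        "-log_view", f":{log_file}",
    ]


if __name__ == "__main__":
    # Placeholder executable and options; substitute the real application and flags.
    options = ["-pc_type", "gamg"]
    for nranks in (1, 2, 4):
        print(shlex.join(log_view_command("./my_solver", nranks, options)))
```

Comparing the per-event timings in the resulting files for two or three rank counts is usually enough to see which phase stops scaling.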