<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div><span class="x_x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">Dear PETSc Team</span></div>
<div class="x_x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_x_elementToProof x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG and Hypre preconditioners. We have encountered several issues on which we would like your suggestions.</div>
<div class="x_x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_x_elementToProof x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
First, we have a couple of questions about running with a single MPI rank:</div>
<div class="x_x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<ol data-editing-info="{"orderedStyleType":1,"unorderedStyleType":1}">
<li style="list-style-type:"1. ""><span style="margin:0px">We have tested two backends, CUDA and Kokkos. One commonly encountered error is related to SpGEMM in CUDA when the matrix is large, as shown below:</span>
<div style="margin:0px;list-style-type:"2. ""><br>
</div>
<div style="margin:0px;list-style-type:"2. "">cudaMalloc((void **)&buffer2, bufferSize2) error( cudaErrorMemoryAllocation): out of memory</div>
<div style="margin:0px;list-style-type:"2. ""><span style="margin:0px"><br>
</span></div>
<div style="margin:0px;list-style-type:"2. ""><span style="margin:0px">For the CUDA backend, one can use "-</span><span style="margin:0px"><span class="x_x_ui-provider x_x_gg x_x_b x_x_c x_x_d x_x_e x_x_f x_x_g x_x_h x_x_i x_x_j x_x_k x_x_l x_x_m x_x_n x_x_o x_x_p x_x_q x_x_r x_x_s x_x_t x_x_u x_x_v x_x_w x_x_x x_x_y x_x_z x_x_ab x_x_ac x_x_ae x_x_af x_x_ag x_x_ah x_x_ai x_x_aj x_x_ak x_x_ContentPasted1" dir="ltr" style="margin:0px">matmatmult_backend_cpu
-<span class="x_x_ui-provider x_x_gg x_x_b x_x_c x_x_d x_x_e x_x_f x_x_g x_x_h x_x_i x_x_j x_x_k x_x_l x_x_m x_x_n x_x_o x_x_p x_x_q x_x_r x_x_s x_x_t x_x_u x_x_v x_x_w x_x_x x_x_y x_x_z x_x_ab x_x_ac x_x_ae x_x_af x_x_ag x_x_ah x_x_ai x_x_aj x_x_ak x_x_ContentPasted2" dir="ltr" style="margin:0px">matptap_backend_cpu"
to avoid these problems. However, there seem to be no equivalent options for the Kokkos backend. Is there a recommended practice for avoiding this error with both backends, and in particular can it be avoided with the Kokkos backend?</span></span></span></div>
<div style="margin:0px;list-style-type:"2. ""><span style="margin:0px"><br>
</span></div>
</li><li class="x_elementToProof" style="list-style-type:"2. ""><span style="margin:0px"><span class="x_x_ContentPasted3" style="margin: 0px; display: inline !important; background-color: rgb(255, 255, 255);">We have tested Hypre with Kokkos as the
backend. The two do not seem to work well together: KSPSolve takes more iterations to exit, and the residual norm in our post-solve check is much larger than the one obtained with the CUDA backend.
This happens for matrices with a block size larger than 1. Is there an explanation for this behavior?</span></span></li></ol>
</div>
<div class="x_x_elementToProof x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
Second, we have a couple more questions about running with multiple MPI ranks:</div>
<div class="x_x_elementToProof" style="font-size: 12pt; font-family: Calibri, Arial, Helvetica, sans-serif; margin: 0px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<ol data-editing-info="{"orderedStyleType":1,"unorderedStyleType":1}">
<li class="x_elementToProof" style="list-style-type:"1. ""><span style="margin:0px">We are currently using OpenMPI because we could not get Intel MPI to work as a GPU-aware MPI. Is this a known issue with Intel MPI?</span></li><li class="x_elementToProof" style="list-style-type:"2. ""><span style="margin:0px">With OpenMPI we currently see a slowdown as the number of MPI ranks increases, as shown in the figure below. Is this expected?</span></li></ol>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<img style="max-width:100%" class="ContentPasted0 w-717 h-714" size="85131" contenttype="image/png" data-outlook-trace="F:1|T:1" src="cid:9242808d-34af-4b51-8a0b-8295f0a012e5"><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Zisheng</div>
</body>
</html>