<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi.<br>
<br>
I am using PETSc conjugate gradient liner solver with GPU acceleration (CUDA), on multiple GPUs and multiple MPI processes.<br>
<br>
I noticed that the performances degrade significantly when using multiple MPI processes per GPU, compared to using a single process per GPU.<br>
For example, 2 GPUs with 2 MPI processes will be about 40% faster than running the same calculation with 2 GPUs and 16 MPI processes.<br>
<br>
I would assume the natural MPI/GPU affinity would be 1-1, however the rest of my application can benefit from multiple MPI processes driving GPU via nvidia MPS, therefore I am trying to understand if this is expected, if I am possibly missing something in the
initialization/setup, or if my best choice is to constrain 1-1 MPI/GPU access especially for the PETSc linear solver step. I could not find explicit information about it in the manual.<br>
<br>
Is there any user or maintainer who can tell me more about this use case?<br>
</div>
<div class="elementToProof" id="Signature">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Best Regards,<br>
Gabriele Penazzi</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<p style="margin: 0cm; font-family: Aptos, sans-serif; font-size: 12pt;" class="elementToProof">
</p>
<p style="margin: 0cm; font-family: Aptos, sans-serif; font-size: 12pt;" class="elementToProof">
</p>
</div>
</body>
</html>