[petsc-users] Slower performance using more MPI processes

Barry Smith bsmith at petsc.dev
Fri Sep 8 16:00:07 CDT 2023


  It would be very helpful if you could run on 1 and 2 ranks with -log_view and send all the output.

  

> On Sep 8, 2023, at 4:52 PM, Chris Hewson <chris at resfrac.com> wrote:
> 
> Hi There,
> 
> I am trying to solve a linear problem and am finding that KSPSolve slows down considerably as I add more MPI processes.
> 
> The matrix is 620100 x 620100 with ~5 million non-zero entries. I am using PETSc version 3.19.5 and have tried a couple of different versions of MPICH, with the same behavior (v4.1.2 w/ device ch4:ofi and v3.3.2 w/ ch3:sock).
> 
> In testing, I've noticed the following trend in the run time of the KSPSolve function call:
> 1 core: 4042 ms
> 2 cores: 7085 ms
> 4 cores: 26573 ms
> 8 cores: 65745 ms
> 16 cores: 149283 ms
> 
> This was all done on a single node machine w/ 16 non-hyperthreaded cores. We solve quite a few different matrices with PETSc using MPI and haven't noticed an impact like this on performance before.
> 
> I am very confused by this and am a little stumped at the moment as to why it is happening. I've been using the KSPBCGS solver to solve the problem. I have tried multiple different solvers and preconditioners (we usually don't use a preconditioner for this part of our code).
> 
> Using the piped BCGS solver did seem to improve the parallel speed slightly (maybe 15%), but it still doesn't come close to the single-core speed.
> 
> I'll attach a link to a folder that contains the specific A matrix and the x and b vectors for this problem, as well as the main.cpp file that I was using for testing.
> 
> https://drive.google.com/drive/folders/1CEDinKxu8ZbKpLtwmqKqP1ZIDG7JvDI1?usp=sharing
> 
> I was also testing this in our main code base (not included here) and observe very similar timings to the ones above. We use Metis for graph partitioning in our own code, and we checked the vector and matrix partitioning, which all made sense. I could be doing the partitioning incorrectly in the example (I'm not 100% sure how it works with the viewer/load functions).
> 
> Any insight or thoughts on this would be greatly appreciated.
> 
> Thanks,
> 
> Chris Hewson
> Senior Reservoir Simulation Engineer
> ResFrac
> +1.587.575.9792
