[petsc-users] Strange mpi timing and CPU load when -np > 2

Matthew Knepley knepley at gmail.com
Mon Sep 26 11:57:09 CDT 2022


On Mon, Sep 26, 2022 at 12:40 PM Duan Junming via petsc-users
<petsc-users at mcs.anl.gov> wrote:

> Dear all,
>
> I am using PETSc 3.17.4 on a Linux server, configured with --download-exodus
> --download-hdf5 --download-openmpi --download-triangle --with-fc=0
> --with-debugging=0 PETSC_ARCH=arch-linux-c-opt COPTFLAGS="-g -O3"
> CXXOPTFLAGS="-g -O3".
> The strange thing is that when I run my code with mpirun -np 1 ./main, the
> CPU time is 30 s.
> When I use mpirun -np 2 ./main, the CPU time is 16 s, which is fine.
> But when I use more than 2 CPUs, e.g. mpirun -np 3 ./main, the CPU time goes
> back to 30 s.
> The output of the time command is: real 0m30.189s, user 9m3.133s, sys
> 10m55.715s.
> I can also see that the CPU load is about 100% per process when np = 2, but
> with np = 3 it goes to roughly 2000%, 1000%, and 1000% for the three
> processes (the server has 40 CPUs).
> Do you have any idea about this?
>

I believe this is an MPI implementation problem, in which there is a large
penalty for oversubscription. You could try configuring with

  --download-mpich --download-mpich-pm=gforker

which should behave well under oversubscription.
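
If it helps to narrow this down, a small standalone timing loop like the one
below (just a sketch, independent of PETSc; the file name, iteration count,
and mpirun line are only illustrative) can be compiled against each MPI
installation and run at -np 1, 2, 3 to see whether the collective latency and
the extra CPU load come from the MPI library itself rather than from your
application code.

  /* allreduce_timing.c -- minimal sketch to compare MPI builds.
   * Compile: mpicc -O2 allreduce_timing.c -o allreduce_timing
   * Run:     mpirun -np 3 ./allreduce_timing
   */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int    rank, size, i, iters = 100000;  /* arbitrary iteration count */
    double local = 1.0, global, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Time a batch of small collectives; their cost is dominated by the
       MPI implementation's progress/wait strategy, not by computation. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; ++i) {
      MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
      printf("np = %d: %d allreduces in %.3f s (%.2f us each)\n",
             size, iters, t1 - t0, 1e6 * (t1 - t0) / iters);
    }
    MPI_Finalize();
    return 0;
  }

Watching the per-process CPU load (e.g. with top) while this runs under each
MPI build should also show whether the 1000-2000% load you see is coming from
the MPI library's busy-waiting rather than from ./main.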

  Thanks,

      Matt



> Thanks in advance!
>
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
