[petsc-users] Choosing VecScatter Method in Matrix-Vector Product

Zhang, Junchao jczhang at mcs.anl.gov
Mon Jan 27 13:50:27 CST 2020


--Junchao Zhang


On Mon, Jan 27, 2020 at 10:09 AM Felix Huber <st107539 at stud.uni-stuttgart.de> wrote:
Thank you all for your replies!

> Are you using a KSP/PC configuration which should weak scale?
Yes, the system is solved with KSPSolve. There is no preconditioner yet,
but I fixed the number of CG iterations to 3 to ensure an apples-to-apples
comparison during the scaling measurements.
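
For reference, a minimal sketch of such a solver setup (assuming a KSP object ksp whose operator has already been set, plus vectors b and x; this is illustrative, not the actual code):

  #include <petscksp.h>
  /* sketch: ksp, b, x are assumed to exist already */
  PC pc;
  KSPSetType(ksp, KSPCG);            /* plain CG */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCNONE);             /* no preconditioner yet */
  /* cap the iteration count at 3 so every run does the same amount of work */
  KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 3);
  KSPSolve(ksp, b, x);

The same setup can also be selected at run time with -ksp_type cg -pc_type none -ksp_max_it 3.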

>> VecScatter has been greatly refactored (and the default implementation
>> is entirely new) since 3.7.

I now tried to use PETSc 3.11 and the code runs fine. The communication
seems to show a better weak scaling behavior now.

I'll see if we can just upgrade to 3.11.



> Anyway, I'm curious about your
> configuration and how you determine that MPI_Alltoallv/MPI_Alltoallw is
> being used.
I used the Extrae profiler, which intercepts all MPI calls and logs them
into a file. This showed that Alltoall is being used for the
communication, which I found surprising. With PETSc 3.11 the Alltoall
calls are replaced by MPI_Start(all) and MPI_Wait(all), which sounds
more reasonable to me.
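
For readers who have not seen the two patterns side by side: MPI_Alltoallv involves every rank in the communicator, whereas persistent point-to-point requests (MPI_Send_init/MPI_Recv_init followed by MPI_Startall/MPI_Waitall) only touch the ranks that actually share matrix rows. A rough sketch of the persistent pattern, purely illustrative and not PETSc's actual implementation (sendbuf, recvbuf, neighbor, nneighbors and comm are assumed to be set up elsewhere; needs <mpi.h> and <stdlib.h>):

  /* set up one persistent send and receive per neighbor rank (done once) */
  MPI_Request *reqs = malloc(2 * nneighbors * sizeof(MPI_Request));
  for (int i = 0; i < nneighbors; i++) {
    MPI_Send_init(&sendbuf[i], 1, MPI_DOUBLE, neighbor[i], 0, comm, &reqs[i]);
    MPI_Recv_init(&recvbuf[i], 1, MPI_DOUBLE, neighbor[i], 0, comm, &reqs[nneighbors + i]);
  }
  /* in every matrix-vector product: start the transfers, then wait */
  MPI_Startall(2 * nneighbors, reqs);
  /* ... the purely local part of the product can overlap here ... */
  MPI_Waitall(2 * nneighbors, reqs, MPI_STATUSES_IGNORE);
  /* the requests are reusable; free them once at shutdown */
  for (int i = 0; i < 2 * nneighbors; i++) MPI_Request_free(&reqs[i]);
  free(reqs);
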
> This has never been a default code path, so I suspect
> something in your environment or code making this happen.

I attached some log files for some PETSc 3.7 runs on 1, 19, and 115 nodes
(24 cores each), which suggest polynomial scaling (vs. logarithmic
scaling). Could it be some installation setting of the PETSc version? (I
use a preinstalled PETSc.)

I checked PETSc 3.7.6 and do not think the VecScatter type can be set at configure time. Anyway, upgrading PETSc is preferred. If that is not possible, we can work together to see what happened.
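
If you want to inspect the preinstalled 3.7 build, here is a minimal sketch of querying the options database at run time (the option name -vecscatter_alltoall is my reading of the 3.7-era VecScatter code and may not be what your installation uses; also look at the PETSC_OPTIONS environment variable and any ~/.petscrc on the system):

  /* sketch: report whether the all-to-all scatter path has been switched on
     somewhere in the options database (command line, PETSC_OPTIONS, ~/.petscrc) */
  PetscBool flg = PETSC_FALSE, set = PETSC_FALSE;
  PetscOptionsGetBool(NULL, NULL, "-vecscatter_alltoall", &flg, &set);
  if (set) PetscPrintf(PETSC_COMM_WORLD, "-vecscatter_alltoall = %s\n", flg ? "true" : "false");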

> Can you please send representative log files which characterize the
> lack of scaling (include the full log_view)?

"Stage 1: activation" is the stage of interest, as it wraps the
KSPSolve. The number of unkowns per rank is very small in the
measurement, so most of the time should be communication. However, I
just noticed, that the stage also contains an additional setup step
which might be the reason why the MatMul takes longer than the KSPSolve.
I can repeat the measurements if necessary.
I should add, that I put a MPI_Barrier before the KSPSolve, to avoid any
previous work imbalance to effect the KSPSolve call.
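
A minimal sketch of how the solve can be isolated in its own logging stage behind that barrier (the stage name and variable names are placeholders, not taken from the actual code):

  /* sketch: make the logged stage contain only the solve itself */
  PetscLogStage solve_stage;
  PetscLogStageRegister("KSPSolve only", &solve_stage);
  /* ... matrix assembly and KSP setup happen before this point ... */
  MPI_Barrier(PETSC_COMM_WORLD);   /* drain any earlier load imbalance */
  PetscLogStagePush(solve_stage);
  KSPSolve(ksp, b, x);
  PetscLogStagePop();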

You can use -log_sync, which adds an MPI_Barrier at the beginning of each event. Compare the log_view files with and without -log_sync: if an event has a much higher %T without -log_sync than with it, the code is not load balanced. Alternatively, you can look at the Ratio column in the log file obtained without -log_sync.

Best regards,
Felix
