[petsc-dev] PETSc - MPI3 functionality

Sun Sep 9 08:09:09 CDT 2018

Tamara Dancheva <tamaradanceva19933 at gmail.com> writes:

> Hi Barry,
>
> I see the issue..
>
>  In the FEM library and solver that I am working on, PETSc is used all
> throughout for the data distribution, synchronization of functions,
> assembly. There is another UPC alternation of using the JANPACK linear
> algebra backend (http://www.csc.kth.se/~njansson/janpack/), which gives
> increased performance. 

How do you know the JANPACK performance is better?  The figures on that
website appeared in a paper submission that was ultimately rejected
after it was discovered that the convergence criteria actually differed
by orders of magnitude and the reference PETSc results were uniformly
faster.  The most recent release appears to have been in 2015.

> My project is about exploring another pathway, optimization given that
> this software targets large scale computations, an asynchronous
> version of the algorithm for which I have implemented a Block-Jacobi
> with inner Krylov Solvers (inner solve with PETSc). This version aims
> for a speedup factor of about 1.7-2.0 (from some literature although
> not in the same context exactly) 

Could you share what literature you are basing this estimate on?  It's
important to make comparisons using a performance model.  For example,
if current PETSc results attain 70% of STREAM bandwidth, then no amount
of latency/communication optimization will yield your desired
improvement factors.  On the other hand, if your solver is latency
dominated due to pushing to the limit of strong scalability, then these
optimizations might be possible (with many caveats).
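
To make the bound above concrete, here is a rough sketch of the arithmetic.
It assumes the solver is otherwise memory-bandwidth limited, that the
asynchronous variant moves the same amount of data, and that the best
communication optimization can do is hide all communication time.  The
function names are made up for illustration and are not part of PETSc.

```python
def max_speedup_bandwidth_bound(stream_fraction):
    """Upper bound on speedup when the run already attains
    `stream_fraction` of STREAM bandwidth (0 < fraction <= 1).

    Assumption: the memory traffic is unchanged, so the run time cannot
    drop below the time needed to move the data at full STREAM rate.
    """
    if not 0.0 < stream_fraction <= 1.0:
        raise ValueError("stream_fraction must be in (0, 1]")
    return 1.0 / stream_fraction


def max_speedup_hiding_communication(comm_fraction):
    """Amdahl-style bound: speedup if the fraction of run time spent in
    communication/latency (0 <= fraction < 1) is hidden completely."""
    if not 0.0 <= comm_fraction < 1.0:
        raise ValueError("comm_fraction must be in [0, 1)")
    return 1.0 / (1.0 - comm_fraction)


def required_comm_fraction(target_speedup):
    """Fraction of run time that must be communication/latency for the
    target speedup to be reachable by hiding it entirely."""
    if target_speedup < 1.0:
        raise ValueError("target_speedup must be >= 1")
    return 1.0 - 1.0 / target_speedup


if __name__ == "__main__":
    # At 70% of STREAM the ceiling is about 1.43x, below the 1.7-2.0 target.
    print("bound at 70%% of STREAM: %.2fx" % max_speedup_bandwidth_bound(0.7))
    for target in (1.7, 2.0):
        print("target %.1fx needs >= %.0f%% of time in communication"
              % (target, 100.0 * required_comm_fraction(target)))
```

Under these assumptions, a 1.7-2.0x improvement is only reachable if
roughly 41-50% of the current run time is communication or latency, which
is the strong-scaling-limit regime.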

If you could send -log_view output for your application, it would help
us understand the performance setting of your current solver
configuration.

> and it is done with the same motivation behind ExaFLOW
> (http://exaflow-project.eu/), I would say. This still requires me to
> modify the ghost exchange routines in order to be able to advance the
> processes out of sync. I could implement this out of PETSc, but I
> would significantly increase the memory footprint, since the necessary
> data is currently fed to PETSc and discarded. In this context, since
> PETSc also works with, stores MPI requests, I can reuse and extend
> upon the implementation since this is close to the approach I have in
> mind (using either a circular of limited size buffer of MPI Requests
> and non-blocking collectives). I had also considered not using PETSc
> at all to avoid all the blocking regions, however considering the
> scope of my project, deemed that it would take too long to implement
> and validate.

It's very reasonable to implement in PETSc, but let's discuss the
communication pattern first.  You said you are working with a FEM model,
but also mention "igatherv".  Is this for some sequential mesh
processing task or is it related to the solver?  There isn't a
neighborhood igatherv and MPI_Igatherv isn't a pattern that should ever
be needed in a FEM solver.