<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Barry,<br><div><br></div><div>I see the issue.</div><div><br></div><div>In the FEM library and solver that I am working on, PETSc is used throughout for data distribution, synchronization of functions, and assembly. There is also a UPC alternative that uses the JANPACK linear algebra backend (<a href="http://www.csc.kth.se/~njansson/janpack/">http://www.csc.kth.se/~njansson/janpack/</a>), which gives increased performance. My project explores another optimization pathway, given that this software targets large-scale computations: an asynchronous version of the algorithm, for which I have implemented a Block-Jacobi method with inner Krylov solvers (the inner solve uses PETSc). This version aims for a speedup factor of about 1.7-2.0 (based on some literature, although not in exactly the same context), and I would say it shares the motivation behind ExaFLOW (<a href="http://exaflow-project.eu/">http://exaflow-project.eu/</a>). It still requires me to modify the ghost exchange routines so that the processes can advance out of sync. I could implement this outside of PETSc, but that would significantly increase the memory footprint, since the necessary data is currently fed to PETSc and then discarded. Since PETSc also works with and stores MPI requests, I can reuse and extend its implementation, which is close to the approach I have in mind (using a circular or limited-size buffer of MPI requests and non-blocking collectives). I had also considered not using PETSc at all, to avoid all the blocking regions, but given the scope of my project I concluded that it would take too long to implement and validate.</div><div><br></div><div>Hope this sums it up well,</div><div>Tamara</div><div><br></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Sep 8, 2018 at 4:28 AM Smith, Barry F. 
<<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

    Tamara,<br>

<br>

       The VecScatter routines are in a big state of flux now as we try to move from a monolithic implementation (where many cases were handled with cumbersome if checks in the code) to simpler independent standalone implementations that easily allow new implementations orthogonal to the current implementations. So it is not a good time to dive in.  <br>

<br>

    We are trying to do the refactoring, but it is a bit frustrating and slow.<br>

<br>

     Can you tell us why you feel you need a custom implementation? Is the current implementation too slow (how do you know it is too slow?)?<br>

<br>

    Barry<br>

<br>

> On Sep 7, 2018, at 12:26 PM, Tamara Dancheva <<a href="mailto:tamaradanceva19933@gmail.com" target="_blank">tamaradanceva19933@gmail.com</a>> wrote:<br>

> <br>

> Hi,<br>

> <br>

> I am developing an asynchronous method for a FEM solver and need a custom implementation of the VecScatterBegin and VecScatterEnd routines. Since PETSc uses its own limited set of MPI functions, could you tell me the best way to extend it so that I can use, for example, non-blocking collectives such as igatherv? <br>

> <br>

> I hope the question is specific enough; if anything is unclear, let me know and I can provide more information. I would very much appreciate any help. Thanks in advance! <br>

> <br>

> Best,<br>

> Tamara<br>

<br>

</blockquote></div>