Hi All,<br><br>I can take charge of the development of this.  Ulisses, we can take this discussion offline.  The PETSc team only takes patches against their current working version, but I can work against an earlier PETSc if that is preferred.<br>
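<br>For reference, here is a minimal sketch of the segmented renumbering Matt describes in the quoted message below. It is plain Python that simulates the ranks in a single process, with no MPI or PETSc calls; the function names (owner_of, build_directory, application_to_petsc) and the equal-sized contiguous block layout are assumptions made for illustration, not PETSc API.<br>

```python
from collections import defaultdict

def owner_of(app_id, n_global, n_procs):
    """Rank owning app_id when ids 0..n_global-1 are split into contiguous blocks."""
    block = -(-n_global // n_procs)  # ceiling division
    return app_id // block

def build_directory(local_pairs, n_global, n_procs):
    """Route each (app_id, petsc_id) pair to the rank that owns app_id.

    local_pairs[r] is the list of pairs known on rank r. The result holds one
    dict per rank, and each dict covers only that rank's block of app ids.
    """
    directory = [dict() for _ in range(n_procs)]
    for pairs in local_pairs:
        for app_id, petsc_id in pairs:
            directory[owner_of(app_id, n_global, n_procs)][app_id] = petsc_id
    return directory

def application_to_petsc(queries, directory, n_global, n_procs):
    """Translate queries[r] (app ids wanted by rank r) via a round trip to the owners."""
    results = []
    for wanted in queries:
        # Step 1: bucket the requests by owning rank (the outgoing scatter).
        outbox = defaultdict(list)
        for pos, app_id in enumerate(wanted):
            outbox[owner_of(app_id, n_global, n_procs)].append((pos, app_id))
        # Step 2: each owner renumbers from its own segment.
        # Step 3: the answers go back into the requester's original order.
        answer = [None] * len(wanted)
        for rank, items in outbox.items():
            for pos, app_id in items:
                answer[pos] = directory[rank][app_id]
        results.append(answer)
    return results

# 8 ids on 4 simulated ranks; rank r initially knows PETSc ids 2r and 2r+1.
app_ids = [5, 2, 7, 0, 3, 6, 1, 4]  # app id assigned to PETSc id 0..7
local_pairs = [[(app_ids[p], p) for p in (2 * r, 2 * r + 1)] for r in range(4)]
directory = build_directory(local_pairs, 8, 4)
queries = [[5, 0], [7], [], [4, 1, 2]]
print(application_to_petsc(queries, directory, 8, 4))
```

In this layout each simulated rank stores only its own block of the table (two entries here) rather than the whole permutation, and a lookup costs one round trip to the owning rank.<br>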

<br>Thanks,<br>Aron<br><br><div class="gmail_quote">On Wed, Aug 12, 2009 at 2:19 PM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div><div></div><div class="h5">On Wed, Aug 12, 2009 at 1:01 PM, Lianjun An <span dir="ltr">&lt;<a href="mailto:alianjun@us.ibm.com" target="_blank">alianjun@us.ibm.com</a>&gt;</span> wrote:<br></div></div><div class="gmail_quote">

<div><div></div><div class="h5"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div>

<p>We currently use the PETSc routine &quot;<tt><font size="4">AOCreateBasic</font></tt>&quot; to create an application ordering (AO), a mapping from user ids to PETSc ids. We then set up a connection graph for our mesh and assemble the vector and matrix based on the PETSc ordering. When the mesh has about a couple of million elements and 64 compute nodes are used, the AO is created in a reasonable time. But when the mesh has 700 million elements and we use more than 256 compute nodes, the time to create the AO grows significantly (9000 secs). <br>


<br>

It seems that each process holds the whole permutation table for the mapping, which is quite expensive in both time and buffer size.<br>

a) The memory requirement is quite high (AO creation failed for 1.3 billion elements on 1024 nodes due to insufficient memory).<br>

b) If the user id numbering on each processor is partially contiguous, the time to create the AO is reduced.<br>

c) In my understanding, the AOApplicationToPetsc and AOPetscToApplication routines don&#39;t use communication, since the AO has the whole mapping on each processor.<br>

<br>

In our application, we might not need the whole mapping on each processor. We only need the ids of the vertices that reside on the processor (plus a limited number of ghost vertices). We wonder whether there is a more efficient way to create the AO that cuts its creation time and buffer size. Let us know your thoughts on that.</p>


</div></blockquote></div></div><div><br>The AO was not designed to be scalable (unfortunately). Any scalable version would segment the renumbering, so<br>that each process is responsible for a range of entries. You would first communicate the indices to the correct process<br>


(using a VecScatter), renumber, and then communicate the result back to the original process. It does not sound all<br>that hard to me (using contiguous indices); however, we are unlikely to implement it right now. If you would like to<br>


try it yourself, we would be willing to help.<br><br>   Matt<br></div></div><font color="#888888">-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>


-- Norbert Wiener<br>

</font></blockquote></div><br><br clear="all"><br>-- <br>Aron Jamil Ahmadia<br>Assistant Research Scientist<br>King Abdullah University of Science and Technology<br>