[petsc-dev] Jed!!!

Jed Brown jed at jedbrown.org
Tue Apr 29 14:14:13 CDT 2014


Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> writes:
>  > You're supposed to use MatGetLocalSubMatrix().  Translating from global
>  > indices to local indices is a disaster that we want to avoid.  So we go
>  > the other way.  Speak the language of "split local" spaces during
>  > assembly and the data structure itself can live wherever is most
>  > efficient for the solver.
>
> We have a question about this:
>
> Where is the "disaster"?  In our view, MatSetValues_Nest should:
>
> 1) do the assembly for local rows in the proper submatrices

How do you determine which block owns a given column?  Note that the
column space is huge (can't store it) and arbitrary index sets can be
used to define the splits.  It is not scalable to gather this
problem-sized data to each process, and it's a major headache to send the
indices to the process owning that column range to translate them for us.
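
For reference, a minimal sketch of the MatGetLocalSubMatrix() path quoted
above, assuming the matrix already has local-to-global mappings set and that
isU/isP are field index sets in local numbering; the helper name and the
element arrays rowsU/colsU/colsP/Ke_* are hypothetical, not code from this
thread:

#include <petscmat.h>

/* Hypothetical sketch: assemble one element's velocity-velocity and
 * velocity-pressure blocks through split-local indices, so no process
 * ever has to discover which block owns a given global column. */
static PetscErrorCode AssembleElementSplitLocal(Mat A, IS isU, IS isP,
                                                PetscInt nu, const PetscInt rowsU[], const PetscInt colsU[],
                                                PetscInt np, const PetscInt colsP[],
                                                const PetscScalar Ke_uu[], const PetscScalar Ke_up[])
{
  Mat            Auu, Aup;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatGetLocalSubMatrix(A, isU, isU, &Auu);CHKERRQ(ierr);
  ierr = MatGetLocalSubMatrix(A, isU, isP, &Aup);CHKERRQ(ierr);
  ierr = MatSetValuesLocal(Auu, nu, rowsU, nu, colsU, Ke_uu, ADD_VALUES);CHKERRQ(ierr);
  ierr = MatSetValuesLocal(Aup, nu, rowsU, np, colsP, Ke_up, ADD_VALUES);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubMatrix(A, isU, isU, &Auu);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubMatrix(A, isU, isP, &Aup);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because every block is addressed in its own split-local numbering, the same
assembly code works whether A is a MATNEST or a monolithic MATAIJ; the data
structure can live wherever is most efficient for the solver.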

> 2) communicate the non-local rows at MatAssemblyEnd() so that each 
> process receives some rows which will be assembled by method 1)
>
> So to do 1) (the assembly of local rows), for each column index you 
> "just have to":
> a) determine which sub-matrices it belongs to
> b) for each sub-matrix, do the assembly in that sub-matrix, which 
> involves converting row/column indices to -1 for elementary values 
> not in that matrix (see the sketch below).
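
A minimal sketch of the translation in step b), assuming each sub-matrix's
split is described by an ISLocalToGlobalMapping (blockMap below is a
hypothetical name, not code from this thread); IS_GTOLM_MASK writes -1 for
indices outside the mapping, and MatSetValues() ignores negative indices:

#include <petscis.h>

/* Hypothetical sketch: map an element's global column indices into one
 * block's local numbering; columns that do not belong to the block come
 * back as -1 and are skipped by MatSetValues(). */
static PetscErrorCode TranslateColumnsForBlock(ISLocalToGlobalMapping blockMap,
                                               PetscInt n, const PetscInt globalCols[],
                                               PetscInt blockCols[])
{
  PetscInt       nout;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = ISGlobalToLocalMappingApply(blockMap, IS_GTOLM_MASK, n, globalCols, &nout, blockCols);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}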
>
> We coded something (sequential only) in our code to do this work, and 
> were surprised to observe that assembly into a nested matrix is faster:
> 3.97 s vs 4.24 s for a MatNest vs a 357,369 x 357,369 CSR matrix
>
> (we suppose the MatNest could be faster because of the binary search 
> done in MatSetValues_SeqAIJ)

Or better memory locality during insertion.

> However, to do this correctly in parallel, we think it should be 
> implemented directly in something like MatSetValues_Nest... Have we 
> missed something?
>
> The use case is the following: we have a P2-hierarchical field (for 
> example, a velocity field) which is made of a linear part and a 
> quadratic "correction" part.  For a specific preconditioner 
> (http://onlinelibrary.wiley.com/doi/10.1002/nla.757/abstract)
> we want to do the "normal" assembly and then work on the submatrices 
> split by the linear vs. quadratic parts of the velocity field.  The 
> assembly is done "normally" in our code, meaning we don't want to 
> duplicate the computations done by the formulation (physical 
> properties); instead, we want the elementary matrix to be split by 
> the assembly functionality.

Do you choose a special ordering so that you can determine block
membership in a memory-scalable way?
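
One possible shape of such an ordering, purely as an illustration: if, within
each rank's contiguous ownership range, the linear dofs are numbered before
the quadratic-correction dofs, then block membership of any global column
follows from the usual O(size) ownership ranges plus one extra per-rank count.
The arrays ranges[] and nP1[] below are assumptions for the sketch, not code
from this thread:

#include <petscsys.h>

/* Hypothetical sketch: decide which block a global column belongs to from
 * per-rank offsets only, without storing problem-sized index data. */
static PetscInt ColumnBlock(PetscInt col, PetscMPIInt size,
                            const PetscInt ranges[], const PetscInt nP1[])
{
  PetscMPIInt r = 0;

  while (r + 1 < size && col >= ranges[r + 1]) r++;  /* owning rank; a binary search would also do */
  return (col - ranges[r] < nP1[r]) ? 0 : 1;         /* 0 = linear block, 1 = quadratic block */
}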

