[petsc-dev] making DA more light weight

Barry Smith bsmith at mcs.anl.gov
Thu May 15 11:01:00 CDT 2014


On May 15, 2014, at 8:03 AM, Jed Brown <jed at jedbrown.org> wrote:

> Barry Smith <bsmith at mcs.anl.gov> writes:
>>   Hmm, why does VecScatter size depend on dof? Since it handles bs >
>>   1 it should not. Perhaps it is things like localtoglobal (not block
>>   version) that is taking the memory. Issues of collective
>>   construction come up if we make those produced on demand.
> 
> The ltogmap has one integer per dof, but the scatter uses more memory
> and is responsible for peak usage.
> 
> From the G+ thread:
> 
>  A nice tool for this is massif/massif_visualizer
> 
>  http://59A2.org/files/dmda-memory.png
> 
>  The memory is in two scatters: L2G defines the non-overlapping space
>  and G2L defines the overlapping space.  These could be built lazily,
>  but they have to be built collectively.  The memory spikes come from
>  getting indices for the blocked spaces.  This could be optimized in
>  VecScatter at the expense of slightly more special-case code.  

   Hmm, this is the sequential case, where no optimization was done for block indices (adding code to handle the blocks would not be that difficult). In the parallel case, if the indices are blocked then ISGetIndices() is not supposed to ever be used (is it?); instead only ISBlockGetIndices() is used.

   Can this plot be produced for the parallel case?


> Anyway,
>  this is only relevant if you are not using matrices or Krylov, so
>  there hasn't been much demand in the past.  We can optimize further if
>  it is important.
> 
> 
> 
> We could make an ISLocalToGlobalMapping that stores only the blocks
> while translating scalar indices.  It would require an integer division,
> but you can do a lot of divisions for the cost of a cache miss so I
> would expect reasonable performance.  In any case, once you have the
> block version, the scalar version could be created non-collectively.

   Yup. This we should fix. 
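
   A rough sketch of the block-only translation described above, in plain C. It assumes the map stores one global block index per local block and that a scalar index is block index times bs plus the component. The names BlockLtoG and BlockLtoGApplyScalar are made up for illustration and are not existing PETSc API; plain int stands in for PetscInt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block-only local-to-global map (not the PETSc struct). */
typedef struct {
  int        bs;      /* block size, i.e. dof per node */
  size_t     nblocks; /* number of local blocks */
  const int *blocks;  /* global block index of each local block */
} BlockLtoG;

/* Translate one scalar local index to a scalar global index.
   Costs one integer division and one remainder per lookup, but the
   stored table is bs times smaller than a scalar table. */
static int BlockLtoGApplyScalar(const BlockLtoG *map, size_t local)
{
  size_t b = local / (size_t)map->bs;        /* which local block */
  int    c = (int)(local % (size_t)map->bs); /* component within the block */

  assert(b < map->nblocks);
  return map->blocks[b] * map->bs + c;
}
```

   With bs = 5 and local blocks mapping to global blocks {10, 11, 42}, scalar local index 7 falls in block 1, component 2, so it translates to 11*5 + 2 = 57. Since the translation only reads the block table, a scalar table could be filled from it locally, without communication.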
> 
> But the ISGetIndices()

Hmm, it should not be called in the parallel case!

> and other allocations in VecScatterCreate()

Yes, it would be good to quantify these “other allocations”.

> are
> the real killers, responsible for most of the memory usage and the peak
> usage in particular.

   

   Barry




