[petsc-dev] making DA more light weight
Barry Smith
bsmith at mcs.anl.gov
Thu May 15 16:52:30 CDT 2014
> Note that ISGetIndices is still called in the parallel case.
When bs is set > 1 it is not called in the VecScatterCreate() inside DMDACreate2d/3d! Only ISBlockGetIndices() is called. The memory usage of VecScatterCreate() is O(dof * number of ghost points) + O(vector length/dof) in this case.
It is called MULTIPLE times in ISLocalToGlobalMappingCreateIS() inside DMDACreate2d/3d, so currently this causes (small integer) * (vector length) memory usage. As I keep telling you, this is the problem area for dof > 1. For dof == 1 I can live with 2.5 * sizeof(vector).
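As a rough sketch of where those terms land for your dof = 10 run below (my assumptions, not measurements: 4-byte PetscInt, 8-byte PetscScalar, stencil width 1, and an even 1x1x2 split over the two ranks):

size, ndim, dof, nranks = 128, 3, 10, 2
sizeof_int, sizeof_scalar = 4, 8

local_points = size**ndim // nranks   # grid points owned by one rank
ghost_points = size * size            # one 128x128 interface layer for a width-1 stencil

vec_bytes      = local_points * dof * sizeof_scalar                  # one global Vec, per rank
scatter_bytes  = (dof * ghost_points + local_points) * sizeof_int    # blocked VecScatterCreate() estimate above
l2g_copy_bytes = (local_points + ghost_points) * dof * sizeof_int    # one unblocked index per component

MiB = 1024.0 ** 2
print("Vec                %6.1f MiB" % (vec_bytes / MiB))       # ~80 MiB, matches the profile below
print("blocked scatter    %6.1f MiB" % (scatter_bytes / MiB))   # a few MiB, not the problem
print("l2g map, one copy  %6.1f MiB" % (l2g_copy_bytes / MiB))  # ~40 MiB; a few copies give the ~2.5 Vecs

That is why the local-to-global map, not the scatter, is what has to shrink for dof > 1.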
You need a better test code to run with. Can you use ksp/ksp/examples/tests/ex42.c? Or snes/examples/tutorials/ex19.c in parallel?
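For example, something along these lines (the refinement level is just a placeholder for "big enough", massif on rank 0 only mirrors your runs below, and you need to build ex19 in its directory first):

$ PETSC_ARCH=mpich-opt mpirun.hydra -n 1 valgrind --tool=massif ./ex19 -da_refine 6 : -n 1 ./ex19 -da_refine 6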
ISGetIndices() is called in VecScatterCreate() inside MatSetUpMultiply_MPIAIJ() because the off-diagonal portion of the matrix is in general not blocked.
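If you want to see that contribution separately, a sketch along the lines of your script (petsc4py; I am assuming the DMCreateMatrix() wrapper is spelled createMat(), older petsc4py spells it createMatrix()) is to add the matrix to the profiled function and run it on two ranks as before:

from petsc4py import PETSc
from memory_profiler import profile

@profile
def foo(size=128, ndim=3, dof=10):
    da = PETSc.DA().create(sizes=[size]*ndim, dof=dof)
    q1 = da.createGlobalVec()
    # DMCreateMatrix(); assembling the MPIAIJ off-diagonal block is where the
    # unblocked ISGetIndices() path inside MatSetUpMultiply_MPIAIJ() runs
    A = da.createMat()
    return da, q1, A

if __name__ == "__main__":
    foo()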
Barry
On May 15, 2014, at 2:27 PM, Jed Brown <jed at jedbrown.org> wrote:
> Barry Smith <bsmith at mcs.anl.gov> writes:
>> Hmm, this is the sequential case where no optimization was done for
>> block indices (adding additional code to handle the blocks would
>> not be that difficult). In the parallel case if the indices are
>> block then ISGetIndices() is not supposed to ever be used (is it?)
>> instead only ISBlockGetIndices() is used.
>>
>> Can this plot be produced for the parallel case?
>
> $ PETSC_ARCH=mpich-opt mpirun.hydra -n 1 python2 -m memory_profiler ketch-dmda.py 128 3 1 : -n 1 python2 ketch-dmda.py 128 3 1
> Filename: ketch-dmda.py
>
> Line # Mem usage Increment Line Contents
> ================================================
> 11 23.336 MiB 0.000 MiB @profile
> 12 def foo(size=128,ndim=3,dof=1):
> 13 51.688 MiB 28.352 MiB da = PETSc.DA().create(sizes=[size]*ndim,dof=dof)
> 14 59.711 MiB 8.023 MiB q1 = da.createGlobalVec()
> 15 67.715 MiB 8.004 MiB q2 = da.createGlobalVec()
> 16 75.719 MiB 8.004 MiB q3 = da.createGlobalVec()
>
> $ PETSC_ARCH=mpich-opt mpirun.hydra -n 1 python2 -m memory_profiler ketch-dmda.py 128 3 10 : -n 1 python2 ketch-dmda.py 128 3 10
> Filename: ketch-dmda.py
>
> Line # Mem usage Increment Line Contents
> ================================================
> 11 23.336 MiB 0.000 MiB @profile
> 12 def foo(size=128,ndim=3,dof=1):
> 13 235.711 MiB 212.375 MiB da = PETSc.DA().create(sizes=[size]*ndim,dof=dof)
> 14 315.734 MiB 80.023 MiB q1 = da.createGlobalVec()
> 15 395.738 MiB 80.004 MiB q2 = da.createGlobalVec()
> 16 475.742 MiB 80.004 MiB q3 = da.createGlobalVec()
>
>
> So creating the DMDA still costs 2.5x as much as a Vec. See here for
> the massif-visualizer plot:
>
> http://59A2.org/files/dmda-memory-p2.png
>
> $ PETSC_ARCH=mpich-opt mpirun.hydra -n 1 valgrind --tool=massif python2 ketch-dmda.py 128 3 10 : -n 1 python2 ketch-dmda.py 128 3 10
> ==3243== Massif, a heap profiler
> ==3243== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
> ==3243== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
> ==3243== Command: python2 ketch-dmda.py 128 3 10
> ==3243==
> ==3243==
>
> Note that ISGetIndices is still called in the parallel case.
>
> <ketch-dmda.py>