[petsc-users] Extremely slow DMNetwork Jacobian assembly

Wed May 8 17:16:00 CDT 2019

On Wed, May 8, 2019 at 6:10 PM Justin Chang <jychang48 at gmail.com> wrote:

> Yeah I noticed that too. So could that possibly imply that the row/col
> indices used in my MatSetValues() extend beyond what was originally
> allocated in the DMCreateMatrix() function?
>

It definitely means that.

We usually prevent this with a structured SetValues API. For example, DMDA
uses MatSetValuesStencil() which cannot write
outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is
guaranteed to be allocated. We should write one
for DMNetwork. The allocation is just like Plex (I believe) where you
allocate closure(star(p)), which would mean that setting
values for a vertex gets the neighboring edges and their vertices, and
setting values for an edge gets the covering vertices.
Is that right for DMNetwork?

   Matt

> On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhem23 at gmail.com> wrote:
>
>>
>>
>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> So here's the branch/repo to the working example I have:
>>>
>>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
>>>
>>> Type 'make' to compile the dss, it should work with the latest petsc-dev
>>>
>>> To test the performance, I've taken an existing IEEE 13-bus and
>>> duplicated it N times to create a long radial-like network. I have three
>>> sizes where N = 100, 500, and 1000. Those test files are listed as:
>>>
>>> input/test_100.m
>>> input/test_500.m
>>> input/test_1000.m
>>>
>>> I also created another set of examples where the IEEE 13-bus is fully
>>> balanced (but the program will crash ar the solve step because I used some
>>> unrealistic parameters for the Y-bus matrices and probably have some zeros
>>> somewhere). They are listed as:
>>>
>>> input/test2_100.m
>>> input/test2_500.m
>>> input/test2_1000.m
>>>
>>> The dof count and matrices for the test2_*.m files are slightly larger
>>> than their respective test_*.m but they have a bs=6.
>>>
>>> To run these tests, type the following:
>>>
>>> ./dpflow -input input/test_100.m
>>>
>>> I have a timer that shows how long it takes to compute the Jacobian.
>>> Attached are the log outputs I have for each of the six cases.
>>>
>>> Turns out that only the first call to the SNESComputeJacobian() is slow,
>>> all the subsequent calls are fast as I expect. This makes me think it still
>>> has something to do with matrix allocation.
>>>
>>
>> I think it is a preallocation issue.
>> Looking to some of the output files (test_1000.out, test_100.out), under
>> Mat Object I see this in the KSPView
>>
>>       total number of mallocs used during MatSetValues calls =10000
>>
>>
>>
>>
>>
>>>
>>> Thanks for the help everyone,
>>>
>>> Justin
>>>
>>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Yes I have these lines in my code:
>>>>>
>>>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
>>>>> ierr =
>>>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>>>>>
>>>>
>>>> Okay, its not allocation. So maybe Hong is right that its setting great
>>>> big element matrices. We will see with the example.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>
>>>>> I tried -info and here's my output:
>>>>>
>>>>> [0] PetscInitialize(): PETSc successfully started: number of
>>>>> processors = 1
>>>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain
>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608
>>>>> 140550815662944 max tags = 2147483647
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>>>> 140550815662944
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>>>> 140550815662944
>>>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000,
>>>>> numdl = 5000, numlbr = 109999, numtbr = 5000
>>>>>
>>>>> **** Power flow dist case ****
>>>>>
>>>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000,
>>>>> ndelta = 5000, nbranch = 114999
>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120
>>>>> 140550815683104 max tags = 2147483647
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage
>>>>> space: 0 unneeded,10799928 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used:
>>>>> 5. Using Inode routines
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>>>> 140550815662944
>>>>> [0] DMGetDMSNES(): Creating new DMSNES
>>>>> [0] DMGetDMKSP(): Creating new DMKSP
>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>>>> 140550815683104
>>>>>   0 SNES Function norm 1155.45
>>>>>
>>>>> nothing else -info related shows up as I'm iterating through the
>>>>> vertex loop.
>>>>>
>>>>> I'll have a MWE for you guys to play with shortly.
>>>>>
>>>>> Thanks,
>>>>> Justin
>>>>>
>>>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>   Justin,
>>>>>>
>>>>>>      Are you providing matrix entries that connect  directly one
>>>>>> vertex to another vertex ACROSS an edge? I don't think that is supported by
>>>>>> the DMNetwork model. The assumption is that edges are only connected to
>>>>>> vertices and vertices are only connected to neighboring edges.
>>>>>>
>>>>>>   Everyone,
>>>>>>
>>>>>>   I second Matt's reply.
>>>>>>
>>>>>>   How is the DMNetwork preallocating for the Jacobian? Does it take
>>>>>> into account coupling between neighboring vertices/edges? Or does it assume
>>>>>> no coupling. Or assume full coupling. If it assumes no coupling and the
>>>>>> user has a good amount of coupling it will be very slow.
>>>>>>
>>>>>>   There would need to be a way for the user provide the coupling
>>>>>> information between neighboring vertices/edges if it assumes no coupling.
>>>>>>
>>>>>>     Barry
>>>>>>
>>>>>>
>>>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>> >
>>>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>> > Hi guys,
>>>>>> >
>>>>>> > I have a fully working distribution system solver written using
>>>>>> DMNetwork, The idea is that each electrical bus can have up to three phase
>>>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle.
>>>>>> In a completely balanced system, each bus has three nodes, but in an
>>>>>> unbalanced system some of the buses can be either single phase or two-phase.
>>>>>> >
>>>>>> > The working DMNetwork code I developed, loosely based on the SNES
>>>>>> network/power.c, essentially represents each vertex as a bus.
>>>>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to
>>>>>> each vertex. If every single bus had the same number of variables, the mat
>>>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if
>>>>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my
>>>>>> SNESFormJacobian function becomes extremely extremely slow. Specifically,
>>>>>> the MatSetValues() calls when the col/row global indices contain an offset
>>>>>> value that points to a neighboring bus vertex.
>>>>>> >
>>>>>> > I have never seen MatSetValues() be slow unless it is allocating.
>>>>>> Did you confirm that you are not allocating, with -info?
>>>>>> >
>>>>>> >   Thanks,
>>>>>> >
>>>>>> >      MAtt
>>>>>> >
>>>>>> > Why is that? Is it because I no longer have a uniform block
>>>>>> structure and lose the speed/optimization benefits of iterating through an
>>>>>> AIJ matrix? I see three potential workarounds:
>>>>>> >
>>>>>> > 1) Treat every vertex as a three phase bus and "zero out" all the
>>>>>> unused phase node dofs and put a 1 in the diagonal. The problem I see with
>>>>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in
>>>>>> the matrix). From the distribution systems I've seen, it's possible that
>>>>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I
>>>>>> may have nearly twice the amount of dofs than necessary if I wanted to
>>>>>> preserve the block size = 6 for the AU mat.
>>>>>> >
>>>>>> > 2) Treat every phase node as a vertex aka solve a single-phase
>>>>>> power flow solver. That way I guarantee to have a block size = 2, this is
>>>>>> what Domenico's former student did in his thesis work. The problem I see
>>>>>> with this is that I have a larger graph, which can take more time to setup
>>>>>> and parallelize.
>>>>>> >
>>>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" -
>>>>>> one for buses with all three phases, another for buses with only two
>>>>>> phases, one for single-phase buses. This way each block/fieldsplit will
>>>>>> have a consistent block size. I am not sure if this will solve the
>>>>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go
>>>>>> about achieving this?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Justin
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> > -- Norbert Wiener
>>>>>> >
>>>>>> > https://www.cse.buffalo.edu/~knepley/
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190508/3216c58e/attachment-0001.html>