[petsc-users] Extremely slow DMNetwork Jacobian assembly

Justin Chang jychang48 at gmail.com
Wed May 8 17:53:29 CDT 2019


Hi everyone,

Ah, I figured out what was wrong. It was an error on my part: my code was
unintentionally calling MatSetValues() when it shouldn't have. I fixed the
problem and the speed of my code is now as expected. In case anyone's
interested in the grand total of lines of code needed to fix this
error:
https://github.com/jychang48/petsc-dss/commit/d93c49c679af79c36613bc0456a79842060ac2f4
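
For anyone who hits something similar: one quick way to catch a stray
MatSetValues() like this (a rough debugging sketch, not what the commit above
does) is to leave the new-nonzero-allocation error enabled while debugging, so
the first insertion that falls outside the preallocated pattern stops with an
error at the offending call instead of silently reallocating:

ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
/* While debugging: error out on the first entry outside the preallocated
   nonzero pattern instead of mallocing new space for it. */
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);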

Thanks everyone for your help!
Justin

On Wed, May 8, 2019 at 4:16 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, May 8, 2019 at 6:10 PM Justin Chang <jychang48 at gmail.com> wrote:
>
>> Yeah I noticed that too. So could that possibly imply that the row/col
>> indices used in my MatSetValues() extend beyond what was originally
>> allocated in the DMCreateMatrix() function?
>>
>
> It definitely means that.
>
> We usually prevent this with a structured SetValues API. For example, DMDA
> uses MatSetValuesStencil(), which cannot write
> outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is
> guaranteed to be allocated. We should write one
> for DMNetwork. The allocation is just like Plex (I believe) where you
> allocate closure(star(p)), which would mean that setting
> values for a vertex gets the neighboring edges and their vertices, and
> setting values for an edge gets the covering vertices.
> Is that right for DMNetwork?
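>
> For illustration, here is a minimal sketch of the DMDA form (in the spirit of
> the DMDA examples, assuming a 2D DMDA with dof = 1 and i,j coming from a
> DMDAGetCorners() loop), so the caller never builds raw global indices by hand:
>
>   MatStencil  row,col[5];
>   PetscScalar v[5] = {4.0,-1.0,-1.0,-1.0,-1.0};  /* e.g. a 5-point Laplacian */
>   row.i = i;      row.j = j;
>   col[0].i = i;   col[0].j = j;
>   col[1].i = i-1; col[1].j = j;
>   col[2].i = i+1; col[2].j = j;
>   col[3].i = i;   col[3].j = j-1;
>   col[4].i = i;   col[4].j = j+1;
>   ierr = MatSetValuesStencil(J,1,&row,5,col,v,INSERT_VALUES);CHKERRQ(ierr);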
>
>    Matt
>
>
>> On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhem23 at gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> So here's the branch/repo to the working example I have:
>>>>
>>>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
>>>>
>>>> Type 'make' to compile the dss; it should work with the latest petsc-dev.
>>>>
>>>> To test the performance, I've taken an existing IEEE 13-bus and
>>>> duplicated it N times to create a long radial-like network. I have three
>>>> sizes where N = 100, 500, and 1000. Those test files are listed as:
>>>>
>>>> input/test_100.m
>>>> input/test_500.m
>>>> input/test_1000.m
>>>>
>>>> I also created another set of examples where the IEEE 13-bus is fully
>>>> balanced (but the program will crash at the solve step because I used some
>>>> unrealistic parameters for the Y-bus matrices and probably have some zeros
>>>> somewhere). They are listed as:
>>>>
>>>> input/test2_100.m
>>>> input/test2_500.m
>>>> input/test2_1000.m
>>>>
>>>> The dof counts and matrices for the test2_*.m files are slightly larger
>>>> than those of their respective test_*.m files, but they have bs = 6.
>>>>
>>>> To run these tests, type the following:
>>>>
>>>> ./dpflow -input input/test_100.m
>>>>
>>>> I have a timer that shows how long it takes to compute the Jacobian.
>>>> Attached are the log outputs I have for each of the six cases.
>>>>
>>>> It turns out that only the first call to SNESComputeJacobian() is slow;
>>>> all the subsequent calls are as fast as I expect. This makes me think it
>>>> still has something to do with matrix allocation.
>>>>
>>>
>>> I think it is a preallocation issue.
>>> Looking at some of the output files (test_1000.out, test_100.out), under
>>> the Mat Object section of the KSPView output I see this:
>>>
>>>       total number of mallocs used during MatSetValues calls =10000
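>>>
>>> (That count can also be checked in code right after MatAssemblyEnd(), e.g.
>>> with MatGetInfo() -- a rough sketch, with J being the Jacobian from the
>>> snippet earlier in the thread:
>>>
>>>   MatInfo info;
>>>   ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
>>>   ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during MatSetValues: %g\n",
>>>                      info.mallocs);CHKERRQ(ierr);
>>>
>>> Anything nonzero means the preallocation was too small for what the Jacobian
>>> routine is inserting.)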
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks for the help everyone,
>>>>
>>>> Justin
>>>>
>>>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Yes I have these lines in my code:
>>>>>>
>>>>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
>>>>>> ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>>>>>>
>>>>>
>>>>> Okay, it's not allocation. So maybe Hong is right that it's setting
>>>>> great big element matrices. We will see with the example.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>> I tried -info and here's my output:
>>>>>>
>>>>>> [0] PetscInitialize(): PETSc successfully started: number of
>>>>>> processors = 1
>>>>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain
>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608
>>>>>> 140550815662944 max tags = 2147483647
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436504608 140550815662944
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436504608 140550815662944
>>>>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000,
>>>>>> numdl = 5000, numlbr = 109999, numtbr = 5000
>>>>>>
>>>>>> **** Power flow dist case ****
>>>>>>
>>>>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000,
>>>>>> ndelta = 5000, nbranch = 114999
>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120
>>>>>> 140550815683104 max tags = 2147483647
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage
>>>>>> space: 0 unneeded,10799928 used
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>> is 0
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
>>>>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used:
>>>>>> 5. Using Inode routines
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436504608 140550815662944
>>>>>> [0] DMGetDMSNES(): Creating new DMSNES
>>>>>> [0] DMGetDMKSP(): Creating new DMKSP
>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator
>>>>>> 4436505120 140550815683104
>>>>>>   0 SNES Function norm 1155.45
>>>>>>
>>>>>> Nothing else -info-related shows up as I'm iterating through the
>>>>>> vertex loop.
>>>>>>
>>>>>> I'll have a MWE for you guys to play with shortly.
>>>>>>
>>>>>> Thanks,
>>>>>> Justin
>>>>>>
>>>>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Justin,
>>>>>>>
>>>>>>>      Are you providing matrix entries that directly connect one
>>>>>>> vertex to another vertex ACROSS an edge? I don't think that is supported
>>>>>>> by the DMNetwork model. The assumption is that edges are connected only
>>>>>>> to vertices and vertices are connected only to neighboring edges.
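>>>>>>>
>>>>>>>   As a rough, untested sketch of the supported pattern (sequential case;
>>>>>>> edge2vtx[] is just a placeholder for whatever block the physics computes),
>>>>>>> each edge couples only to its two covering vertices, never one vertex
>>>>>>> directly to another:
>>>>>>>
>>>>>>>   PetscInt       eStart,eEnd,e,i,v,nrow,ncol,goffe,goffv;
>>>>>>>   PetscInt       rows[6],cols[6];        /* assumes <= 6 dofs per point */
>>>>>>>   PetscScalar    edge2vtx[36] = {0.0};   /* placeholder entries         */
>>>>>>>   const PetscInt *cone;
>>>>>>>
>>>>>>>   ierr = DMNetworkGetEdgeRange(networkdm,&eStart,&eEnd);CHKERRQ(ierr);
>>>>>>>   for (e = eStart; e < eEnd; e++) {
>>>>>>>     ierr = DMNetworkGetConnectedVertices(networkdm,e,&cone);CHKERRQ(ierr);
>>>>>>>     ierr = DMNetworkGetNumVariables(networkdm,e,&nrow);CHKERRQ(ierr);
>>>>>>>     ierr = DMNetworkGetVariableGlobalOffset(networkdm,e,&goffe);CHKERRQ(ierr);
>>>>>>>     for (i = 0; i < nrow; i++) rows[i] = goffe + i;
>>>>>>>     for (v = 0; v < 2; v++) {            /* the two vertices covering e */
>>>>>>>       ierr = DMNetworkGetNumVariables(networkdm,cone[v],&ncol);CHKERRQ(ierr);
>>>>>>>       ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[v],&goffv);CHKERRQ(ierr);
>>>>>>>       for (i = 0; i < ncol; i++) cols[i] = goffv + i;
>>>>>>>       ierr = MatSetValues(J,nrow,rows,ncol,cols,edge2vtx,ADD_VALUES);CHKERRQ(ierr);
>>>>>>>     }
>>>>>>>   }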
>>>>>>>
>>>>>>>   Everyone,
>>>>>>>
>>>>>>>   I second Matt's reply.
>>>>>>>
>>>>>>>   How is DMNetwork preallocating for the Jacobian? Does it take
>>>>>>> into account coupling between neighboring vertices/edges? Or does it
>>>>>>> assume no coupling, or full coupling? If it assumes no coupling and the
>>>>>>> user has a significant amount of coupling, it will be very slow.
>>>>>>>
>>>>>>>   If it assumes no coupling, there would need to be a way for the user
>>>>>>> to provide the coupling information between neighboring vertices/edges.
>>>>>>>
>>>>>>>     Barry
>>>>>>>
>>>>>>>
>>>>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <
>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>> >
>>>>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <
>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>> > Hi guys,
>>>>>>> >
>>>>>>> > I have a fully working distribution system solver written using
>>>>>>> DMNetwork. The idea is that each electrical bus can have up to three phase
>>>>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle.
>>>>>>> In a completely balanced system, each bus has three nodes, but in an
>>>>>>> unbalanced system some of the buses can be single-phase or two-phase.
>>>>>>> >
>>>>>>> > The working DMNetwork code I developed, loosely based on the SNES
>>>>>>> example network/power.c, essentially represents each vertex as a bus.
>>>>>>> The DMNetworkAddNumVariables() function adds either 2, 4, or 6 unknowns
>>>>>>> to each vertex. If every single bus had the same number of variables, the
>>>>>>> mat block size would be 2, 4, or 6, and my code is both fast and scalable.
>>>>>>> However, if the number of unknowns is not the same across all DMNetwork
>>>>>>> vertices, then my SNESFormJacobian function becomes extremely slow,
>>>>>>> specifically in the MatSetValues() calls whose col/row global indices
>>>>>>> contain an offset value that points to a neighboring bus vertex.
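>>>>>>> >
>>>>>>> > For concreteness, a hypothetical sketch of that per-vertex setup, where
>>>>>>> nphase[] is a made-up array holding the number of phase nodes on each bus:
>>>>>>>
>>>>>>>   PetscInt v,vStart,vEnd;
>>>>>>>   ierr = DMNetworkGetVertexRange(networkdm,&vStart,&vEnd);CHKERRQ(ierr);
>>>>>>>   for (v = vStart; v < vEnd; v++) {
>>>>>>>     /* 2 unknowns (magnitude, angle) per phase node: 2, 4, or 6 per bus */
>>>>>>>     ierr = DMNetworkAddNumVariables(networkdm,v,2*nphase[v-vStart]);CHKERRQ(ierr);
>>>>>>>   }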
>>>>>>> >
>>>>>>> > I have never seen MatSetValues() be slow unless it is allocating.
>>>>>>> Did you confirm that you are not allocating, with -info?
>>>>>>> >
>>>>>>> >   Thanks,
>>>>>>> >
>>>>>>> >      MAtt
>>>>>>> >
>>>>>>> > Why is that? Is it because I no longer have a uniform block
>>>>>>> structure and lose the speed/optimization benefits of iterating through an
>>>>>>> AIJ matrix? I see three potential workarounds:
>>>>>>> >
>>>>>>> > 1) Treat every vertex as a three-phase bus, "zero out" all the unused
>>>>>>> phase node dofs, and put a 1 on the diagonal. The problem I see with
>>>>>>> this is that I will have unnecessary degrees of freedom (i.e., non-zeros
>>>>>>> in the matrix). From the distribution systems I've seen, it's possible
>>>>>>> that anywhere from 1/3 to 1/2 of the buses will be two-phase or less,
>>>>>>> meaning I may have nearly twice as many dofs as necessary if I wanted to
>>>>>>> preserve the block size = 6 for the AIJ mat.
>>>>>>> >
>>>>>>> > 2) Treat every phase node as a vertex, i.e., solve a single-phase
>>>>>>> power flow problem. That way I am guaranteed a block size = 2; this is
>>>>>>> what Domenico's former student did in his thesis work. The problem I see
>>>>>>> with this is that I have a larger graph, which can take more time to set
>>>>>>> up and parallelize.
>>>>>>> >
>>>>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" -
>>>>>>> one for buses with all three phases, another for buses with only two
>>>>>>> phases, and one for single-phase buses. This way each block/fieldsplit
>>>>>>> will have a consistent block size. I am not sure if this will solve the
>>>>>>> MatSetValues() issues, but can anyone give pointers on how to go about
>>>>>>> achieving this?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Justin
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> > -- Norbert Wiener
>>>>>>> >
>>>>>>> > https://www.cse.buffalo.edu/~knepley/
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>
>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>

