[petsc-users] Extremely slow DMNetwork Jacobian assembly

Wed May 8 17:00:40 CDT 2019

On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> So here's the branch/repo to the working example I have:
>
> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
>
> Type 'make' to compile the dss; it should work with the latest petsc-dev.
>
> To test the performance, I've taken an existing IEEE 13-bus and duplicated
> it N times to create a long radial-like network. I have three sizes where N
> = 100, 500, and 1000. Those test files are listed as:
>
> input/test_100.m
> input/test_500.m
> input/test_1000.m
>
> I also created another set of examples where the IEEE 13-bus is fully
> balanced (but the program will crash at the solve step because I used some
> unrealistic parameters for the Y-bus matrices and probably have some zeros
> somewhere). They are listed as:
>
> input/test2_100.m
> input/test2_500.m
> input/test2_1000.m
>
> The dof count and matrices for the test2_*.m files are slightly larger
> than their respective test_*.m but they have a bs=6.
>
> To run these tests, type the following:
>
> ./dpflow -input input/test_100.m
>
> I have a timer that shows how long it takes to compute the Jacobian.
> Attached are the log outputs I have for each of the six cases.
>
> It turns out that only the first call to SNESComputeJacobian() is slow;
> all subsequent calls are fast, as I expect. This makes me think it still
> has something to do with matrix allocation.
>

I think it is a preallocation issue.
Looking at some of the output files (test_1000.out, test_100.out), under
Mat Object I see this in the KSPView output:

      total number of mallocs used during MatSetValues calls =10000
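
A minimal sketch of the kind of per-row count that preallocation needs when
vertices carry 2, 4, or 6 unknowns and the Jacobian couples a vertex directly
to its neighbor across an edge. The helper network_row_nnz() and its arguments
are hypothetical (they are not from PETSc or from the dss code), it uses plain
int instead of PetscInt, and it assumes dense coupling between the unknowns of
the two end vertices of every edge and no duplicate edges. The resulting nnz
array has the shape that MatSeqAIJSetPreallocation() takes as its per-row
argument.

```c
/* Hypothetical helper, not part of PETSc or of the dss code.
 *   ndof[v]                   unknowns on vertex v (2, 4, or 6 in this thread)
 *   edges[2*e], edges[2*e+1]  the two end vertices of edge e
 *   offset (out, nv+1)        first global row of each vertex
 *   nnz    (out, offset[nv])  number of nonzeros in each row
 * Assumes no duplicate edges and dense vertex-to-vertex coupling.
 * Returns the total number of rows. */
int network_row_nnz(int nv, const int *ndof, int ne, const int *edges,
                    int *offset, int *nnz)
{
  int v, e, r;

  /* Prefix sum: a variable number of unknowns per vertex means the global
   * row of a vertex is no longer a fixed multiple of its index. */
  offset[0] = 0;
  for (v = 0; v < nv; v++) offset[v + 1] = offset[v] + ndof[v];

  /* Diagonal block: every row of vertex v sees all ndof[v] unknowns of v. */
  for (v = 0; v < nv; v++)
    for (r = offset[v]; r < offset[v + 1]; r++) nnz[r] = ndof[v];

  /* Off-diagonal blocks: every row of one end vertex also sees all the
   * unknowns of the vertex at the other end of the edge. */
  for (e = 0; e < ne; e++) {
    int a = edges[2 * e], b = edges[2 * e + 1];
    for (r = offset[a]; r < offset[a + 1]; r++) nnz[r] += ndof[b];
    for (r = offset[b]; r < offset[b + 1]; r++) nnz[r] += ndof[a];
  }
  return offset[nv];
}
```

For a three-bus chain with 6, 2, and 4 unknowns, each row of the middle bus
needs 2 + 6 + 4 = 12 entries. If the matrix was preallocated for fewer, every
extra entry inserted on the first Jacobian assembly can trigger a malloc.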

>
> Thanks for the help everyone,
>
> Justin
>
> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Yes I have these lines in my code:
>>>
>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
>>> ierr =
>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>>>
>>
>> Okay, it's not allocation. So maybe Hong is right that it's setting great
>> big element matrices. We will see with the example.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> I tried -info and here's my output:
>>>
>>> [0] PetscInitialize(): PETSc successfully started: number of processors
>>> = 1
>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain
>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608
>>> 140550815662944 max tags = 2147483647
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>> 140550815662944
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>> 140550815662944
>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000,
>>> numdl = 5000, numlbr = 109999, numtbr = 5000
>>>
>>> **** Power flow dist case ****
>>>
>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta
>>> = 5000, nbranch = 114999
>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120
>>> 140550815683104 max tags = 2147483647
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage
>>> space: 0 unneeded,10799928 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5.
>>> Using Inode routines
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
>>> 140550815662944
>>> [0] DMGetDMSNES(): Creating new DMSNES
>>> [0] DMGetDMKSP(): Creating new DMKSP
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
>>> 140550815683104
>>>   0 SNES Function norm 1155.45
>>>
>>> nothing else -info related shows up as I'm iterating through the vertex
>>> loop.
>>>
>>> I'll have a MWE for you guys to play with shortly.
>>>
>>> Thanks,
>>> Justin
>>>
>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>>
>>>>
>>>>   Justin,
>>>>
>>>>      Are you providing matrix entries that connect one vertex directly
>>>> to another vertex ACROSS an edge? I don't think that is supported by the
>>>> DMNetwork model. The assumption is that edges are only connected to
>>>> vertices and vertices are only connected to neighboring edges.
>>>>
>>>>   Everyone,
>>>>
>>>>   I second Matt's reply.
>>>>
>>>>   How is the DMNetwork preallocating for the Jacobian? Does it take
>>>> into account coupling between neighboring vertices/edges? Or does it assume
>>>> no coupling? Or full coupling? If it assumes no coupling and the
>>>> user has a good amount of coupling, it will be very slow.
>>>>
>>>>   There would need to be a way for the user to provide the coupling
>>>> information between neighboring vertices/edges if it assumes no coupling.
>>>>
>>>>     Barry
>>>>
>>>>
>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>> >
>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>> > Hi guys,
>>>> >
>>>> > I have a fully working distribution system solver written using
>>>> DMNetwork. The idea is that each electrical bus can have up to three phase
>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle.
>>>> In a completely balanced system, each bus has three nodes, but in an
>>>> unbalanced system some of the buses can be either single phase or two-phase.
>>>> >
>>>> > The working DMNetwork code I developed, loosely based on the SNES
>>>> network/power.c, essentially represents each vertex as a bus. The
>>>> DMNetworkAddNumVariables() function adds either 2, 4, or 6 unknowns to
>>>> each vertex. If every bus has the same number of variables, the mat
>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if
>>>> the number of unknowns is not the same across all DMNetwork vertices, then
>>>> my SNESFormJacobian function becomes extremely slow. Specifically, the slow
>>>> part is the MatSetValues() calls where the col/row global indices contain
>>>> an offset value that points to a neighboring bus vertex.
>>>> >
>>>> > I have never seen MatSetValues() be slow unless it is allocating. Did
>>>> you confirm that you are not allocating, with -info?
>>>> >
>>>> >   Thanks,
>>>> >
>>>> >      Matt
>>>> >
>>>> > Why is that? Is it because I no longer have a uniform block structure
>>>> and lose the speed/optimization benefits of iterating through an AIJ
>>>> matrix? I see three potential workarounds:
>>>> >
>>>> > 1) Treat every vertex as a three phase bus and "zero out" all the
>>>> unused phase node dofs and put a 1 in the diagonal. The problem I see with
>>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in
>>>> the matrix). From the distribution systems I've seen, it's possible that
>>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I
>>>> may have nearly twice as many dofs as necessary if I wanted to
>>>> preserve the block size = 6 for the AIJ mat.
>>>> >
>>>> > 2) Treat every phase node as a vertex, i.e. run a single-phase power
>>>> flow solver. That way I am guaranteed a block size = 2; this is what
>>>> Domenico's former student did in his thesis work. The problem I see with
>>>> this is that I have a larger graph, which can take more time to set up and
>>>> parallelize.
>>>> >
>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" -
>>>> one for buses with all three phases, another for buses with only two
>>>> phases, one for single-phase buses. This way each block/fieldsplit will
>>>> have a consistent block size. I am not sure if this will solve the
>>>> MatSetValues() issues, but can anyone give pointers on how to go
>>>> about achieving this?
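
To put a rough number on the overhead of option 1, here is a small sketch
that assumes two unknowns per phase node, as described above. The function
name padded_dof_ratio() and the bus counts passed to it are hypothetical and
are not taken from the test inputs.

```c
/* Hypothetical helper: ratio of unknowns after padding every bus to three
 * phases (6 unknowns) to the unknowns actually needed.
 *   n3, n2, n1  number of three-, two-, and single-phase buses
 * Assumes two unknowns (voltage magnitude and angle) per phase node.
 * Returns 0.0 if there are no buses at all. */
double padded_dof_ratio(long n3, long n2, long n1)
{
  long actual = 6 * n3 + 4 * n2 + 2 * n1; /* unknowns really present   */
  long padded = 6 * (n3 + n2 + n1);       /* every bus padded to 6 dof */

  return actual > 0 ? (double)padded / (double)actual : 0.0;
}
```

With this count, a system where half the buses are three-phase and half are
single-phase ends up with 1.5 times the unknowns once it is padded.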
>>>> >
>>>> > Thanks,
>>>> > Justin
>>>> >
>>>> >
>>>> > --
>>>> > What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> > -- Norbert Wiener
>>>> >
>>>> > https://www.cse.buffalo.edu/~knepley/
>>>>
>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>