[petsc-users] Extremely slow DMNetwork Jacobian assembly

Matthew Knepley knepley at gmail.com
Wed May 8 13:36:43 CDT 2019


On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com> wrote:

> Hi everyone,
>
> Yes I have these lines in my code:
>
> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
> ierr =
> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>

Okay, it's not allocation. So maybe Hong is right that the cost is in
setting great big element matrices. We will see with the example.
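
One more quick check (a minimal sketch, just flipping the option you
already set): make any insertion outside the preallocated pattern a hard
error instead of a silent malloc,

  ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
  /* error (rather than malloc) on any new nonzero outside the preallocation */
  ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);

If an assembly then runs to completion without erroring, allocation is
ruled out completely.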

  Thanks,

    Matt


> I tried -info and here's my output:
>
> [0] PetscInitialize(): PETSc successfully started: number of processors = 1
> [0] PetscInitialize(): Running on machine: jchang31606s.domain
> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608
> 140550815662944 max tags = 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
> 140550815662944
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
> 140550815662944
> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000,
> numdl = 5000, numlbr = 109999, numtbr = 5000
>
> **** Power flow dist case ****
>
> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta =
> 5000, nbranch = 114999
> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120
> 140550815683104 max tags = 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space:
> 0 unneeded,10799928 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5.
> Using Inode routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
> 140550815662944
> [0] DMGetDMSNES(): Creating new DMSNES
> [0] DMGetDMKSP(): Creating new DMKSP
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
> 140550815683104
>   0 SNES Function norm 1155.45
>
> Nothing else -info related shows up as I iterate through the vertex
> loop.
>
> I'll have a MWE for you guys to play with shortly.
>
> Thanks,
> Justin
>
> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>   Justin,
>>
>>      Are you providing matrix entries that connect one vertex directly
>> to another vertex ACROSS an edge? I don't think that is supported by the
>> DMNetwork model. The assumption is that edges are connected only to
>> vertices and vertices are connected only to neighboring edges.
>>
>>   Everyone,
>>
>>   I second Matt's reply.
>>
>>   How is DMNetwork preallocating for the Jacobian? Does it take into
>> account coupling between neighboring vertices/edges, or does it assume no
>> coupling, or full coupling? If it assumes no coupling and the user has a
>> good amount of coupling, it will be very slow.
>>
>>   If it assumes no coupling, there would need to be a way for the user to
>> provide the coupling information between neighboring vertices/edges.
>>
>>     Barry
>>
>>
>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >
>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> > Hi guys,
>> >
>> > I have a fully working distribution system solver written using
>> > DMNetwork. The idea is that each electrical bus can have up to three
>> > phase nodes, and each phase node has two unknowns: voltage magnitude and
>> > angle. In a completely balanced system each bus has three nodes, but in
>> > an unbalanced system some of the buses can be single-phase or two-phase.
>> >
>> > The working DMNetwork code I developed, loosely based on the SNES
>> > example network/power.c, essentially represents each vertex as a bus.
>> > The DMNetworkAddNumVariables() function adds either 2, 4, or 6 unknowns
>> > to each vertex. If every single bus had the same number of variables,
>> > the mat block size would be 2, 4, or 6, and my code is both fast and
>> > scalable. However, if the number of unknowns per DMNetwork vertex is not
>> > the same across vertices, then my SNESFormJacobian function becomes
>> > extremely slow. Specifically, the slowdown is in the MatSetValues()
>> > calls whose col/row global indices contain an offset that points to a
>> > neighboring bus vertex.
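>> >
>> > Roughly, the offending pattern looks like this (a simplified sketch, not
>> > my actual code: networkdm is my DMNetwork, J the Jacobian being
>> > assembled, and the 2x2 block size and zero values are just placeholders
>> > for the real per-phase blocks):
>> >
>> >   PetscInt       eStart,eEnd,e;
>> >   ierr = DMNetworkGetEdgeRange(networkdm,&eStart,&eEnd);CHKERRQ(ierr);
>> >   for (e = eStart; e < eEnd; e++) {
>> >     const PetscInt *cone;
>> >     PetscInt       offsetfrom,offsetto,row[2],col[2];
>> >     PetscScalar    vals[4] = {0,0,0,0};   /* placeholder entries */
>> >
>> >     /* the two bus vertices this branch connects */
>> >     ierr = DMNetworkGetConnectedVertices(networkdm,e,&cone);CHKERRQ(ierr);
>> >     ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[0],&offsetfrom);CHKERRQ(ierr);
>> >     ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[1],&offsetto);CHKERRQ(ierr);
>> >
>> >     /* off-diagonal block: "from" bus rows against "to" bus columns;
>> >        these are the insertions that have become slow */
>> >     row[0] = offsetfrom; row[1] = offsetfrom + 1;
>> >     col[0] = offsetto;   col[1] = offsetto + 1;
>> >     ierr = MatSetValues(J,2,row,2,col,vals,ADD_VALUES);CHKERRQ(ierr);
>> >   }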
>> >
>> > I have never seen MatSetValues() be slow unless it is allocating. Did
>> you confirm that you are not allocating, with -info?
>> >
>> >   Thanks,
>> >
>> >      Matt
>> >
>> > Why is that? Is it because I no longer have a uniform block structure
>> and lose the speed/optimization benefits of iterating through an AIJ
>> matrix? I see three potential workarounds:
>> >
>> > 1) Treat every vertex as a three-phase bus, "zero out" all the unused
>> > phase node dofs, and put a 1 on the diagonal. The problem I see with
>> > this is that I will have unnecessary degrees of freedom (aka non-zeros
>> > in the matrix). From the distribution systems I've seen, it's possible
>> > that anywhere from 1/3 to 1/2 of the buses will be two-phase or less,
>> > meaning I may have nearly twice as many dofs as necessary if I wanted to
>> > preserve the block size = 6 for the AIJ mat.
>> >
>> > 2) Treat every phase node as a vertex, i.e., solve it as a single-phase
>> > power flow problem. That way I am guaranteed a block size = 2; this is
>> > what Domenico's former student did in his thesis work. The problem I see
>> > with this is that I have a larger graph, which can take more time to set
>> > up and parallelize.
>> >
>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - one
>> > for buses with all three phases, another for buses with only two phases,
>> > and one for single-phase buses. This way each block/fieldsplit will have
>> > a consistent block size. I am not sure if this will solve the
>> > MatSetValues() issues, but can anyone give pointers on how to go about
>> > achieving this? A very rough sketch of what I imagine is below.
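>> >
>> > What I imagine is something along these lines (a very rough sketch, and
>> > I am not sure it is the right usage; snes is my SNES, and n3/idx3 etc.
>> > are placeholders for index lists I would build from the bus types):
>> >
>> >   IS  is3,is2,is1;
>> >   KSP ksp;
>> >   PC  pc;
>> >   ierr = ISCreateGeneral(PETSC_COMM_WORLD,n3,idx3,PETSC_COPY_VALUES,&is3);CHKERRQ(ierr);
>> >   ierr = ISCreateGeneral(PETSC_COMM_WORLD,n2,idx2,PETSC_COPY_VALUES,&is2);CHKERRQ(ierr);
>> >   ierr = ISCreateGeneral(PETSC_COMM_WORLD,n1,idx1,PETSC_COPY_VALUES,&is1);CHKERRQ(ierr);
>> >   ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
>> >   ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>> >   ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr);
>> >   /* one split per phase-count class of buses */
>> >   ierr = PCFieldSplitSetIS(pc,"threephase",is3);CHKERRQ(ierr);
>> >   ierr = PCFieldSplitSetIS(pc,"twophase",is2);CHKERRQ(ierr);
>> >   ierr = PCFieldSplitSetIS(pc,"singlephase",is1);CHKERRQ(ierr);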
>> >
>> > Thanks,
>> > Justin
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/