[petsc-users] Extremely slow DMNetwork Jacobian assembly

Justin Chang jychang48 at gmail.com
Wed May 8 13:29:40 CDT 2019


Hi everyone,

Yes I have these lines in my code:

ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
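
In case it helps, here is a rough, untested sketch of the checks I can wrap
around the assembly to be doubly sure the preallocation is intact (same J as
above; the PETSC_TRUE flag and the MatGetInfo printout are debugging additions
only, not part of my actual solver):

MatInfo info;

/* While debugging, make any insertion outside the preallocated pattern
   error out immediately instead of silently mallocing */
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);

/* ... SNESFormJacobian fills J, then MatAssemblyBegin/End ... */

/* After assembly, report how many mallocs actually happened */
ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs %g, nz_allocated %g, nz_used %g\n",
                   info.mallocs,info.nz_allocated,info.nz_used);CHKERRQ(ierr);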

I tried -info and here's my output:

[0] PetscInitialize(): PETSc successfully started: number of processors = 1
[0] PetscInitialize(): Running on machine: jchang31606s.domain
[0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000

**** Power flow dist case ****

Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999
[0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
[0] DMGetDMSNES(): Creating new DMSNES
[0] DMGetDMKSP(): Creating new DMKSP
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
  0 SNES Function norm 1155.45

Nothing else -info-related shows up as I'm iterating through the vertex loop.
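
Here is a stripped-down sketch (not my real code; the event name and the
placeholder value are made up for illustration) of the kind of vertex loop
where the time goes, with a log event around the neighbor-vertex insertions
so that -log_view can confirm it:

PetscLogEvent  JAC_SETVALS;
PetscInt       v,w,vStart,vEnd,e,nedges,row,col;
const PetscInt *edges,*cone;
PetscScalar    val = 0.0;   /* placeholder entry */

ierr = PetscLogEventRegister("JacSetVals",MAT_CLASSID,&JAC_SETVALS);CHKERRQ(ierr);
ierr = DMNetworkGetVertexRange(networkdm,&vStart,&vEnd);CHKERRQ(ierr);
for (v=vStart; v<vEnd; v++) {
  ierr = DMNetworkGetVariableGlobalOffset(networkdm,v,&row);CHKERRQ(ierr);
  ierr = DMNetworkGetSupportingEdges(networkdm,v,&nedges,&edges);CHKERRQ(ierr);
  for (e=0; e<nedges; e++) {
    /* cone[] holds the two vertices of edge edges[e]; the one that is
       not v is the neighboring bus vertex this vertex couples to */
    ierr = DMNetworkGetConnectedVertices(networkdm,edges[e],&cone);CHKERRQ(ierr);
    w    = (cone[0] == v) ? cone[1] : cone[0];
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,w,&col);CHKERRQ(ierr);

    ierr = PetscLogEventBegin(JAC_SETVALS,0,0,0,0);CHKERRQ(ierr);
    /* the insertion that is slow for me: an entry coupling vertex v to
       its neighbor w across the edge */
    ierr = MatSetValues(J,1,&row,1,&col,&val,ADD_VALUES);CHKERRQ(ierr);
    ierr = PetscLogEventEnd(JAC_SETVALS,0,0,0,0);CHKERRQ(ierr);
  }
}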

I'll have an MWE for you guys to play with shortly.

Thanks,
Justin

On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Justin,
>
>      Are you providing matrix entries that directly connect one vertex to
> another vertex ACROSS an edge? I don't think that is supported by the
> DMNetwork model. The assumption is that edges are only connected to
> vertices and vertices are only connected to neighboring edges.
>
>   Everyone,
>
>   I second Matt's reply.
>
>   How is the DMNetwork preallocating for the Jacobian? Does it take into
> account coupling between neighboring vertices/edges? Or does it assume no
> coupling, or full coupling? If it assumes no coupling and the user
> has a good amount of coupling, it will be very slow.
>
>   There would need to be a way for the user to provide the coupling
> information between neighboring vertices/edges if it assumes no coupling.
>
>     Barry
>
>
> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
> >
> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > Hi guys,
> >
> > I have a fully working distribution system solver written using
> DMNetwork. The idea is that each electrical bus can have up to three phase
> nodes, and each phase node has two unknowns: voltage magnitude and angle.
> In a completely balanced system, each bus has three nodes, but in an
> unbalanced system some of the buses can be either single-phase or two-phase.
> >
> > The working DMNetwork code I developed, loosely based on the SNES
> network/power.c example, essentially represents each vertex as a bus. The
> DMNetworkAddNumVariables() function adds either 2, 4, or 6 unknowns to
> each vertex. If every single bus had the same number of variables, the mat
> block size = 2, 4, or 6, and my code is both fast and scalable. However, if
> the number of unknowns per DMNetwork vertex is not the same across vertices,
> then my SNESFormJacobian function becomes extremely, extremely slow.
> Specifically, the MatSetValues() calls are slow when the col/row global
> indices contain an offset value that points to a neighboring bus vertex.
> >
> > I have never seen MatSetValues() be slow unless it is allocating. Did
> you confirm that you are not allocating, with -info?
> >
> >   Thanks,
> >
> >      MAtt
> >
> > Why is that? Is it because I no longer have a uniform block structure
> and lose the speed/optimization benefits of iterating through an AIJ
> matrix? I see three potential workarounds:
> >
> > 1) Treat every vertex as a three-phase bus, "zero out" all the unused
> phase node dofs, and put a 1 on the diagonal. The problem I see with this is
> that I will have unnecessary degrees of freedom (aka non-zeros in the
> matrix). From the distribution systems I've seen, it's possible that
> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I
> may have nearly twice the number of dofs necessary if I wanted to
> preserve the block size = 6 for the AU mat.
> >
> > 2) Treat every phase node as a vertex, i.e., solve a single-phase power
> flow problem. That way I am guaranteed a block size = 2; this is what
> Domenico's former student did in his thesis work. The problem I see with
> this is that I have a larger graph, which can take more time to set up and
> parallelize.
> >
> > 3) Create a "fieldsplit" where I essentially have three "blocks" - one
> for buses with all three phases, another for buses with only two phases, and
> one for single-phase buses. This way each block/fieldsplit will have a
> consistent block size. I am not sure if this will solve the MatSetValues()
> issues, but can anyone give pointers on how to go about achieving this?
> >
> > Thanks,
> > Justin
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>

