[petsc-users] Extremely slow DMNetwork Jacobian assembly

Wed May 8 21:53:12 CDT 2019

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, May 8, 2019 at 9:29 PM
To: "Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>
Cc: "Zhang, Hong" <hzhang at mcs.anl.gov>, Justin Chang <jychang48 at gmail.com>, petsc-users <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly

On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:

From: petsc-users <petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov>> on behalf of "Zhang, Hong via petsc-users" <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Reply-To: "Zhang, Hong" <hzhang at mcs.anl.gov<mailto:hzhang at mcs.anl.gov>>
Date: Wednesday, May 8, 2019 at 8:01 PM
To: Justin Chang <jychang48 at gmail.com<mailto:jychang48 at gmail.com>>
Cc: petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly

Justin:
Great, the issue is resolved.
Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error?

I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option.

Matt,

We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write
outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one
for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting
values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices.
Is that right for DMNetwork?
Yes, DMNetwork behaves in this fashion.
I cannot find MatSetValuesClosure() in petsc-master.
Can you provide detailed instruction on how to implement  MatSetValuesClosure() for DMNetwork?
Note, dmnetwork is a subclass of DMPlex.

DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix.

Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need
a MatSetStar() for DMNetwork as well.

I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix.  I think it covers what you describe above.

Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn’t this code be used by default instead of having Plex create the matrix?

Shri

   Matt

Hong

On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhem23 at gmail.com<mailto:dave.mayhem23 at gmail.com>> wrote:

On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
So here's the branch/repo to the working example I have:

https://github.com/jychang48/petsc-dss/tree/single-bus-vertex

Type 'make' to compile the dss, it should work with the latest petsc-dev

To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as:

input/test_100.m
input/test_500.m
input/test_1000.m

I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as:

input/test2_100.m
input/test2_500.m
input/test2_1000.m

The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6.

To run these tests, type the following:

./dpflow -input input/test_100.m

I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases.

Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation.

I think it is a preallocation issue.
Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView

      total number of mallocs used during MatSetValues calls =10000

Thanks for the help everyone,

Justin

On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>> wrote:
On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com<mailto:jychang48 at gmail.com>> wrote:
Hi everyone,

Yes I have these lines in my code:

ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);

Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example.

  Thanks,

    Matt

I tried -info and here's my output:

[0] PetscInitialize(): PETSc successfully started: number of processors = 1
[0] PetscInitialize(): Running on machine: jchang31606s.domain
[0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000

**** Power flow dist case ****

Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999
[0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
[0] DMGetDMSNES(): Creating new DMSNES
[0] DMGetDMKSP(): Creating new DMKSP
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
  0 SNES Function norm 1155.45

nothing else -info related shows up as I'm iterating through the vertex loop.

I'll have a MWE for you guys to play with shortly.

Thanks,
Justin

On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov<mailto:bsmith at mcs.anl.gov>> wrote:

  Justin,

     Are you providing matrix entries that connect  directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges.

  Everyone,

  I second Matt's reply.

  How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow.

  There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling.

    Barry

> On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
>
> On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
> Hi guys,
>
> I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase.
>
> The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex.
>
> I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info?
>
>   Thanks,
>
>      MAtt
>
> Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds:
>
> 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that  anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat.
>
> 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize.
>
> 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this?
>
> Thanks,
> Justin
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190509/70b1f775/attachment-0001.html>