[petsc-users] Extremely slow DMNetwork Jacobian assembly

Smith, Barry F. bsmith at mcs.anl.gov
Thu May 9 00:19:36 CDT 2019



> On May 8, 2019, at 8:54 PM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> On Wed, May 8, 2019 at 9:00 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
> Justin:
> Great, the issue is resolved.
> Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error?
> 
> Because it has PETSC_FALSE.

   Why would it EVER be set to false? Does DMNetwork preallocate appropriately? If not, why?

   Barry

>  
> Matt,
> 
> We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write
> outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one
> for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting
> values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices.
> Is that right for DMNetwork?
> Yes, DMNetwork behaves in this fashion. 
> I cannot find MatSetValuesClosure() in petsc-master. 
> 
> I mean https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMatSetClosure.html
>  
> Can you provide detailed instruction on how to implement  MatSetValuesClosure() for DMNetwork?
> 
> It will just work as is for edges, but not for vertices since you want to set the star, not the closure. You would
> just need to reverse exactly what is in that function.
> 
>   Thanks,
> 
>     Matt
>  
> Note, dmnetwork is a subclass of DMPlex.
> 
> Hong
> 
>  
> On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhem23 at gmail.com> wrote:
> 
> 
> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <petsc-users at mcs.anl.gov> wrote:
> So here's the branch/repo to the working example I have:
> 
> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
> 
> Type 'make' to compile the dss, it should work with the latest petsc-dev
> 
> To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as:
> 
> input/test_100.m
> input/test_500.m
> input/test_1000.m
> 
> I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as:
> 
> input/test2_100.m
> input/test2_500.m
> input/test2_1000.m
> 
> The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6.
> 
> To run these tests, type the following:
> 
> ./dpflow -input input/test_100.m 
> 
> I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases.
> 
> Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation.
> 
> I think it is a preallocation issue.
> Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView
> 
>       total number of mallocs used during MatSetValues calls =10000
> 
> 
> 
>  
> 
> Thanks for the help everyone,
> 
> Justin
> 
> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychang48 at gmail.com> wrote:
> Hi everyone,
> 
> Yes I have these lines in my code:
> 
> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
> ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
> 
> Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example.
> 
>   Thanks,
> 
>     Matt
>  
> I tried -info and here's my output:
> 
> [0] PetscInitialize(): PETSc successfully started: number of processors = 1
> [0] PetscInitialize(): Running on machine: jchang31606s.domain
> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000
> 
> **** Power flow dist case ****
> 
> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999
> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
> [0] DMGetDMSNES(): Creating new DMSNES
> [0] DMGetDMKSP(): Creating new DMKSP
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>   0 SNES Function norm 1155.45 
> 
> nothing else -info related shows up as I'm iterating through the vertex loop.
> 
> I'll have a MWE for you guys to play with shortly.
> 
> Thanks,
> Justin
> 
> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   Justin,
> 
>      Are you providing matrix entries that connect  directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges.
> 
>   Everyone,
> 
>   I second Matt's reply. 
> 
>   How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. 
> 
>   There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling.
> 
>     Barry
> 
> 
> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > 
> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > Hi guys,
> > 
> > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase.
> > 
> > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. 
> > 
> > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info?
> > 
> >   Thanks,
> > 
> >      MAtt
> >  
> > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds:
> > 
> > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that  anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat.
> > 
> > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize.
> > 
> > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this?
> > 
> > Thanks,
> > Justin
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



More information about the petsc-users mailing list