<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I managed to get this to work. <div class=""><br class=""></div><div class="">I defined a larger matrix with the dense blocks appended to the end of the matrix on the last processor. Currently, I am only running with one extra unknown, so this should not be a significant penalty for load balancing. </div><div class=""><br class=""></div><div class="">Since the larger matrix has the same I-j locations for the FE non-zeros, I use it directly in the FE assembly. </div><div class=""><br class=""></div><div class="">I have tested with parallel MUMPS solves and it working smoothly. Also, the monolithic system removes the issue with the singularity of J_fe at/near the bifurcation point. </div><div class=""><br class=""></div><div class="">Next, I would like to figure out if there are ways to bring in iterative solvers to solve this more efficiently. My J_fe comes from a nonlinear shell deformation problem with snap through response. </div><div class=""><br class=""></div><div class="">I am not sure if it would make sense to use an AMG solver on this monolithic matrix, as opposed to using it as a preconditioner for J_fe in the Schur-factorization approach. The LOCA solver in Trillions was able to find some success with the latter approach: <a href="https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508" class="">https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508</a></div><div class=""><br class=""></div><div class="">I would appreciate any general thoughts concerning this. </div><div class=""><div class=""><br class=""></div><div class="">Regards,</div><div class="">Manav</div><div class=""><br class=""></div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On May 29, 2019, at 9:11 PM, Manav Bhatia <<a href="mailto:bhatiamanav@gmail.com" class="">bhatiamanav@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Barry, <div class=""><br class=""></div><div class=""> Thanks for the detailed message. </div><div class=""><br class=""></div><div class=""> I checked libMesh’s continuation sovler and it appears to be using the same system solver without creating a larger matrix: <a href="https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C" class="">https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C</a> <div class=""><br class=""></div><div class=""> I need to implement this in my code, MAST, for various reasons (mainly, it fits inside a bigger workflow). The current implementation implementation follows the Schur factorization approach: <a href="https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details" class="">https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details</a> </div><div class=""><br class=""></div><div class=""> I will look into some solutions pertaining to the use of MatGetLocalSubMatrix or leverage some existing functionality in libMesh. </div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Manav</div><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On May 29, 2019, at 7:04 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" class="">bsmith@mcs.anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class=""><br class=""> Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, some places in the middle of the matrix?<br class=""><br class=""> We don't have any trivial code for copying a big matrix into a even larger matrix directly because we frown on doing that. It is very wasteful in time and memory. <br class=""><br class=""> The simplest way to do it is call MatGetRow() twice for each row, once to get the nonzero locations for each row to determine the numbers needed for preallocation and then the second time after the big matrix has been preallocated to get the nonzero locations and numerical values for the row to call MatSetValues() with to set that row into the bigger matrix. Note of course when you call MatSetValues() you will need to shift the rows and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the every end of the rows/columns on the last process you won't have to shift.<br class=""><br class=""> Note that B being dense really messes up chances for load balancing since its rows are dense and take a great deal of space so whatever process gets those rows needs to have much less of the mesh. <br class=""><br class=""> The correct long term approach is to have libmesh provide the needed functionality (for continuation) for the slightly larger matrix directly so huge matrices do not need to be copied.<br class=""><br class=""> I noticed that libmesh has some functionality related to continuation. I do not know if they handle it by creating the larger matrix and vector and filling that up directly for finite elements. If they do then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using, that is not relevant, the question is if they generate the larger matrices and if you can leverage this).<br class=""><br class=""><br class=""> The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack lib mesh a little bit so the matrix it creates (before it puts in the numerical values) has the extra rows and columns, that libmesh will not put the values into but you will. Thus you get libmesh to fill up the true final matrix for its finite element problem (not realizing the matrix is a little bigger then it needs) directly, no copies of the data needed. But this is bit tricky, you'll need to combine libmesh's preallocation information with yours for the final columns and rows before you have lib mesh put the numerical values in. Double check if they have any support for this first. <br class=""><br class=""> Barry<br class=""><br class=""><br class=""><blockquote type="cite" class="">On May 29, 2019, at 6:29 PM, Manav Bhatia <<a href="mailto:bhatiamanav@gmail.com" class="">bhatiamanav@gmail.com</a>> wrote:<br class=""><br class="">Thanks, Barry. <br class=""><br class="">I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. <br class=""><br class="">Next, the system of equations that I need to solve is: <br class=""><br class="">[ J_fe A ] { dX } = { R_fe }<br class="">[ B C ] { dV } = {R_ext } <br class=""><br class="">Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. <br class=""><br class="">I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). This works well for most cases, except when J_fe is close to singular. <br class=""><br class="">I am now attempting to create a monolithic matrix that solves the complete system. <br class=""><br class="">Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don’t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. <br class=""><br class=""> With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don’t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? <br class=""><br class=""> I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? Note that I don’t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. <br class=""><br class=""> Also, I am currently using MUMPS for all my parallel solves. <br class=""><br class=""> I would appreciate any advice. <br class=""><br class="">Regards,<br class="">Manav<br class=""><br class=""><br class=""><blockquote type="cite" class="">On May 29, 2019, at 6:07 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" class="">bsmith@mcs.anl.gov</a>> wrote:<br class=""><br class=""><br class="">Manav,<br class=""><br class="">For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal.<br class=""><br class=""> Barry<br class=""><br class=""><br class=""><blockquote type="cite" class="">On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" class="">petsc-users@mcs.anl.gov</a>> wrote:<br class=""><br class="">Hi, <br class=""><br class=""> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: <a href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html" class="">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html</a> , but not for parallel matrices. <br class=""><br class="">Regards,<br class="">Manav<br class=""><br class=""><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></body></html>