Are you doing the substructuring yourself, or just using the PETSc parallel solver?

--- On Mon, 1/26/09, Khan, Irfan <irfan.khan@gatech.edu> wrote:

From: Khan, Irfan <irfan.khan@gatech.edu>
Subject: Re: parallel solvers and matrix structure
To: "PETSc users list" <petsc-users@mcs.anl.gov>
Date: Monday, January 26, 2009, 12:04 AM

Thanks for the quick and helpful suggestions. I have used MatSetOption() with
MAT_IGNORE_ZERO_ENTRIES, together with -pc_type sor. This gives me the same
stiffness matrix in both cases (with and without excluding zero entries), and
the results are also the same, including the number of iterations.
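A minimal sketch of how that option is applied during assembly (assuming the
PETSc 3.0-era API, where MatSetOption() takes a PETSC_TRUE flag; older releases
use a two-argument form, and the helper name here is hypothetical):

  #include <petscksp.h>

  /* Sketch: ask PETSc to drop explicitly zero values during assembly.
     The option must be set before the MatSetValues() calls are made.  */
  PetscErrorCode AssembleIgnoringZeros(Mat A)
  {
    PetscErrorCode ierr;

    ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE); CHKERRQ(ierr);

    /* ... element loop calling MatSetValues() goes here ...           */

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    return 0;
  }

The preconditioner can then be chosen at run time, e.g. -ksp_type cg -pc_type sor.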
However, the number of iterations to convergence is 70. Earlier I used
PCASM with KSPCG, which required 17 iterations. Barry mentioned
"spooles" with -pc_type cholesky, but I am skeptical about its performance
on systems with over a million degrees of freedom. My matrices are positive
definite, so CG may be ideally suited; I am just not sure what kind of
preconditioner would work best in parallel.

Any suggestions on preconditioner-solver combinations for very large (1-10
million dof) positive definite systems?

Thanks again
Best Regards
Irfan


----- Original Message -----
From: "Barry Smith" <bsmith@mcs.anl.gov>
To: "PETSc users list" <petsc-users@mcs.anl.gov>
Sent: Sunday, January 25, 2009 9:42:42 PM GMT -05:00 US/Canada Eastern
Subject: Re: parallel solvers and matrix structure


On Jan 25, 2009, at 8:19 PM, Khan, Irfan wrote:

> Dear Petsc team
>
> Firstly, thanks for developing PETSc. I have been using it to
> parallelize a linear finite element code to use with a parallel
> lattice Boltzmann code. It has helped me a lot so far.
>
> I have some questions about the way the parallel solvers handle
> matrices with varying numbers of zeros in the matrix.
>
> I have found that the more MatSetValue() calls I make in my code,
> the slower it gets. Therefore I initialize the parallel stiffness
> matrix to 0.0.

   I don't know what you mean by this. You should NOT call
MatZeroEntries() on the matrix initially. This will destroy your
preallocation information.
   Call MatZeroEntries() after you have filled it up and used it, when
you are ready to start again.
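A minimal sketch of a preallocation-friendly assembly path (hypothetical helper
and variable names; nlocal is the number of locally owned rows, d_nnz/o_nnz are
per-row nonzero estimates for the diagonal and off-diagonal blocks, and newer
PETSc releases call this routine MatCreateAIJ):

  #include <petscksp.h>

  /* Sketch: create the parallel stiffness matrix with per-row preallocation
     and insert one element stiffness block per call instead of one entry at
     a time.  With good estimates, -info should report
     "Number of mallocs during MatSetValues() is 0".                        */
  PetscErrorCode CreateStiffness(MPI_Comm comm, PetscInt nlocal,
                                 const PetscInt d_nnz[], const PetscInt o_nnz[],
                                 Mat *K)
  {
    PetscErrorCode ierr;

    ierr = MatCreateMPIAIJ(comm, nlocal, nlocal,
                           PETSC_DETERMINE, PETSC_DETERMINE,
                           0, d_nnz, 0, o_nnz, K); CHKERRQ(ierr);

    /* No MatZeroEntries() here: a freshly created matrix is already empty.
       Per element, insert the whole nen x nen block in one call, e.g.
       MatSetValues(*K, nen, rows, nen, cols, ke, ADD_VALUES);              */

    return 0;
  }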
> I then fill in the values, using an "if" condition to eliminate
> zero entries in the parallel stiffness matrix. This greatly reduces
> the number of times MatSetValue() is called (by a factor of 5-15). I also
> approximate the number of non-zero entries in the parallel matrix,
> which I create using MatCreateMPIAIJ. I have attached the output of
> running my code with "-info"; one run with the "if" condition and the
> other without it (at the end of the email).

   If you are using the SeqAIJ or MPIAIJ matrices then you can simply
call MatSetOption(mat,MAT_IGNORE_ZERO_ENTRIES)
and it will not put in those locations with zero value.


>
> I have also compared the matrix generated with the "if"
> condition and the one generated without it. They are
> the same except for the extra zero entries in the latter case. But
> what I notice is that during solving, the matrix generated with the
> "if" condition does not converge, while the matrix generated without
> the "if" condition converges in 17 iterations. I use KSPCG.
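One way to make that comparison explicit is MatEqual(), which checks the stored
values of two assembled matrices (a sketch; A_full and A_filtered are
placeholder names for the matrices assembled without and with the "if"
condition):

  #include <petscksp.h>

  /* Sketch: report whether the two assembly paths produce the same values.
     Even when the values agree, the nonzero *patterns* may differ, which is
     exactly what an ILU-type preconditioner is sensitive to.               */
  PetscErrorCode CompareOperators(Mat A_full, Mat A_filtered)
  {
    PetscErrorCode ierr;
    PetscTruth     same;   /* PetscBool in later PETSc releases */

    ierr = MatEqual(A_full, A_filtered, &same); CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "matrices equal: %s\n",
                       same ? "yes" : "no"); CHKERRQ(ierr);
    return 0;
  }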
   First run both cases with -pc_type lu and confirm that they both
produce the same solution to the linear system (on one process).

   Also confirm that the matrix is symmetric positive definite, so you
can use CG.

   The reason for the difference in convergence is that the default
solver PETSc uses is ILU(0). The "quality" of an ILU preconditioner
depends on the "nonzero" structure of the matrix: the more "nonzero"
locations, the more likely it is to converge, and the faster it will
be. For ILU, a zero stored in a location is different from not having
that location at all. Likely you want to use a better preconditioner;
what about -pc_type sor? For moderate size problems, say < 100,000
unknowns, you may want to use a direct Cholesky solver. In parallel,
run config/configure.py --download-spooles and then run the code with
-pc_type cholesky -pc_factor_mat_solver_package spooles


   Barry
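A minimal sketch of the solve phase that leaves the preconditioner to the
command line, so the suggestions above can be compared without recompiling
(hypothetical helper name; the extra MatStructure argument to KSPSetOperators()
is the 2009-era signature, and newer releases drop it):

  #include <petscksp.h>

  /* Sketch: CG Krylov method, preconditioner picked up from the options
     database, e.g.
       -pc_type sor
       -pc_type cholesky -pc_factor_mat_solver_package spooles
     Add -ksp_monitor to watch the iteration counts.                     */
  PetscErrorCode SolveSystem(Mat K, Vec b, Vec u)
  {
    KSP            ksp;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, K, K, SAME_NONZERO_PATTERN); CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPCG); CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, u); CHKERRQ(ierr);
    ierr = KSPDestroy(ksp); CHKERRQ(ierr);  /* KSPDestroy(&ksp) in newer PETSc */
    return 0;
  }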
>
> It would help me a lot if I could get some suggestions on how to use
> MatSetValue() optimally, the reason KSPCG fails to converge, and
> whether something can be done about it. Also, any suggestions on
> whether I should use an "if" condition to eliminate zero entries when
> inserting values into the stiffness matrix.
>
> I will be glad to send any other information if needed.
>
> Thanks in advance
> Best Regards
> Irfan
>
>
> WITHOUT "IF" CONDITION
> < [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 3464
> < [0] MatAssemblyBegin_MPIAIJ(): Stash has 432 entries, uses 0 mallocs.
> < [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 306 unneeded,4356 used
> < [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 216
> < [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
> < [0] Mat_CheckInode(): Found 14 nodes of 66. Limit used: 5. Using Inode routines
> < [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 465 unneeded,3576 used
> < [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 183
> < [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
> < [1] Mat_CheckInode(): Found 23 nodes of 66. Limit used: 5. Using Inode routines
>
> < [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 243 unneeded,972 used
> < [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 72
> < [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
> < [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 12)/(num_localrows 66) < 0.6. Do not use CompressedRow routines.
>
> < [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 504 unneeded,1116 used
> < [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 99
> < [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
> < [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 66) < 0.6. Do not use CompressedRow routines.
>
>
> WITH "IF" CONDITION
>> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 632
>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 78 entries, uses 0 mallocs.
>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 118 unneeded,1304 used
>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 26
>> [0] Mat_CheckInode(): Found 66 nodes out of 66 rows. Not using Inode routines
>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space: 353 unneeded,1033 used
>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 6
>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 26
>> [1] Mat_CheckInode(): Found 64 nodes out of 66 rows. Not using Inode routines
>
>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 24; storage space: 14 unneeded,121 used
>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 12
>> [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 48)/(num_localrows 66) > 0.6. Use CompressedRow routines.
>
>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 18; storage space: 14 unneeded,121 used
>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11
>> [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 42)/(num_localrows 66) > 0.6. Use CompressedRow routines.