parallel solvers and matrix structure

Barry Smith bsmith at mcs.anl.gov
Mon Jan 26 11:30:39 CST 2009


    You'll want to find out why it runs with a different number of
iterations under the later PETSc release. As I said before, run both
(old and new) with -ksp_monitor_true_residual (-ksp_truemonitor in the
older release) and compare the residual histories.

    Barry

On Jan 25, 2009, at 11:04 PM, Khan, Irfan wrote:

> Thanks for the quick and helpful suggestions. I have used
> MatSetOption() with MAT_IGNORE_ZERO_ENTRIES and -pc_type sor. This
> gives me the same Stiffness matrix in both cases (with and without
> excluding zero entries). Further, the results are the same, including
> the number of iterations.
>
> However, the number of iterations to convergence is now 70. Earlier I
> used PCASM with KSPCG, which required only 17 iterations. Barry
> mentioned "spooles" with -pc_type cholesky, but I am skeptical about
> its performance on systems with over a million degrees of freedom. My
> matrices are positive definite, so CG should be well suited, but I am
> not sure what kind of preconditioner would work best in parallel.
>
> Any suggestions on preconditioner-solver combinations for very large  
> (1-10 million dof) positive definite systems?
>
> Thanks again
> Best Regards
> Irfan
>
>
> ----- Original Message -----
> From: "Barry Smith" <bsmith at mcs.anl.gov>
> To: "PETSc users list" <petsc-users at mcs.anl.gov>
> Sent: Sunday, January 25, 2009 9:42:42 PM GMT -05:00 US/Canada Eastern
> Subject: Re: parallel solvers and matrix structure
>
>
> On Jan 25, 2009, at 8:19 PM, Khan, Irfan wrote:
>
>> Dear Petsc team
>>
>> Firstly, thanks for developing PETSc. I have been using it to
>> parallelize a linear finite element code for use with a parallel
>> lattice Boltzmann code. It has helped me a lot so far.
>>
>> I have some questions about the way the parallel solvers handle
>> matrices with varying numbers of zero entries.
>>
>> I have found that the more MatSetValue() calls I make in my code,
>> the slower it gets. Therefore I initialize the parallel Stiffness
>> matrix to 0.0.
>
>     I don't know what you mean by this. You should NOT call
> MatZeroEntries() on the matrix initially. This will destroy your
> preallocation information.
> Call MatZeroEntries() after you have filled it up and used it, when
> you are ready to start again.
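>
>     A minimal sketch of that ordering (names like ksp, b, x, row,
> ncols, cols and vals are placeholders for whatever your code already
> has, not a prescription):
>
>        /* fill the preallocated matrix and use it ... */
>        ierr = MatSetValues(A,1,&row,ncols,cols,vals,ADD_VALUES);CHKERRQ(ierr);
>        ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>        ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>        ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>
>        /* ... and only then zero it for the next fill; this keeps the
>           assembled nonzero structure in place */
>        ierr = MatZeroEntries(A);CHKERRQ(ierr);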
>
>> I then fill in the values, using an "if" condition to avoid
>> inserting zero entries into the parallel Stiffness matrix. This
>> greatly reduces the number of times MatSetValue() is called (by a
>> factor of 5-15). I also approximate the number of non-zero entries
>> in the parallel matrix, which I create with MatCreateMPIAIJ. I have
>> attached the output of running my code with "-info", once with the
>> "if" condition and once without it (at the end of the email).
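>>
>> A sketch of this kind of preallocation call (the local size nlocal
>> and the per-row estimates d_nz/o_nz below are placeholders):
>>
>>    Mat K;
>>    /* nlocal rows/columns owned by this process; d_nz and o_nz are the
>>       estimated nonzeros per row in the diagonal and off-diagonal blocks */
>>    ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,nlocal,nlocal,PETSC_DETERMINE,
>>                           PETSC_DETERMINE,d_nz,PETSC_NULL,o_nz,PETSC_NULL,
>>                           &K);CHKERRQ(ierr);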
>
>    If you are using SeqAIJ or MPIAIJ matrices then you can simply
> call MatSetOption(mat,MAT_IGNORE_ZERO_ENTRIES)
> and it will not insert the locations whose values are zero.
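>
>    A sketch of where that call goes, right after the matrix is
> created/preallocated and before any MatSetValues() (in recent
> releases MatSetOption() takes a third argument, PETSC_TRUE):
>
>      ierr = MatSetOption(A,MAT_IGNORE_ZERO_ENTRIES);CHKERRQ(ierr);
>      /* newer releases: MatSetOption(A,MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE); */
>      /* subsequent MatSetValues() calls drop entries whose value is
>         exactly 0.0 instead of adding them to the nonzero structure */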
>
>
>>
>>
>> I have also compared the matrix generated with the "if" condition
>> and the one generated without it. They are the same except for the
>> extra zero entries in the latter case. But during the solve, the
>> matrix generated with the "if" condition does not converge, while
>> the matrix generated without the "if" condition converges in 17
>> iterations. I use KSPCG.
>
>   First run both cases with -pc_type lu and confirm that they both
> produce the same solution to the linear system (on one process).
>
>   Also confirm that the matrix is symmetric positive definite so you
> can use CG.
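>
>   On one process, the symmetry part can be checked with something
> like the following sketch (the tolerance is arbitrary, and this does
> not test positive definiteness):
>
>      PetscTruth symm;
>      ierr = MatIsSymmetric(A,1.e-10,&symm);CHKERRQ(ierr);
>      ierr = PetscPrintf(PETSC_COMM_WORLD,"matrix symmetric? %d\n",(int)symm);CHKERRQ(ierr);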
>
>    The reason for the difference in convergence is that the default
> solver PETSc uses is ILU(0). The "quality" of an ILU preconditioner
> depends on the "nonzero" structure of the matrix: the more "nonzero"
> locations, the more likely it is to converge, and the faster it will
> be. For ILU, a zero stored in a location is different from not having
> that location at all. Likely you want to use a better preconditioner;
> what about -pc_type sor? For moderate-size problems, say < 100,000
> unknowns, you may want to use a direct Cholesky solver. In parallel,
> run config/configure.py --download-spooles, then run the code with
> -pc_type cholesky -pc_factor_mat_solver_package spooles
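>
>    To switch among these at run time, make sure the solver is built
> from the options database; a sketch (ksp, A, b, x stand for whatever
> your code already has):
>
>      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>      ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
>      ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);  /* CG for the SPD system */
>      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up -pc_type, -ksp_monitor_true_residual, ... */
>      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>
>    and then, for example (the executable name is just a placeholder),
>
>      mpiexec -n 4 ./myapp -pc_type sor -ksp_monitor_true_residual
>      mpiexec -n 4 ./myapp -pc_type cholesky -pc_factor_mat_solver_package spooles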
>
>
>    Barry
>
>>
>>
>> It would help me a lot to get some suggestions on how to use
>> MatSetValue() optimally, on why KSPCG fails to converge, and on
>> whether something can be done about it. I would also appreciate
>> advice on whether I should avoid using an "if" condition to filter
>> out zero entries when entering values into the Stiffness matrix.
>>
>> I will be glad to send any other information if needed.
>>
>> Thanks in advance
>> Best Regards
>> Irfan
>>
>>
>>
>>
>> WITHOUT "IF" CONDITION
>> < [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 3464
>> < [0] MatAssemblyBegin_MPIAIJ(): Stash has 432 entries, uses 0
>> mallocs.
>> < [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>> 306 unneeded,4356 used
>> < [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>> MatSetValues() is 216
>> < [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
>> < [0] Mat_CheckInode(): Found 14 nodes of 66. Limit used: 5. Using
>> Inode routines
>> < [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>> 465 unneeded,3576 used
>> < [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>> MatSetValues() is 183
>> < [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
>> < [1] Mat_CheckInode(): Found 23 nodes of 66. Limit used: 5. Using
>> Inode routines
>>
>> < [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>> 243 unneeded,972 used
>> < [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>> MatSetValues() is 72
>> < [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
>> < [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 12)/
>> (num_localrows 66) < 0.6. Do not use CompressedRow routines.
>>
>> < [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>> 504 unneeded,1116 used
>> < [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>> MatSetValues() is 99
>> < [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 66
>> < [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 0)/
>> (num_localrows 66) < 0.6. Do not use CompressedRow routines.
>>
>>
>> WITH "IF" CONDITION
>>> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 632
>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 78 entries, uses 0 mallocs.
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>>> 118 unneeded,1304 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>> MatSetValues() is 0
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 26
>>> [0] Mat_CheckInode(): Found 66 nodes out of 66 rows. Not using
>>> Inode routines
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 66; storage space:
>>> 353 unneeded,1033 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>> MatSetValues() is 6
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 26
>>> [1] Mat_CheckInode(): Found 64 nodes out of 66 rows. Not using
>>> Inode routines
>>
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 24; storage space:
>>> 14 unneeded,121 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>> MatSetValues() is 0
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 12
>>> [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 48)/
>>> (num_localrows 66) > 0.6. Use CompressedRow routines.
>>
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 66 X 18; storage space:
>>> 14 unneeded,121 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>> MatSetValues() is 0
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11
>>> [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 42)/
>>> (num_localrows 66) > 0.6. Use CompressedRow routines.
>>
>>
>>
>>
>>
>


