[petsc-users] Petsc mesh scalability issue with iterative solver and direct solver

Jinlei Shen jshen25 at jhu.edu
Fri Jul 29 11:46:54 CDT 2016


Dear PETSc developers,

Thank you for developing such a powerful tool for scientific computations.

I'm currently running a simple cantilever beam FEM problem to test the
scalability of PETSc across multiple processors. I also want to determine
whether an iterative solver or a direct solver is more efficient for large
parallel FEM problems.

Problem description: an Euler-Bernoulli cantilever beam with a point load
at the free end in the -y direction. Each node has 2 DOF (deflection and
rotation). MPIBAIJ is used with bs = 2; dnnz and onnz are determined from
the mesh connectivity. Each process loops over its local elements to
assemble the global matrix, with the same element stiffness matrix for
every element. The boundary condition is applied with

    call MatZeroRowsColumns(SG,2,g_BC,one,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)
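
For context, the rest of the matrix setup and assembly follows roughly
this pattern (a simplified sketch of my code; ndof, dnnz, onnz, idx, and
ke stand in for my actual variables, and the element loop is omitted):

    ! create the MPIBAIJ matrix with block size 2 (2 DOF per node)
    call MatCreate(PETSC_COMM_WORLD,SG,ierr)
    call MatSetSizes(SG,PETSC_DECIDE,PETSC_DECIDE,ndof,ndof,ierr)
    call MatSetType(SG,MATMPIBAIJ,ierr)
    ! dnnz/onnz: per-block-row counts of diagonal and off-diagonal
    ! blocks, computed from the mesh connectivity
    call MatMPIBAIJSetPreallocation(SG,2,0,dnnz,0,onnz,ierr)
    ! inside the element loop: add the 4x4 element stiffness matrix ke
    ! for the element's two nodes (block indices in idx)
    call MatSetValuesBlocked(SG,2,idx,2,idx,ke,ADD_VALUES,ierr)
    call MatAssemblyBegin(SG,MAT_FINAL_ASSEMBLY,ierr)
    call MatAssemblyEnd(SG,MAT_FINAL_ASSEMBLY,ierr)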

So far the computations work well for small meshes (fewer than 5000
elements): the results match the theoretical solution with both solvers
and with different numbers of processes.

However, several confusing issues arise when I increase the mesh to 10,000
or more elements with the iterative solver (CG + PCBJACOBI; my solver
setup is sketched below, after the two issues).

1. For 10k elements, I get an accurate solution with the iterative solver
on a single process. With 2-8 processes, however, the linear solver
reports convergence (with varying iteration counts), but the results
differ between process counts and are wrong. The weird thing is that with
more than 9 processes the results are correct again. I am really confused
by this. Could you explain why? If my parallelization were incorrect, why
would it work for the small cases? I also checked the global matrix and
RHS vector and saw no mallocs during assembly.

2. For 30k elements on one process, the solver reports: Linear solve did
not converge due to DIVERGED_INDEFINITE_PC. Does this commonly happen for
large sparse matrices? If so, is there a more robust solver or
preconditioner for large problems?
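
For reference, the iterative solver is set up essentially like this (a
sketch; ksp, pc, rhs, and sol are my variables):

    ! CG with block Jacobi: one block per process, ILU(0) on each
    ! block by default
    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
    call KSPSetOperators(ksp,SG,SG,ierr)
    call KSPSetType(ksp,KSPCG,ierr)
    call KSPGetPC(ksp,pc,ierr)
    call PCSetType(pc,PCBJACOBI,ierr)
    call KSPSetFromOptions(ksp,ierr)
    call KSPSolve(ksp,rhs,sol,ierr)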


With the parallel direct solver (SUPERLU_DIST + PCLU), I only get accurate
results when the number of elements is below 5000, so something must be
wrong. The way I use the SuperLU_DIST solver is to first convert the
MatType to AIJ, then call PCFactorSetMatSolverPackage and change the PC to
PCLU. Am I missing anything needed to run SuperLU_DIST correctly?
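
Concretely, that setup looks like this in my code (a sketch; SA is the
converted AIJ matrix):

    ! convert the BAIJ matrix to AIJ, which SuperLU_DIST requires
    call MatConvert(SG,MATMPIAIJ,MAT_INITIAL_MATRIX,SA,ierr)
    call KSPSetOperators(ksp,SA,SA,ierr)
    ! direct solve: no Krylov iterations, LU done by SuperLU_DIST
    call KSPSetType(ksp,KSPPREONLY,ierr)
    call KSPGetPC(ksp,pc,ierr)
    call PCSetType(pc,PCLU,ierr)
    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr)
    call KSPSolve(ksp,rhs,sol,ierr)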


I also used SuperLU and the iterative solver (CG + PCBJACOBI) to solve the
sequential version of the same problem. The results show that the
iterative solver works well below 50k elements, while SuperLU only gives
the right solution below 5k elements. Can I conclude that the iterative
solver is better than SuperLU for large problems? How can I improve the
solver to cope with very large problems, such as million-by-million
systems? The performance of SuperLU also remains questionable to me.

Regarding the inaccuracy, do you think it could be a memory issue? No
memory errors are reported during execution, though.

I would really appreciate it if someone could help resolve the puzzles
above. My goal is to replace the current SuperLU_DIST solver in my
parallel CPFEM main program with an iterative solver through PETSc.


Please let me know if you would like to see my code in detail.

Thank you very much.

Bests,
Jinlei