[petsc-users] Receiving DIVERGED_PCSETUP_FAILED

Barry Smith bsmith at mcs.anl.gov
Thu Jun 9 17:56:43 CDT 2016


> On Jun 9, 2016, at 3:32 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> 
> I have been following ex52.c in ksp/ksp/examples/tutorials to use MUMPS to directly solve Ax=b. My matrix is symmetric and positive definite. I built a small cantilever beam model with a 5000 x 5000 matrix. It solves in 2 seconds and gives the correct answer. But when I use a finer mesh of the cantilever beam, with a 3.3 million x 3.3 million matrix, I get the following error:
> 
>  Mumps row pivot threshhold = 1e-06

   Maybe you can change this to get MUMPS to pivot less aggressively; doing lots of pivoting requires a lot more memory. In theory, since the matrix is SPD, it should not need to pivot at all.
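
   For example, MUMPS's relative pivoting threshold is CNTL(1); through PETSc's MUMPS interface you can lower it (0.0 disables numerical pivoting entirely, which is safe only if the matrix really is SPD) and use a Cholesky factorization. A hedged sketch, assuming your code is otherwise set up like the example:

       mpiexec -hostfile ./hostfile -np 48 ./ex12 -ksp_converged_reason \
           -pc_type cholesky -pc_factor_mat_solver_package mumps \
           -mat_mumps_cntl_1 0.0

   or equivalently from the code, assuming ksp is your already-configured KSP:

       PC  pc;
       Mat F;
       ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
       ierr = PCSetType(pc,PCCHOLESKY);CHKERRQ(ierr);           /* exploit SPD */
       ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
       ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr);  /* create the factor matrix F */
       ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
       ierr = MatMumpsSetCntl(F,1,0.0);CHKERRQ(ierr);           /* CNTL(1)=0: no numerical pivoting */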

>  Mumps determinant = (0., 0.) * 2^0
> Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
>               PCSETUP_FAILED due to FACTOR_OUTMEMORY
> Norm of error inf. iterations 0
> 
> It runs for more than an hour before aborting with this message. I am running it with this command:
> 
> mpiexec -hostfile ./hostfile -np 48 ./ex12 -ksp_converged_reason
> 
> My machines have 24 CPUs and 125 GB of RAM. When I run "top" I see it correctly spawns 48 processes across 2 nodes. The memory usage of each process is no more than 1-2 GB. So I do not understand why it gives FACTOR_OUTMEMORY?
> 
> The same matrix solves in under 5 minutes with Intel Pardiso using 24 CPUs on one host. 

   MUMPS may be (likely is) using a different matrix ordering than Intel Pardiso. Unfortunately, each of these packages has a different way of requesting orderings, and a different set of orderings to choose from, so you will need to look at the details for each package.
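
   With PETSc's MUMPS interface the sequential ordering is controlled by MUMPS ICNTL(7), exposed as a runtime option. A hedged example (ICNTL(7)=5 requests METIS; the default, 7, lets MUMPS choose automatically):

       mpiexec -hostfile ./hostfile -np 48 ./ex12 -pc_type lu \
           -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 5

   For parallel analysis there are also -mat_mumps_icntl_28 and -mat_mumps_icntl_29 (e.g. ICNTL(28)=2 with ICNTL(29)=2 uses ParMETIS). Pardiso has its own, separate ordering options.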

> I thought maybe MUMPS thinks it is ill-conditioned? The model does converge with the iterative solver in 4000 iterations. I also tried running with these options, per the FAQ entry "How can I determine the condition number of a matrix?":
> 
> mpiexec -hostfile ./hostfile -np 48 ./ex12 -pc_type none -ksp_type gmres -ksp_monitor_singular_value -ksp_gmres_restart 1000 -ksp_converged_reason -ksp_monitor_true_residual
> 
> After 1337 iterations I cancelled it, and the output was:

  You need to look at the condition number estimate just before GMRES reaches the restart; the estimate is built only from the current Krylov space, so it starts over from scratch at each restart. So what was the estimated condition number at iteration 999?
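
   If it is easier, you can also query PETSc's extreme singular value estimates programmatically rather than reading them off the monitor. A minimal sketch, assuming ksp, b, and x are already set up:

       PetscReal emax,emin;
       ierr = KSPSetComputeSingularValues(ksp,PETSC_TRUE);CHKERRQ(ierr); /* must precede KSPSolve() */
       ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
       ierr = KSPComputeExtremeSingularValues(ksp,&emax,&emin);CHKERRQ(ierr);
       ierr = PetscPrintf(PETSC_COMM_WORLD,"condition number estimate %g\n",(double)(emax/emin));CHKERRQ(ierr);

   The same caveat applies: the estimate only reflects the current restart cycle.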

   It could be that Intel Pardiso produced a low-quality solution if the matrix is ill-conditioned. You can run with -ksp_type gmres -ksp_max_it 5 -ksp_monitor_true_residual and -pc_type lu to see how small the residuals are after the "direct" solver.
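
   Concretely, assuming the MUMPS factorization succeeds (for instance with the pivoting relaxed as above), something like:

       mpiexec -hostfile ./hostfile -np 48 ./ex12 -ksp_type gmres -ksp_max_it 5 \
           -ksp_monitor_true_residual -pc_type lu -pc_factor_mat_solver_package mumps

   If the true residual norm drops to near machine precision relative to ||b|| within an iteration or two, the factorization is accurate; if it stagnates well above that, the matrix is likely badly conditioned and any direct solver's answer should be treated with suspicion.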

  Barry

> 
> 1337 KSP preconditioned resid norm 5.647402411074e-02 true resid norm 5.647402411074e-02 ||r(i)||/||b|| 3.993316540960e-03
> 1337 KSP Residual norm 5.647402411074e-02 % max 1.070324243277e+05 min 1.220336631740e-01 max/min 8.770729448238e+05


