[petsc-users] Receiving DIVERGED_PCSETUP_FAILED

Faraz Hussain faraz_hussain at yahoo.com
Fri Jun 10 17:27:20 CDT 2016


I think the issue is that I need to play more with the "parallel" settings described here: 

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERMUMPS.html
The example ex52.c was set up for sequential runs, so doing mpiexec -np 48 was basically just using one process. I also never installed ParMETIS, METIS, or PT-Scotch. 

I will install them, adjust the MUMPS settings, and hopefully get it to converge this weekend!
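
If I understand the PETSc configure conventions correctly, rebuilding with something like

  ./configure --download-metis --download-parmetis --download-ptscotch

should pull those partitioners in alongside MUMPS.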

From: Barry Smith <bsmith at mcs.anl.gov>
To: Faraz Hussain <faraz_hussain at yahoo.com>
Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
Sent: Friday, June 10, 2016 4:02 PM
Subject: Re: [petsc-users] Receiving DIVERGED_PCSETUP_FAILED

> On Jun 10, 2016, at 3:56 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> 
> Thanks for the suggestions. I checked but was not able to find out how to change the MUMPS row pivot threshold of 1e-06. Maybe I will ask on the MUMPS user forum.

This might help; these are the relevant options from the PETSc MUMPS interface source: 

  ierr = PetscOptionsReal("-mat_mumps_cntl_1","CNTL(1): relative pivoting threshold","None",mumps->id.CNTL(1),&mumps->id.CNTL(1),NULL);CHKERRQ(ierr);
  ierr = PetscOptionsReal("-mat_mumps_cntl_2","CNTL(2): stopping criterion of refinement","None",mumps->id.CNTL(2),&mumps->id.CNTL(2),NULL);CHKERRQ(ierr);
  ierr = PetscOptionsReal("-mat_mumps_cntl_3","CNTL(3): absolute pivoting threshold","None",mumps->id.CNTL(3),&mumps->id.CNTL(3),NULL);CHKERRQ(ierr);
  ierr = PetscOptionsReal("-mat_mumps_cntl_4","CNTL(4): value for static pivoting","None",mumps->id.CNTL(4),&mumps->id.CNTL(4),NULL);CHKERRQ(ierr);

Note that I don't know what these mean, so you will need to read the MUMPS docs.
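
  So, for example, adding something like

    -mat_mumps_cntl_1 0.01

  to your command line would change CNTL(1), the relative pivoting threshold (the value is only illustrative; check the MUMPS manual for sensible ranges).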

> Regarding:
> 
>  > You need to look at the condition number just before GMRES reaches the restart. It has to start all over again at the restart. So what was the estimated condition number at 999 iterations?
> 
> I ran again and the condition number at 999 iterations is: 
> 
> 999 KSP preconditioned resid norm 5.921717188418e-02 true resid norm 5.921717188531e-02 ||r(i)||/||b|| 4.187286380279e-03 
> 999 KSP Residual norm 5.921717188418e-02 % max 1.070338898624e+05 min 1.002755075294e-01 max/min 1.067398136390e+06

Ok, so a relatively ill-conditioned matrix (max/min is about 1e6), but seemingly not terrible.

  Barry

> 
> 
> From: Barry Smith <bsmith at mcs.anl.gov>
> To: Faraz Hussain <faraz_hussain at yahoo.com> 
> Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> Sent: Thursday, June 9, 2016 5:56 PM
> Subject: Re: [petsc-users] Receiving DIVERGED_PCSETUP_FAILED
> 
> 
> > On Jun 9, 2016, at 3:32 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> > 
> > I have been following ex52.c in ksp/ksp/examples/tutorials to use MUMPS to directly solve Ax=b. My matrix is symmetric and positive definite. I built a small cantilever beam model with a 5000 x 5000 matrix. It solves in 2 seconds and gives the correct answer. But when I use a finer mesh of the cantilever beam, with a 3.3 million x 3.3 million matrix, I get the following error:
> > 
> >  Mumps row pivot threshold = 1e-06
> 
>  Maybe you can change this to get MUMPS to pivot less aggressively; doing lots of pivoting will require a lot more memory. In theory, since the matrix is SPD, it should not need to pivot at all.
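> 
>  If your code does not already mark the matrix as SPD, a minimal sketch (assuming A is your assembled Mat) that lets MUMPS use its symmetric factorization:
> 
>    MatSetOption(A,MAT_SPD,PETSC_TRUE);  /* declare the matrix symmetric positive definite */
> 
>  and then run with -pc_type cholesky -pc_factor_mat_solver_package mumps rather than LU.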
> 
> >  Mumps determinant = (0., 0.) * 2^0
> > Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
> >              PCSETUP_FAILED due to FACTOR_OUTMEMORY
> > Norm of error inf. iterations 0
> > 
> > It runs for more than an hour before aborting with this message. I am running it with this command:
> > 
> > mpiexec -hostfile ./hostfile -np 48 ./ex12 -ksp_converged_reason
> > 
> > My machines each have 24 CPUs and 125 GB of RAM. When I run "top" I see that it correctly spawns 48 processes across the 2 nodes. The memory usage of each process is no more than 1-2 GB, so I do not understand why it gives FACTOR_OUTMEMORY?
> > 
> > The same matrix solves in under 5 minutes in Intel Pardiso using 24 CPUs on one host. 
> 
>  MUMPS may be (likely is) using a different matrix ordering than Intel Pardiso. Unfortunately each of these packages has a different way of asking for orderings, and different orderings to choose from, so you will need to look at the details for each package.
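> 
>  With MUMPS through PETSc the ordering is selected via its ICNTL parameters, for example (the values are only illustrative; the MUMPS manual lists the choices)
> 
>    -mat_mumps_icntl_7 5                          sequential analysis with a METIS ordering
>    -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2   parallel analysis with a ParMETIS ordering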
> 
> > I thought maybe MUMPS thinks it is ill-conditioned? The model does converge in the iterative solver in 4000 iterations. I also tried running with these options, per the FAQ entry "How can I determine the condition number of a matrix?":
> > 
> > mpiexec -hostfile ./hostfile -np 48 ./ex12 -pc_type none -ksp_type gmres -ksp_monitor_singular_value -ksp_gmres_restart 1000 -ksp_converged_reason -ksp_monitor_true_residual
> > 
> > After 1337 iterations I cancelled it, and the output was:
> 
>  You need to look at the condition number just before GMRES reaches the restart. It has to start all over again at the restart. So what was the estimated condition number at 999 iterations?
> 
>  It could be that Intel Pardiso produced a low-quality solution if the matrix is ill-conditioned. You can run with -ksp_type gmres -ksp_max_it 5 -ksp_monitor_true_residual together with -pc_type lu to see how small the residuals are after the "direct" solver.
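> 
>  Put together, that would look something like (adjust to however your code sets up the solver)
> 
>    mpiexec -hostfile ./hostfile -np 48 ./ex12 -ksp_type gmres -ksp_max_it 5 -pc_type lu -pc_factor_mat_solver_package mumps -ksp_monitor_true_residual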
> 
>  Barry
> 
> 
> > 
> > 1337 KSP preconditioned resid norm 5.647402411074e-02 true resid norm 5.647402411074e-02 ||r(i)||/||b|| 3.993316540960e-03
> > 1337 KSP Residual norm 5.647402411074e-02 % max 1.070324243277e+05 min 1.220336631740e-01 max/min 8.770729448238e+05
> 
> 


  