[petsc-users] Error reported by MUMPS in numerical factorization phase

Danyang Su danyang.su at gmail.com
Wed Dec 2 13:36:07 CST 2015


Hi Hong,

Thanks. I can test it, but it may take some time to install petsc-dev on 
the cluster. I will try more cases to see if I can reproduce this error on my 
local machine, which is much more convenient for testing in debug 
mode. So far, the error does not occur on my local machine using the 
same code, the same petsc-3.6.2 version, the same case and the same 
number of processors. The system and petsc configuration are different.

Regards,

Danyang

On 15-12-02 10:26 AM, Hong wrote:
> Danyang:
> It is likely a zero pivot. I'm adding a feature to petsc. When matrix 
> factorization fails, computation continues with error information 
> stored in
> ksp->reason=DIVERGED_PCSETUP_FAILED.
> For your timestepping code, you may be able to automatically reduce 
> the timestep and continue your simulation.
>
> Do you want to test it? If so, you need to install petsc-dev with my 
> branch hzhang/matpackage-erroriffpe on your cluster. We may merge this 
> branch into petsc-master soon.
>
>
>     It's not easy to run in debugging mode as the cluster does not
>     have petsc installed in debug mode. Restarting the case from the
>     crashing time does not reproduce the problem. So if I want to detect
>     this error, I need to start the simulation from the beginning, which
>     takes hours on the cluster.
>
>
> This is why we are adding this new feature.
>
>
>     Do you mean I need to redo symbolic factorization? For now, I only
>     do factorization once at the first timestep and then reuse it.
>     Some of the code is shown below.
>
>                 if (timestep == 1) then
>                   call PCFactorSetMatSolverPackage(pc_flow,MATSOLVERMUMPS,ierr)
>                   CHKERRQ(ierr)
>
>                   call PCFactorSetUpMatSolverPackage(pc_flow,ierr)
>                   CHKERRQ(ierr)
>
>                   call PCFactorGetMatrix(pc_flow,a_flow_j,ierr)
>                   CHKERRQ(ierr)
>                 end if
>
>                 call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
>                 CHKERRQ(ierr)
>
>
> I do not think you need to change this part of the code.
> Does your code check convergence at each time step?
>
> Hong
>
>
>
>     On 15-12-02 08:39 AM, Hong wrote:
>>     Danyang :
>>
>>         My code fails due to an error in an external library. It works
>>         fine for the previous 2000+ timesteps but then crashes.
>>
>>         [4]PETSC ERROR: Error in external library
>>         [4]PETSC ERROR: Error reported by MUMPS in numerical
>>         factorization phase: INFO(1)=-1, INFO(2)=0
>>
>>     This simply says an error occurred in proc[0] during numerical
>>     factorization, which usually means it either encountered a zero
>>     pivot or ran out of memory. Since it happens at a later timestep,
>>     where I guess you reuse the matrix factor, a zero pivot might be
>>     the problem. Is it possible to run it in debugging mode? That way,
>>     mumps would dump out more information.
>>
>>
>>         Then I tried the same simulation on another machine using the
>>         same number of processors, and it does not fail.
>>
>>     Does this machine have more memory?
>>
>>     Hong
>
>
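
The suggestion in this thread, to check the solver status after each solve and retry with a smaller timestep when the factorization fails, can be sketched roughly as follows. This is only an illustration of the control flow and does not call PETSc: try_step, advance, dt_ok and dt_min are hypothetical names introduced here, and the halving factor is an arbitrary choice. In a real code, try_step would presumably wrap the existing KSPSolve call followed by KSPGetConvergedReason, treating a negative reason as a failure.

```fortran
! Sketch only: no PETSc calls are made here.
module timestep_retry
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for "assemble, factor and solve with step dt".
  ! It pretends the solve fails whenever dt is larger than dt_ok.
  logical function try_step(dt, dt_ok) result(ok)
    real(dp), intent(in) :: dt, dt_ok
    ok = (dt <= dt_ok)
  end function try_step

  ! Attempt one timestep, halving dt after each failed attempt.
  ! On return, ok tells whether an attempt succeeded; if it did,
  ! dt holds the step size that was accepted.
  subroutine advance(dt, dt_min, dt_ok, ok, ntries)
    real(dp), intent(inout) :: dt
    real(dp), intent(in)    :: dt_min, dt_ok
    logical,  intent(out)   :: ok
    integer,  intent(out)   :: ntries

    ntries = 0
    ok = .false.
    do while (dt >= dt_min)
      ntries = ntries + 1
      ok = try_step(dt, dt_ok)
      if (ok) exit
      dt = 0.5_dp * dt      ! assumed reduction factor
    end do
  end subroutine advance

end module timestep_retry

program demo
  use timestep_retry
  implicit none
  real(dp) :: dt
  logical  :: ok
  integer  :: ntries

  dt = 1.0_dp
  call advance(dt, 1.0e-3_dp, 0.3_dp, ok, ntries)
  if (ok) then
    print *, 'step accepted with dt =', dt, ' attempts =', ntries
  else
    print *, 'giving up: dt fell below the minimum'
  end if
end program demo
```

Keeping the retry loop separate from the solve means the same logic applies whether a failure is reported through a converged-reason code, as proposed above, or through some other status flag.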
