[petsc-users] Error reported by MUMPS in numerical factorization phase
Hong
hzhang at mcs.anl.gov
Wed Dec 2 12:26:56 CST 2015
Danyang:
It is likely a zero pivot. I'm adding a feature to PETSc: when matrix
factorization fails, computation continues, with the error information stored in
ksp->reason=DIVERGED_PCSETUP_FAILED.
For your timestepping code, you may be able to automatically reduce the timestep
and continue your simulation.
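
To illustrate the idea only, here is a minimal, self-contained sketch of such a
retry loop. It does not call PETSc: `solve_step`, `demo_solve`, the `Reason`
enum and its numeric values, and the parameters `dt_min` and `shrink` are all
hypothetical stand-ins. In a real code, `solve_step` would wrap KSPSolve
followed by a check of the KSP converged reason.

```python
from enum import Enum


class Reason(Enum):
    # Illustrative values only; these are NOT the PETSc enum values.
    CONVERGED = 1
    DIVERGED_PCSETUP_FAILED = -1


def advance(solve_step, t_end, dt, dt_min=1e-8, shrink=0.5):
    """Advance from t = 0 to t_end, retrying a failed step with a smaller dt.

    solve_step(t, dt) is a hypothetical callback that attempts one step
    and returns a Reason instead of aborting on a failed factorization.
    """
    t = 0.0
    history = []  # (time reached, step size used) for each accepted step
    while t < t_end:
        step = min(dt, t_end - t)  # do not step past t_end
        reason = solve_step(t, step)
        if reason is Reason.DIVERGED_PCSETUP_FAILED:
            # Factorization failed: shrink the step and retry from the same t.
            dt = step * shrink
            if dt < dt_min:
                raise RuntimeError("timestep fell below dt_min at t = %g" % t)
            continue
        t += step
        history.append((t, step))
    return history


def demo_solve(t, dt):
    """Stand-in solver: pretends factorization fails for large steps
    in the window 1.0 <= t < 1.5."""
    if 1.0 <= t < 1.5 and dt > 0.25:
        return Reason.DIVERGED_PCSETUP_FAILED
    return Reason.CONVERGED


history = advance(demo_solve, t_end=2.0, dt=0.5)
```

In this sketch the step size stays reduced after a failure; whether and how to
grow it again is a separate design choice that depends on the application.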
Do you want to test it? If so, you need to install petsc-dev with my
branch hzhang/matpackage-erroriffpe on your cluster. We may merge this
branch to petsc-master soon.
>
> It's not easy to run in debugging mode, as the cluster does not have a debug
> build of PETSc installed. Restarting the case from the time of the crash does
> not reproduce the problem, so to detect this error I need to start the
> simulation from the beginning, which takes hours on the cluster.
>
This is why we are adding this new feature.
>
> Do you mean I need to redo symbolic factorization? For now, I only do
> factorization once at the first timestep and then reuse it. Some of the
> code is shown below.
>
> if (timestep == 1) then
>   call PCFactorSetMatSolverPackage(pc_flow,MATSOLVERMUMPS,ierr)
>   CHKERRQ(ierr)
>
>   call PCFactorSetUpMatSolverPackage(pc_flow,ierr)
>   CHKERRQ(ierr)
>
>   call PCFactorGetMatrix(pc_flow,a_flow_j,ierr)
>   CHKERRQ(ierr)
> end if
>
> call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
> CHKERRQ(ierr)
>
I do not think you need to change this part of the code.
Does your code check convergence at each time step?
Hong
>
>
> On 15-12-02 08:39 AM, Hong wrote:
>
> Danyang :
>>
>> My code fails due to an error in an external library. It works fine for the
>> first 2000+ timesteps but then crashes.
>>
>> [4]PETSC ERROR: Error in external library
>> [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
>> INFO(1)=-1, INFO(2)=0
>>
>
> This simply says an error occurred on proc[0] during numerical
> factorization, which usually means it either encountered a zero pivot or ran
> out of memory. Since it happens at a later timestep, where I guess you reuse
> the matrix factor, a zero pivot might be the problem.
> Is it possible to run it in debugging mode? That way, MUMPS would dump out
> more information.
>
>>
>> Then I tried the same simulation on another machine using the same number
>> of processors, and it does not fail.
>>
> Does this machine have larger memory?
>
> Hong
>
>
>