[petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9
Pierre Jolivet
pierre at joliv.et
Sat Mar 4 07:37:15 CST 2023
> On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzongze at gmail.com> wrote:
>
> Hi,
>
> I am writing to seek your advice about an error I encountered while solving a problem with multigrid.
> I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ):
> ```shell
> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
> ```
>
> Upon checking the MUMPS documentation, I found that increasing the value of ICNTL(14) may help resolve the issue. I tried higher values of the option -mat_mumps_icntl_14 (such as 40), and the error seemed to disappear once I set ICNTL(14) to 80. However, I am still curious why MUMPS failed randomly in the first place.
>
> Upon further inspection, I found that the number of nonzeros of the MUMPS matrix differed from that of the PETSc matrix, and that it also changed every time I ran the code. This leaves me with the following questions:
>
> 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.
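A toy way to see both effects (renumbering and pivoting) is the plain-Python sketch below. It runs dense Gaussian elimination with partial pivoting and counts the nonzeros in L and U. The helpers `lu_nnz`, `arrow`, and `perturbed` are hypothetical, written only for this illustration, and classical partial pivoting stands in for the threshold pivoting MUMPS actually performs, so read it as an analogy rather than a model of MUMPS. In a parallel run, a different order of floating-point operations can introduce small perturbations of this kind.

```python
# Toy illustration only: dense Gaussian elimination with partial pivoting,
# NOT the multifrontal, threshold-pivoting algorithm implemented by MUMPS.


def lu_nnz(a):
    """Count nonzeros of L (strictly below the diagonal) plus U after LU with
    partial pivoting on a copy of the square matrix `a` (list of lists)."""
    n = len(a)
    m = [row[:] for row in a]
    nnz_l = 0
    for k in range(n):
        # Partial pivoting: pick the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            if m[i][k] != 0.0:
                factor = m[i][k] / m[k][k]
                nnz_l += 1  # one nonzero multiplier stored in L
                for j in range(k + 1, n):
                    m[i][j] -= factor * m[k][j]  # may create fill-in
                m[i][k] = 0.0
    nnz_u = sum(1 for i in range(n) for j in range(i, n) if m[i][j] != 0.0)
    return nnz_l + nnz_u


def arrow(n, hub):
    """Diagonal 10.0 plus a dense row and column at index `hub`."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 10.0
        if i != hub:
            a[hub][i] = a[i][hub] = 1.0
    return a


def perturbed(q):
    """Column 0 holds 1.0 (dense row 0) and q (sparse row 1)."""
    return [[1.0, 1.0, 1.0, 1.0],
            [q, 2.0, 0.0, 0.0],
            [0.0, 0.0, 3.0, 0.0],
            [0.0, 0.0, 0.0, 4.0]]


# Renumbering: the same arrow matrix with the dense row/column first or last.
print(lu_nnz(arrow(5, hub=0)), lu_nnz(arrow(5, hub=4)))  # 25 13
# Pivoting: a 1e-12 change in one entry flips the pivot row in column 0.
print(lu_nnz(perturbed(1.0 - 1e-12)), lu_nnz(perturbed(1.0 + 1e-12)))  # 10 8
```

Both arrow matrices have 13 nonzeros; numbering the dense row first fills the factors completely (25 entries), while numbering it last produces no fill. In the second pair, the tiny perturbation decides whether the dense row or the sparse row becomes the pivot, and only the former creates fill.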
> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
Exact factorizations introduce fill-in.
The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
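To get a feel for how large fill-in can be, the sketch below counts it symbolically for a k x k 5-point Laplacian eliminated in natural (row-major) order without pivoting. The 5-point Laplacian is only an assumed stand-in, not the matrix from test_mumps.py, and `grid_graph` and `strict_lower_nnz_of_L` are hypothetical helpers. Eliminating a vertex connects all of its not-yet-eliminated neighbours, and each such neighbour is one entry in the corresponding column of L.

```python
# Symbolic elimination on the graph of a k x k 5-point Laplacian (an assumed
# stand-in for the real matrix, which this example does not reproduce).


def grid_graph(k):
    """Adjacency sets of the k x k 5-point stencil, row-major numbering."""
    adj = {}
    for r in range(k):
        for c in range(k):
            v = r * k + c
            adj[v] = set()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < k and 0 <= cc < k:
                    adj[v].add(rr * k + cc)
    return adj


def strict_lower_nnz_of_L(adj, order):
    """Nonzeros strictly below the diagonal of L when eliminating in `order`
    (no pivoting): eliminating v turns its later neighbours into a clique."""
    pos = {v: i for i, v in enumerate(order)}
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    total = 0
    for v in order:
        later = [u for u in g[v] if pos[u] > pos[v]]
        total += len(later)
        for a in later:
            g[a].update(u for u in later if u != a)  # fill edges
    return total


k = 32
adj = grid_graph(k)
n = k * k
nnz_a = n + sum(len(nbrs) for nbrs in adj.values())
l_strict = strict_lower_nnz_of_L(adj, list(range(n)))
nnz_lu = n + 2 * l_strict  # structurally symmetric, so U mirrors L
print(f"nnz(A) = {nnz_a}, nnz(L+U) = {nnz_lu}, ratio = {nnz_lu / nnz_a:.1f}")
# nnz(A) = 4992, nnz(L+U) = 64574, ratio = 12.9
```

With this naive ordering, the band between the diagonal and the neighbour one grid row away fills in completely. Fill-reducing orderings such as nested dissection do much better (any permutation can be passed as `order` to experiment), but in 2D the factors still grow like n log n rather than like nnz(A).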
> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with a different number of nonzeros.
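Per the MUMPS user guide, INFOG(1) = -9 means the main internal real/complex workarray is too small, and ICNTL(14) is the percentage increase applied to the working space estimated during analysis. The sketch below is a deliberately simplified model of why that combination fails intermittently: the estimate is fixed, but the real need varies from run to run. The functions `allocated` and `factorize` and all the numbers are hypothetical; this is not MUMPS's actual memory accounting.

```python
# Simplified model (an assumption, not MUMPS's real memory accounting):
# analysis estimates a workspace size, ICNTL(14) adds a percentage on top,
# and factorization fails if its actual need exceeds that allocation.


def allocated(estimate, icntl14):
    """Workspace allocated before factorization: estimate + icntl14 percent."""
    return estimate * (100 + icntl14) // 100


def factorize(estimate, actual_need, icntl14):
    """Return (status, missing): (0, 0) on success, else (-9, shortfall),
    loosely mimicking INFOG(1) = -9 with INFO(2) = missing entries."""
    room = allocated(estimate, icntl14)
    if actual_need <= room:
        return 0, 0
    return -9, actual_need - room


# Hypothetical numbers: same analysis estimate every run, but the actual
# need changes from run to run (e.g. because pivoting differs).
estimate = 1000
needs = [1120, 1180, 1210, 1470]
for icntl14 in (20, 40, 80):
    print(icntl14, [factorize(estimate, need, icntl14) for need in needs])
# 20 [(0, 0), (0, 0), (-9, 10), (-9, 270)]
# 40 [(0, 0), (0, 0), (0, 0), (-9, 70)]
# 80 [(0, 0), (0, 0), (0, 0), (0, 0)]
```

In this model, raising ICNTL(14) (-mat_mumps_icntl_14 in PETSc) only widens the safety margin; it does not remove the run-to-run variation, so a margin that happens to work for a few runs is not a guarantee.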
Thanks,
Pierre
> I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
>
> ```shell
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> --
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: external
> --
> type: mumps
> rows=1050625, cols=1050625
> package used to perform factorization: mumps
> total: nonzeros=115025949, allocated nonzeros=115025949
> --
> type: mpiaij
> rows=1050625, cols=1050625
> total: nonzeros=7346177, allocated nonzeros=7346177
> total number of mallocs used during MatSetValues calls=0
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> --
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: external
> --
> type: mumps
> rows=1050625, cols=1050625
> package used to perform factorization: mumps
> total: nonzeros=115377847, allocated nonzeros=115377847
> --
> type: mpiaij
> rows=1050625, cols=1050625
> total: nonzeros=7346177, allocated nonzeros=7346177
> total number of mallocs used during MatSetValues calls=0
> ```
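For reference, the two ksp_view outputs above can be summarised with a few lines of arithmetic. This is plain Python; the numbers are copied from the output and nothing else is assumed.

```python
# Nonzero counts copied from the two ksp_view outputs above.
petsc_nnz = 7346177                  # "type: mpiaij", identical in both runs
mumps_nnz = [115025949, 115377847]   # "type: mumps", run 1 and run 2

for run, nnz in enumerate(mumps_nnz, start=1):
    print(f"run {run}: factor/matrix nnz ratio = {nnz / petsc_nnz:.2f}")

spread = mumps_nnz[1] - mumps_nnz[0]
print(f"run-to-run difference: {spread} entries "
      f"({100 * spread / mumps_nnz[0]:.2f}% of run 1)")
# run 1: factor/matrix nnz ratio = 15.66
# run 2: factor/matrix nnz ratio = 15.71
# run-to-run difference: 351898 entries (0.31% of run 1)
```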
>
> I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
>
> Best wishes,
> Zongze
> <test_mumps.py>