[petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9
Zongze Yang
yangzongze at gmail.com
Sat Mar 4 07:51:09 CST 2023
On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pierre at joliv.et> wrote:
>
>
> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzongze at gmail.com> wrote:
> >
> > Hi,
> >
> > I am writing to seek your advice regarding a problem I encountered while
> using a multigrid solver.
> > I am currently using multigrid with the coarse problem solved by PCLU.
> However, the PC failed randomly with the error below (the value of INFO(2)
> may differ):
> > ```shell
> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
> > ```
> >
> > Upon checking the documentation of MUMPS, I discovered that increasing
> the value of ICNTL(14) may help resolve the issue. Specifically, I set the
> option -mat_mumps_icntl_14 to a higher value (such as 40), and the error
> seemed to disappear after I set the value of ICNTL(14) to 80. However, I am
> still curious as to why MUMPS failed randomly in the first place.
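For reference, a minimal sketch of how this option could be passed from a Firedrake script as a solver-parameters dictionary. The attached test_mumps.py is not shown in the thread, so the `mg_coarse_` prefix and the surrounding multigrid options are assumptions; only `-mat_mumps_icntl_14` and the value 80 come from the message above.

```python
# Hypothetical options; the real test_mumps.py is not shown in the thread.
# ICNTL(14) is the percentage by which MUMPS increases its estimated workspace.
coarse_lu_with_mumps = {
    "ksp_type": "preonly",
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "mumps",
    "mat_mumps_icntl_14": 80,
}


def with_prefix(options, prefix):
    """Return a copy of `options` with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in options.items()}


# Assumed layout: multigrid preconditioner whose coarse solve uses the options above.
solver_parameters = {
    "pc_type": "mg",
    **with_prefix(coarse_lu_with_mumps, "mg_coarse_"),
}

# Print the equivalent command-line form of each option.
for key, value in sorted(solver_parameters.items()):
    print(f"-{key} {value}")
```

Whether the prefix is needed depends on how the coarse solver is set up in the actual script; without a prefix the option is simply `-mat_mumps_icntl_14 80`, as used above.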
> >
> > Upon further inspection, I found that the number of nonzeros of the
> PETSc matrix and the MUMPS matrix were different every time I ran the code.
> I am now left with the following questions:
> >
> > 1. What could be causing the number of nonzeros of the MUMPS matrix to
> change every time I run the code?
>
> Is the Mat being fed to MUMPS distributed on a communicator of size
> greater than one?
> If yes, then, depending on the pivoting and the renumbering, you may get
> non-deterministic results.
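One ingredient that may contribute to this, sketched in plain Python rather than with MUMPS itself: floating-point addition is not associative, so when contributions are accumulated in an order that depends on scheduling, the same data can give slightly different values, and a pivot test close to a tie can then go either way. The `accept_pivot` function is a hypothetical illustration, not the actual MUMPS pivoting criterion.

```python
# The same three numbers, summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)
print(left_to_right == right_to_left)  # False


def accept_pivot(candidate, column_max, threshold=1.0):
    """Hypothetical threshold test: accept if |candidate| >= threshold * |column_max|."""
    return abs(candidate) >= threshold * abs(column_max)


# Near a tie, the summation order alone decides whether the pivot is accepted.
column_max = left_to_right
print(accept_pivot(left_to_right, column_max))  # True
print(accept_pivot(right_to_left, column_max))  # False
```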
>
Hi, Pierre,
Thank you for your prompt reply. Yes, the size of the communicator is
greater than one.
Even if the size of the communicator is the same for every run, are the
results still non-deterministic? Can I assume the Mat being fed to MUMPS is
the same in this case?
Are the pivoting and renumbering all done by MUMPS rather than PETSc?
> > 2. Why is the number of nonzeros of the MUMPS matrix significantly
> greater than that of the PETSc matrix (as seen in the output of ksp_view,
> 115025949 vs 7346177)?
>
> Exact factorizations introduce fill-in.
> The number of nonzeros you are seeing for MUMPS is the number of nonzeros
> in the factors.
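A small illustration of fill-in, in plain Python and unrelated to MUMPS internals. It factors a toy "arrow" matrix (a diagonal plus one dense row and column) with LU and no pivoting, once with the dense row and column ordered first and once ordered last. The matrix and the helper names are made up for this sketch.

```python
def arrow_matrix(n, dense_first):
    """n x n matrix with a nonzero diagonal plus one dense row and column."""
    k = 0 if dense_first else n - 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(n)  # diagonally dominant, so no pivoting is needed
        a[i][k] = 1.0
        a[k][i] = 1.0
    a[k][k] = float(n)  # restore the diagonal entry overwritten in the loop
    return a


def count_nonzeros(a):
    return sum(1 for row in a for x in row if x != 0.0)


def lu_nonzeros(a):
    """LU-factor a copy of `a` (no pivoting) and count the nonzeros of L and U."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            if lu[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return count_nonzeros(lu)


n = 6
for dense_first in (True, False):
    a = arrow_matrix(n, dense_first)
    print(dense_first, count_nonzeros(a), lu_nonzeros(a))
```

Both orderings start from the same 16 nonzeros, but the factors hold 36 entries when the dense row and column come first and only 16 when they come last, which is why the ordering chosen by the solver affects the size of the factors.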
>
> > 3. Is it possible that the varying number of nonzeros of the MUMPS
> matrix is the cause of the random failure?
>
> Yes, MUMPS uses dynamic scheduling, which will depend on numerical
> pivoting, and which may generate factors with different numbers of nonzeros.
>
Got it. Thank you for your clear explanation.
Zongze
> Thanks,
> Pierre
> > I have attached a test example written in Firedrake. The output of
> `ksp_view` after running the code twice is included below for your
> reference.
> > In the output, the number of nonzeros of the MUMPS matrix was 115025949
> and 115377847, respectively, while that of the PETSc matrix was only
> 7346177.
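To put these numbers in perspective, a short calculation using only the three values quoted above: the fill ratio of the factors relative to the assembled matrix, and the spread between the two runs.

```python
# Values copied from the ksp_view output in this message.
petsc_nnz = 7_346_177
mumps_nnz_runs = [115_025_949, 115_377_847]

# Fill ratio: nonzeros in the factors divided by nonzeros in the matrix.
fill_ratios = [nnz / petsc_nnz for nnz in mumps_nnz_runs]
for ratio in fill_ratios:
    print(f"fill ratio: {ratio:.2f}")

# Run-to-run variation in the size of the factors.
spread = max(mumps_nnz_runs) - min(mumps_nnz_runs)
relative_spread = spread / min(mumps_nnz_runs)
print(f"difference between runs: {spread} entries ({100 * relative_spread:.2f}%)")
```

The factors are roughly 15 to 16 times larger than the matrix, and the two runs differ by 351898 entries, a fraction of a percent of the factor size.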
> >
> > ```shell
> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > --
> > type: lu
> > out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > matrix ordering: external
> > --
> > type: mumps
> > rows=1050625, cols=1050625
> > package used to perform factorization: mumps
> > total: nonzeros=115025949, allocated nonzeros=115025949
> > --
> > type: mpiaij
> > rows=1050625, cols=1050625
> > total: nonzeros=7346177, allocated nonzeros=7346177
> > total number of mallocs used during MatSetValues calls=0
> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > --
> > type: lu
> > out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > matrix ordering: external
> > --
> > type: mumps
> > rows=1050625, cols=1050625
> > package used to perform factorization: mumps
> > total: nonzeros=115377847, allocated nonzeros=115377847
> > --
> > type: mpiaij
> > rows=1050625, cols=1050625
> > total: nonzeros=7346177, allocated nonzeros=7346177
> > total number of mallocs used during MatSetValues calls=0
> > ```
> >
> > I would greatly appreciate any insights you may have on this matter.
> Thank you in advance for your time and assistance.
> >
> > Best wishes,
> > Zongze
> > <test_mumps.py>
>
>