[petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9

Pierre Jolivet pierre at joliv.et
Sat Mar 4 08:03:10 CST 2023



> On 4 Mar 2023, at 2:51 PM, Zongze Yang <yangzongze at gmail.com> wrote:
> 
> 
> 
>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pierre at joliv.et> wrote:
>> 
>> 
>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzongze at gmail.com> wrote:
>> > 
>> > Hi, 
>> > 
>> > I am writing to seek your advice regarding a problem I encountered while using multigrid.
>> > The coarse problem is solved by PCLU. However, the PC fails randomly with the error below (the value of INFO(2) may differ):
>> > ```shell
>> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
>> > ```
>> > 
>> > Upon checking the MUMPS documentation, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I raised the option -mat_mumps_icntl_14 (first to 40), and the error disappeared once I set ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place.
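
For the archives, here is a minimal sketch of passing that option from Firedrake. The Poisson problem below is a placeholder standing in for the attached test_mumps.py, not your actual setup:

```python
from firedrake import *

# Stand-in Poisson problem (placeholder for the attached test_mumps.py).
mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = inner(Constant(1.0), v) * dx
uh = Function(V)
solve(a == L, uh, bcs=[DirichletBC(V, 0, "on_boundary")],
      solver_parameters={
          "ksp_type": "preonly",
          "pc_type": "lu",
          "pc_factor_mat_solver_type": "mumps",
          # ICNTL(14): workspace relaxation, in percent of the analysis
          # estimate (MUMPS default is 20); 80 avoided the -9 error here.
          "mat_mumps_icntl_14": 80,
      })
```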
>> > 
>> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions:
>> > 
>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
>> 
>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.
>  
> Hi, Pierre,
> Thank you for your prompt reply. Yes, the size of the communicator is greater than one. 
> Even if the size of the communicator is the same between runs, are the results still non-deterministic?

In the most general case, yes.

> Can I assume the Mat being fed to MUMPS is the same in this case?

Are you doing algebraic or geometric multigrid?
Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG?
If it’s the latter, I believe the Mat being fed to MUMPS should always be the same.
If it’s the former, you’ll have to ask the Firedrake people if there may be non-determinism in the coarsening process.

> Are the pivoting and renumbering both done by MUMPS rather than PETSc?

You could provide your own numbering, but by default, this is indeed outsourced to MUMPS, which will itself outsource it to METIS, AMD, etc.
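
If you want to experiment with pinning those choices down, something like the following options should help (a sketch: per the MUMPS manual, ICNTL(28)=1 selects the sequential analysis phase and ICNTL(7)=5 the METIS ordering; numerical pivoting remains dynamic, so this may reduce but not necessarily eliminate the run-to-run variation):

```python
# Sketch: ask MUMPS for a sequential analysis with a fixed ordering,
# instead of the parallel analysis/ordering path. Pass this dict as
# solver_parameters in Firedrake, or use the equivalent command-line
# options -mat_mumps_icntl_28 1 -mat_mumps_icntl_7 5.
deterministic_lu = {
    "ksp_type": "preonly",
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "mumps",
    "mat_mumps_icntl_28": 1,  # ICNTL(28)=1: sequential analysis
    "mat_mumps_icntl_7": 5,   # ICNTL(7)=5: METIS ordering
}
```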

Thanks,
Pierre

>> 
>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
>> 
>> Exact factorizations introduce fill-in.
>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
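
If you would rather inspect those two counts programmatically than grep the -ksp_view output, a petsc4py helper along these lines could work (report_fill is a hypothetical name of mine, not a PETSc API; `ksp` is assumed to be the KSP whose PC is the MUMPS LU):

```python
from petsc4py import PETSc

def report_fill(ksp: PETSc.KSP) -> None:
    """Print nonzeros of the operator vs. its factors (hypothetical helper)."""
    A, _ = ksp.getOperators()          # assembled operator
    F = ksp.getPC().getFactorMatrix()  # factors computed by MUMPS
    nz_A = A.getInfo(PETSc.Mat.InfoType.GLOBAL_SUM)["nz_used"]
    nz_F = F.getInfo(PETSc.Mat.InfoType.GLOBAL_SUM)["nz_used"]
    PETSc.Sys.Print(f"operator nz = {nz_A:.0f}, factor nz = {nz_F:.0f}, "
                    f"fill ~ {nz_F / nz_A:.1f}x")
```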
>> 
>> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
>> 
>> Yes, MUMPS uses dynamic scheduling, which depends on numerical pivoting and may generate factors with a different number of nonzeros from one run to the next.
> 
> Got it. Thank you for your clear explanation.
> Zongze 
> 
>> 
>> Thanks,
>> Pierre
>> 
>> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
>> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
>> > 
>> > ```shell
>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>> >   type: preonly
>> >   maximum iterations=10000, initial guess is zero
>> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >   left preconditioning
>> > --
>> >   type: lu
>> >     out-of-place factorization
>> >     tolerance for zero pivot 2.22045e-14
>> >     matrix ordering: external
>> > --
>> >           type: mumps
>> >           rows=1050625, cols=1050625
>> >           package used to perform factorization: mumps
>> >           total: nonzeros=115025949, allocated nonzeros=115025949
>> > --
>> >     type: mpiaij
>> >     rows=1050625, cols=1050625
>> >     total: nonzeros=7346177, allocated nonzeros=7346177
>> >     total number of mallocs used during MatSetValues calls=0
>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>> >   type: preonly
>> >   maximum iterations=10000, initial guess is zero
>> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >   left preconditioning
>> > --
>> >   type: lu
>> >     out-of-place factorization
>> >     tolerance for zero pivot 2.22045e-14
>> >     matrix ordering: external
>> > --
>> >           type: mumps
>> >           rows=1050625, cols=1050625
>> >           package used to perform factorization: mumps
>> >           total: nonzeros=115377847, allocated nonzeros=115377847
>> > --
>> >     type: mpiaij
>> >     rows=1050625, cols=1050625
>> >     total: nonzeros=7346177, allocated nonzeros=7346177
>> >     total number of mallocs used during MatSetValues calls=0
>> > ```
>> > 
>> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
>> > 
>> > Best wishes,
>> > Zongze
>> > <test_mumps.py>
