[petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9

Zongze Yang yangzongze at gmail.com
Sat Mar 4 09:45:03 CST 2023


Thanks, I will give it a try.

Best wishes,
Zongze


On Sat, 4 Mar 2023 at 23:09, Pierre Jolivet <pierre at joliv.et> wrote:

>
>
> On 4 Mar 2023, at 3:26 PM, Zongze Yang <yangzongze at gmail.com> wrote:
>
>
>
>
> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <pierre at joliv.et> wrote:
>
>>
>>
>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <yangzongze at gmail.com> wrote:
>>
>>
>>
>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pierre at joliv.et> wrote:
>>
>>>
>>>
>>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzongze at gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am writing to seek your advice about a problem I encountered while
>>> > using multigrid.
>>> > I am currently using multigrid with the coarse problem solved by PCLU.
>>> > However, the PC fails randomly with the error below (the value of INFO(2)
>>> > may differ):
>>> > ```shell
>>> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
>>> > ```
>>> >
>>> > Upon checking the MUMPS documentation, I found that increasing
>>> > ICNTL(14) may help resolve the issue. I set the option
>>> > -mat_mumps_icntl_14 to higher values (such as 40), and the error seemed to
>>> > disappear once ICNTL(14) was set to 80. However, I am still curious as to
>>> > why MUMPS fails randomly in the first place.
>>> >
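The ICNTL(14) setting described above can also be attached to the coarse solver only. A minimal sketch, assuming Firedrake-style `solver_parameters` and PCMG's default coarse-level prefix `mg_coarse_` (the prefix actually used by each solver is printed by `-ksp_view`); the dictionary and the helper `to_command_line` are illustrative and not taken from the attached `test_mumps.py`:

```python
# Illustrative solver parameters: geometric multigrid whose coarse level is
# factored by MUMPS. The "mg_coarse_" prefix is an assumption (PCMG default).
coarse_lu_mumps = {
    "pc_type": "mg",
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
    # ICNTL(14): percentage increase of the working space estimated by MUMPS.
    "mg_coarse_mat_mumps_icntl_14": 80,
}


def to_command_line(params):
    """Flatten a parameter dictionary into PETSc-style command-line options."""
    return " ".join(f"-{key} {value}" for key, value in params.items())


print(to_command_line(coarse_lu_mumps))
```

The same dictionary could be passed as `solver_parameters=` to a Firedrake solve; printing it as options makes it easy to compare against what `-ksp_view` reports for the coarse solver.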
>>> > Upon further inspection, I found that the number of nonzeros of the
>>> > MUMPS matrix differs from that of the PETSc matrix and changes every time
>>> > I run the code. I am now left with the following questions:
>>> >
>>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to
>>> > change every time I run the code?
>>>
>>> Is the Mat being fed to MUMPS distributed on a communicator of size
>>> greater than one?
>>> If yes, then, depending on the pivoting and the renumbering, you may get
>>> non-deterministic results.
>>>
>>
>> Hi, Pierre,
>> Thank you for your prompt reply. Yes, the size of the communicator is
>> greater than one.
>> Even if the size of the communicator is the same in every run, are the
>> results still non-deterministic?
>>
>>
>> In the most general case, yes.
>>
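"In the most general case, yes" can be illustrated without MUMPS: floating-point addition is not associative, so combining the same contributions in a different order (as a parallel solver may do when work is scheduled dynamically) can change the computed values, and with them later pivot choices. A minimal sketch; the split into "ranks" is hypothetical and only mimics the grouping of a parallel sum, not how MUMPS schedules work:

```python
from functools import reduce
from operator import add


def serial_sum(values):
    """Plain left-to-right floating-point accumulation.

    functools.reduce is used instead of the built-in sum(), which in recent
    Python versions uses compensated summation and would hide the effect.
    """
    return reduce(add, values, 0.0)


def reduction_sum(values, nranks):
    """Mimic a parallel reduction: each hypothetical rank accumulates its own
    contiguous chunk, then the partial sums are accumulated."""
    chunk = -(-len(values) // nranks)  # ceiling division
    partials = [serial_sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return serial_sum(partials)


values = [1.0, 1e16, -1e16, 1.0]
print(serial_sum(values))        # 1.0
print(reduction_sum(values, 2))  # 0.0: same data, different grouping
```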
>> Can I assume the Mat being fed to MUMPS is the same in this case?
>>
>>
>> Are you doing algebraic or geometric multigrid?
>> Are the prolongation operators computed by Firedrake or by PETSc, e.g.,
>> through GAMG?
>> If it’s the latter, I believe the Mat being fed to MUMPS should always be
>> the same.
>> If it’s the former, you’ll have to ask the Firedrake people if there may
>> be non-determinism in the coarsening process.
>>
>
> I am using geometric multigrid, and the prolongation operators, I think,
> are computed by Firedrake.
> Thanks for your suggestion, I will ask the Firedrake people.
>
>
>>
>> Are the pivoting and renumbering all done by MUMPS rather than PETSc?
>>
>>
>> You could provide your own numbering, but by default this is indeed
>> outsourced to MUMPS, which itself outsources it to METIS, AMD, etc.
>>
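Renumbering matters because the elimination order determines how much fill-in the factors get. A minimal sketch using a pure symbolic elimination without pivoting (not what MUMPS, METIS or AMD do internally; the helpers `symbolic_lu_nnz` and `arrowhead` are illustrative): for an "arrowhead" matrix, eliminating the dense row first fills the factors completely, while the same matrix renumbered so the dense row comes last produces no fill at all.

```python
def symbolic_lu_nnz(n, pattern):
    """Nonzeros of L+U from Gaussian elimination without pivoting, counting
    structural fill only (no accidental numerical cancellation)."""
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)
    for k in range(n):
        upper = {j for j in rows[k] if j > k}  # row k of U, already updated
        for i in range(k + 1, n):
            if k in rows[i]:       # entry (i, k) is eliminated using row k...
                rows[i] |= upper   # ...creating fill wherever row k has entries
    return sum(len(r) for r in rows)


def arrowhead(n, dense):
    """Pattern of an n x n arrowhead matrix: the diagonal plus a dense row
    and column at index `dense`."""
    pattern = {(i, i) for i in range(n)}
    pattern |= {(dense, j) for j in range(n)} | {(i, dense) for i in range(n)}
    return pattern


n = 6
print(len(arrowhead(n, 0)))                     # 16 nonzeros in A
print(symbolic_lu_nnz(n, arrowhead(n, 0)))      # 36: dense row eliminated first
print(symbolic_lu_nnz(n, arrowhead(n, n - 1)))  # 16: dense row eliminated last
```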
>
> I don't think I will do this.
> By the way, are the results of superlu_dist similarly
> non-deterministic?
>
>
> SuperLU_DIST uses static pivoting as far as I know, so it may be more
> deterministic.
>
> Thanks,
> Pierre
>
> Thanks,
> Zongze
>
>
>> Thanks,
>> Pierre
>>
>>
>>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly
>>> > greater than that of the PETSc matrix (as seen in the output of ksp_view,
>>> > 115025949 vs 7346177)?
>>>
>>> Exact factorizations introduce fill-in.
>>> The number of nonzeros you are seeing for MUMPS is the number of
>>> nonzeros in the factors.
>>>
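The fill-in explanation can be checked on a toy problem. A minimal sketch, assuming a 5-point Laplacian on a small grid in natural ordering and elimination without pivoting or reordering, so it overstates what a good ordering would give; the helpers are illustrative, not part of PETSc or MUMPS:

```python
def laplacian_2d_pattern(m):
    """Nonzero pattern of the 5-point Laplacian on an m x m grid, natural ordering."""
    pattern = set()
    for r in range(m):
        for c in range(m):
            i = r * m + c
            pattern.add((i, i))
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < m and 0 <= cc < m:
                    pattern.add((i, rr * m + cc))
    return pattern


def symbolic_lu_nnz(n, pattern):
    """Nonzeros of L+U from Gaussian elimination without pivoting (structural fill only)."""
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)
    for k in range(n):
        upper = {j for j in rows[k] if j > k}  # row k of U, already updated
        for i in range(k + 1, n):
            if k in rows[i]:
                rows[i] |= upper  # fill-in created by eliminating entry (i, k)
    return sum(len(r) for r in rows)


m = 10
a = laplacian_2d_pattern(m)
print("nonzeros in A:  ", len(a))  # 460
print("nonzeros in L+U:", symbolic_lu_nnz(m * m, a))
```

The factors hold several times more nonzeros than A even on this 100-unknown grid; the ratio grows with problem size, which is consistent with the roughly 15x gap reported by `ksp_view` in this thread.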
>>> > 3. Is it possible that the varying number of nonzeros of the MUMPS
>>> > matrix is the cause of the random failure?
>>>
>>> Yes, MUMPS uses dynamic scheduling, which depends on numerical
>>> pivoting and may generate factors with a different number of nonzeros.
>>>
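This also suggests why raising ICNTL(14) helps. MUMPS sizes its main work array from the estimate made during analysis, enlarged by ICNTL(14) percent; if pivoting during factorization needs more than that, factorization stops with INFOG(1)=-9. As described in the MUMPS user guide, a positive INFO(2) is then the number of entries missing at the moment the error is raised, so INFO(2)=36 suggests the allocation was exceeded only slightly at that point. A simplified model with hypothetical numbers, not MUMPS's actual memory accounting:

```python
def allocated_workspace(estimate, icntl_14):
    """Entries reserved when the analysis estimate is enlarged by icntl_14 percent."""
    return estimate * (100 + icntl_14) // 100


def shortfall(estimate, needed, icntl_14):
    """Entries missing during factorization; a positive value models INFOG(1) = -9."""
    return max(0, needed - allocated_workspace(estimate, icntl_14))


# Hypothetical numbers: the same matrix needs slightly different amounts of
# workspace in different runs because pivoting (and hence scheduling) differs.
estimate = 1_000_000
needed_per_run = [1_150_000, 1_230_000]

for icntl_14 in (20, 80):
    print(icntl_14, [shortfall(estimate, needed, icntl_14) for needed in needed_per_run])
# 20 [0, 30000]  -> the second run would fail
# 80 [0, 0]      -> both runs fit
```

A larger ICNTL(14) does not remove the run-to-run variation; it only leaves enough headroom that the variation no longer exceeds the allocation.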
>>
>> Got it. Thank you for your clear explanation.
>> Zongze
>>
>>
>>> Thanks,
>>> Pierre
>>
>>
>>> > I have attached a test example written in Firedrake. The output of
>>> > `ksp_view` after running the code twice is included below for your
>>> > reference.
>>> > In the output, the number of nonzeros of the MUMPS matrix was
>>> > 115025949 and 115377847, respectively, while that of the PETSc matrix was
>>> > only 7346177.
>>> >
>>> > ```shell
>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>> >   type: preonly
>>> >   maximum iterations=10000, initial guess is zero
>>> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>> >   left preconditioning
>>> > --
>>> >   type: lu
>>> >     out-of-place factorization
>>> >     tolerance for zero pivot 2.22045e-14
>>> >     matrix ordering: external
>>> > --
>>> >           type: mumps
>>> >           rows=1050625, cols=1050625
>>> >           package used to perform factorization: mumps
>>> >           total: nonzeros=115025949, allocated nonzeros=115025949
>>> > --
>>> >     type: mpiaij
>>> >     rows=1050625, cols=1050625
>>> >     total: nonzeros=7346177, allocated nonzeros=7346177
>>> >     total number of mallocs used during MatSetValues calls=0
>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>> >   type: preonly
>>> >   maximum iterations=10000, initial guess is zero
>>> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>> >   left preconditioning
>>> > --
>>> >   type: lu
>>> >     out-of-place factorization
>>> >     tolerance for zero pivot 2.22045e-14
>>> >     matrix ordering: external
>>> > --
>>> >           type: mumps
>>> >           rows=1050625, cols=1050625
>>> >           package used to perform factorization: mumps
>>> >           total: nonzeros=115377847, allocated nonzeros=115377847
>>> > --
>>> >     type: mpiaij
>>> >     rows=1050625, cols=1050625
>>> >     total: nonzeros=7346177, allocated nonzeros=7346177
>>> >     total number of mallocs used during MatSetValues calls=0
>>> > ```
>>> >
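The two `ksp_view` excerpts above can be compared mechanically. A small sketch; the regular expression assumes the `total: nonzeros=` wording shown in this output, and `nonzeros` is an illustrative helper:

```python
import re


def nonzeros(ksp_view_text):
    """Return the 'total: nonzeros=' counts in the order they appear."""
    return [int(n) for n in re.findall(r"total: nonzeros=(\d+)", ksp_view_text)]


# Excerpts copied from the two runs above: MUMPS factor first, then the mpiaij matrix.
run1 = """
          type: mumps
          total: nonzeros=115025949, allocated nonzeros=115025949
    type: mpiaij
    total: nonzeros=7346177, allocated nonzeros=7346177
"""
run2 = """
          type: mumps
          total: nonzeros=115377847, allocated nonzeros=115377847
    type: mpiaij
    total: nonzeros=7346177, allocated nonzeros=7346177
"""

(factor1, a1), (factor2, a2) = nonzeros(run1), nonzeros(run2)
print(f"fill ratio: {factor1 / a1:.1f}")                          # fill ratio: 15.7
print(f"run-to-run change: {(factor2 - factor1) / factor1:.2%}")  # run-to-run change: 0.31%
```

The assembled matrix is identical in both runs, while the factors differ by about 0.3%, which matches the dynamic-pivoting explanation in the replies.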
>>> > I would greatly appreciate any insights you may have on this matter.
>>> > Thank you in advance for your time and assistance.
>>> >
>>> > Best wishes,
>>> > Zongze
>>> > <test_mumps.py>
>>
>>
>>