[petsc-users] Automatically re-solving after MUMPS error
Barry Smith
bsmith at mcs.anl.gov
Wed Sep 30 17:28:07 CDT 2015
Matt,
Please try the following: in src/mat/impls/aij/mpi/mumps/mumps.c, edit
#undef __FUNCT__
#define __FUNCT__ "MatDestroy_MUMPS"
PetscErrorCode MatDestroy_MUMPS(Mat A)
{
  Mat_MUMPS      *mumps=(Mat_MUMPS*)A->spptr;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (mumps->CleanUpMUMPS) {
Remove this if () test and always execute the cleanup code that follows it. Let us know whether this resolves the problem.
Thanks
Barry
This CleanUpMUMPS flag has always been goofy and definitely needs to be removed; the only question is whether some other changes are needed when it is removed.
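A simplified model of why such a guard can leak, in plain C. The struct, field names, and functions below are hypothetical stand-ins, not PETSc's Mat_MUMPS or its real cleanup code; the assumption is that a MUMPS error can be raised after some buffers have been allocated but before the cleanup flag is set. Under that assumption, a flag-guarded destroy leaks those buffers, while an unconditional destroy whose free routine tolerates NULL does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical factorization context; a stand-in, not PETSc's Mat_MUMPS. */
typedef struct {
  double *triples;    /* allocated early, before the factorization itself */
  double *workspace;  /* allocated only if the factorization gets further */
  int     cleanup;    /* set only after a successful factorization */
} factor_ctx;

static long live_buffers = 0;   /* number of buffers currently allocated */

static double *alloc_buf(size_t n)
{
  double *p = malloc(n * sizeof *p);
  if (p) live_buffers++;
  return p;
}

static void free_buf(double *p)
{
  if (p) { free(p); live_buffers--; }
  assert(live_buffers >= 0);
}

/* Simulated factorization that fails (returns -9) after allocating the triples. */
static int factor(factor_ctx *ctx, int fail)
{
  ctx->triples = alloc_buf(100);
  if (fail) return -9;            /* error raised before `cleanup` is set */
  ctx->workspace = alloc_buf(1000);
  ctx->cleanup = 1;
  return 0;
}

/* Flag-guarded destroy: after a failed factor() the triples buffer leaks. */
static void destroy_guarded(factor_ctx *ctx)
{
  if (ctx->cleanup) {
    free_buf(ctx->triples);
    free_buf(ctx->workspace);
  }
  ctx->triples = ctx->workspace = NULL;
  ctx->cleanup = 0;
}

/* Unconditional destroy: frees whatever exists; free_buf(NULL) is a no-op. */
static void destroy_always(factor_ctx *ctx)
{
  free_buf(ctx->triples);
  free_buf(ctx->workspace);
  ctx->triples = ctx->workspace = NULL;
  ctx->cleanup = 0;
}
```

For example, with a zero-initialized factor_ctx, calling factor(&ctx, 1) and then destroy_guarded(&ctx) leaves one buffer live, whereas destroy_always(&ctx) releases it. The unconditional version relies on every pointer starting out NULL, which is why the context must be zero-initialized.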
> On Sep 30, 2015, at 4:59 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
> Matt,
>
> Yes, you must be right. The MatDestroy() on the "partially factored" matrix should clean up everything properly, but it sounds like it is not. I'll look at it right now, but I only have a few minutes; if I can't resolve it quickly it may take a day or two.
>
>
> Barry
>
>> On Sep 30, 2015, at 4:10 PM, Matt Landreman <matt.landreman at gmail.com> wrote:
>>
>> Hi Barry,
>> I tried adding PetscMallocDump after SNESDestroy as you suggested. When mumps fails, PetscMallocDump shows a number of mallocs which are absent when mumps succeeds, the largest being MatConvertToTriples_mpiaij_mpiaij() (line 638 in petsc-3.6.0/src/mat/impls/aij/mpi/mumps/mumps.c). The total memory reported by PetscMallocDump after SNESDestroy is substantially (>20x) larger when mumps fails than when mumps succeeds, and this amount increases uniformly with each mumps failure. So I think some of the mumps-related structures are not being deallocated by SNESDestroy if mumps generates an error.
>> Thanks,
>> -Matt
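PetscMallocDump() reports the PETSc allocations that are still outstanding, which is what makes the leak visible here. The idea can be sketched with a hypothetical tracking allocator in plain C; tracked_malloc, tracked_free, and dump_outstanding are illustrative names, not PETSc functions, and PETSc's own tracking is more elaborate.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 64

/* One live allocation: pointer, size in bytes, and a tag naming the call site. */
typedef struct { void *ptr; size_t bytes; const char *tag; } live_entry;

static live_entry live[MAX_LIVE];

/* malloc wrapper that records each allocation together with a tag. */
static void *tracked_malloc(size_t bytes, const char *tag)
{
  void *p = malloc(bytes);
  if (!p) return NULL;
  for (int i = 0; i < MAX_LIVE; i++) {
    if (!live[i].ptr) { live[i] = (live_entry){p, bytes, tag}; return p; }
  }
  free(p);            /* table full: refuse rather than lose track */
  return NULL;
}

/* free wrapper that removes the allocation from the table. */
static void tracked_free(void *p)
{
  if (!p) return;
  for (int i = 0; i < MAX_LIVE; i++) {
    if (live[i].ptr == p) { live[i].ptr = NULL; free(p); return; }
  }
  assert(0 && "tracked_free called on an untracked pointer");
}

/* Prints every allocation still live and returns the total bytes outstanding. */
static size_t dump_outstanding(FILE *out)
{
  size_t total = 0;
  for (int i = 0; i < MAX_LIVE; i++) {
    if (live[i].ptr) {
      fprintf(out, "  %zu bytes from %s\n", live[i].bytes, live[i].tag);
      total += live[i].bytes;
    }
  }
  fprintf(out, "total outstanding: %zu bytes\n", total);
  return total;
}
```

Calling dump_outstanding() after each destroy and comparing totals across attempts is the same check described above: a total that grows with every failed attempt points to memory that the destroy path never releases.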
>>
>> On Wed, Sep 30, 2015 at 2:16 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>> On Sep 30, 2015, at 1:06 PM, Matt Landreman <matt.landreman at gmail.com> wrote:
>>>
>>> PETSc developers,
>>>
>>> I tried implementing a system for automatically increasing MUMPS ICNTL(14), along the lines described in this recent thread. If SNESSolve returns ierr .ne. 0 due to MUMPS error -9, I call SNESDestroy, re-initialize SNES, call MatMumpsSetIcntl with a larger value of ICNTL(14), call SNESSolve again, and repeat as needed. The procedure works, but the peak memory required (as measured by the HPC system) is 50%-100% higher if the MUMPS solve has to be repeated than when MUMPS works on the first try (by starting with a large ICNTL(14)), even though SNESDestroy is called in between the attempts. Are there some PETSc or MUMPS structures which would not be deallocated immediately by SNESDestroy? If so, how do I deallocate them?
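The retry procedure described above can be sketched as a loop. This is a control-flow model only, under stated assumptions: try_solve is a hypothetical stand-in for one attempt (create the SNES, set ICNTL(14) with MatMumpsSetIcntl, call SNESSolve, read INFO(1), destroy the SNES), and here it simply succeeds once ICNTL(14) reaches a simulated threshold. The multiplicative growth factor is also an illustrative choice, not a recommendation from the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for one solve attempt. In real code this step would
   create the SNES, call MatMumpsSetIcntl(F, 14, icntl14), run SNESSolve,
   read INFO(1) (e.g. with MatMumpsGetInfo), and destroy the SNES.
   Here it just simulates MUMPS error -9 until icntl14 >= required. */
static int try_solve(int icntl14, int required, int *info1)
{
  *info1 = (icntl14 >= required) ? 0 : -9;
  return (*info1 == 0) ? 0 : 1;   /* nonzero plays the role of ierr != 0 */
}

/* Retries with a geometrically growing ICNTL(14). Returns the value that
   succeeded, or -1 if the error was not -9 or max_tries was exhausted.
   *attempts receives the number of solves actually performed. */
static int solve_with_retries(int icntl14, int growth, int max_tries,
                              int required, int *attempts)
{
  assert(growth > 1 && max_tries > 0);
  *attempts = 0;
  for (int tries = 1; tries <= max_tries; tries++) {
    int info1 = 0;
    int ierr = try_solve(icntl14, required, &info1);
    *attempts = tries;
    if (ierr == 0) return icntl14;  /* success */
    if (info1 != -9) return -1;     /* not a workspace shortfall: give up */
    icntl14 *= growth;              /* ask MUMPS for more workspace next time */
  }
  return -1;
}
```

With a starting ICNTL(14) of 20, a growth factor of 2 and a simulated requirement of 70, the loop fails at 20 and 40 and succeeds at 80 on the third attempt. Stopping on any INFO(1) other than -9 avoids retrying errors that more workspace cannot fix.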
>>
>> They should all be destroyed automatically for you. You can use PetscMallocDump() after the SNES is destroyed to check whether all of that memory has been properly freed.
>>
>> My guess is that your new malloc() with the bigger workspace cannot "reuse" the space that was previously freed, so to the OS it looks like you are using a lot more space, but in terms of physical memory you are not using more.
>>
>> Barry
>>
>>>
>>> Thanks,
>>> Matt Landreman
>>>
>>>
>>> On Tue, Sep 15, 2015 at 7:47 AM, David Knezevic <david.knezevic at akselos.com> wrote:
>>> On Tue, Sep 15, 2015 at 7:29 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>> On Tue, Sep 15, 2015 at 4:30 AM, David Knezevic <david.knezevic at akselos.com> wrote:
>>> In some cases, I get MUMPS error -9, i.e.:
>>> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=98927
>>>
>>> This is easily fixed by re-running the executable with a larger value of -mat_mumps_icntl_14 on the command line.
>>>
>>> However, I would like to update my code in order to do this automatically, i.e. detect the -9 error and re-run with the appropriate option. Is there a recommended way to do this? It seems to me that I could do this with a PETSc error handler (e.g. PetscPushErrorHandler) in order to call a function that sets the appropriate option and solves again, is that right? Are there any examples that illustrate this type of thing?
>>>
>>> I would not use the error handler. I would just check the ierr return code from the solver. I think you need the INFO output, for which you can use MatMumpsGetInfo().
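Following this advice, the decision of whether to retry can be isolated in a small helper that takes the ierr return code and the INFO values (obtained, for example, with MatMumpsGetInfo()). The helper names are hypothetical. The interpretation of INFO(2) encoded below, namely that for INFO(1) = -9 a positive INFO(2) is the number of missing workspace entries and a negative INFO(2) means that many millions, is taken from our reading of the MUMPS user guide and should be checked against the MUMPS version in use.

```c
/* Hypothetical helper: number of workspace entries MUMPS reported missing.
   Assumes (per our reading of the MUMPS user guide) that for INFO(1) = -9,
   INFO(2) > 0 is the count itself and INFO(2) < 0 means |INFO(2)| million. */
static long long mumps_missing_entries(int info1, int info2)
{
  if (info1 != -9) return 0;               /* not a workspace shortfall */
  if (info2 >= 0) return info2;
  return -(long long)info2 * 1000000LL;    /* negative value: unit is millions */
}

/* Hypothetical helper: retry with a larger ICNTL(14) only for error -9. */
static int should_retry(int ierr, int info1)
{
  return ierr != 0 && info1 == -9;
}
```

For the error shown earlier in this thread (INFO(1)=-9, INFO(2)=98927), mumps_missing_entries(-9, 98927) returns 98927, and should_retry reports that a retry with more workspace is worthwhile.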
>>>
>>>
>>> OK, that sounds good (and much simpler than what I had in mind), thanks for the help!
>>>
>>> David
>>>
>>>
>>
>>
>