[petsc-users] MatMPIAIJSetPreallocation: "nnz cannot be greater than row length"

David Knezevic david.knezevic at akselos.com
Wed Feb 25 09:47:18 CST 2015


Hi Dmitry,

Thanks so much, that's great! I've updated the pull request now to include
your patch:
https://github.com/dknez/libmesh/commit/247cc12f7536a5f848fd5c63765d97af307d9fa7
We can update the code later, once MatReset() is available in PETSc.

Regarding the convergence issues: I think the main issue is that we're using
the penalty method for contact, which gives a non-smooth term in the PDE, and
hence we don't get nice convergence with the nonlinear solver. In particular,
the SNES solver terminates with "DIVERGED_LINE_SEARCH", but I don't think
that's very surprising here. The simulation results seem to be good, though
(e.g. see the screenshots in the PR). I'd be interested in your thoughts on
this.
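
(To spell out the non-smoothness -- with the caveat that this is the generic
penalty formulation, not necessarily our exact form: the contact term
contributes a force like

    f_c(u) = k * max(0, -g(u)),

where g(u) is the gap function and k is the penalty stiffness. The kink in
max() at g = 0 makes the Jacobian jump whenever a point enters or leaves
contact, which is exactly the kind of behavior that tends to defeat a Newton
line search.)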

I'm not sure why the KSP isn't converging fully. I didn't put any thought
into the choice of default solver, so I'm happy to change that (to CG or
whatever). But I also used a direct solver for comparison, and SuperLU
gives the same overall result that I get with GMRES (though MUMPS crashes
with "Numerically singular matrix" for some reason).
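
For reference, here's roughly how I'd switch the solver programmatically (a
sketch only -- `snes` here is our existing SNES handle, and I'm going from
memory on the PETSc 3.5 API):

    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);  /* instead of the GMRES default */

    /* or, for the direct-solver comparison: */
    ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
    ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);

The same switches are available on the command line via -ksp_type cg, or
-pc_type lu -pc_factor_mat_solver_package superlu_dist (or mumps).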

Regarding the VI solver: That sounds very good, I'll be interested to try
the overhauled VI solver out for contact problems, once it's available.

Thanks,
David



On Wed, Feb 25, 2015 at 9:56 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
wrote:

> The previous patch didn't clean out the Mat data structure thoroughly
> enough. Please try the attached patch.  It runs for me, but there is no
> convergence.  In particular, KSP doesn't seem to converge, although that's
> with the default GMRES+BJACOBI/ILU(0) options. So I don't think resetting
> the Mat is the cause.
>
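(A stock way to see where the KSP stalls -- these are standard PETSc options:

    -ksp_monitor_true_residual -ksp_converged_reason

and the default GMRES+BJACOBI/ILU(0) corresponds to -ksp_type gmres
-pc_type bjacobi -sub_pc_type ilu.)
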
> Let me know if this works for you. MatReset() will be in the next release
> as well as, hopefully, an overhauled VI solver that should be able to
> handle these contact problems.
>
> Dmitry.
>
> On Mon Feb 23 2015 at 10:17:19 AM David Knezevic <
> david.knezevic at akselos.com> wrote:
>
>> OK, sounds good. Let me know if I can help with the digging.
>>
>> David
>>
>>
>>
>> On Mon, Feb 23, 2015 at 11:10 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
>> wrote:
>>
>>> I just tried building against petsc/master, but there needs to be more
>>> work on libMesh before it can work with petsc/master:
>>> the new VecLockPush()/Pop() stuff isn't respected by vector manipulation
>>> in libMesh.
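
For anyone following along: my understanding of the new petsc/master API (an
assumption on my part, since I haven't dug into it yet) is that
VecLockPush()/VecLockPop() mark a Vec read-only, so any code touching such a
vector has to go through the read-only accessors:

    /* e.g. inside a residual callback, where X has been read-locked: */
    const PetscScalar *xa;
    PetscErrorCode     ierr;

    ierr = VecGetArrayRead(X, &xa);CHKERRQ(ierr);    /* fine while locked */
    /* ... read-only use of xa ... */
    ierr = VecRestoreArrayRead(X, &xa);CHKERRQ(ierr);
    /* a plain VecGetArray(X, ...) here would error out until VecLockPop(X) */

which is presumably what libMesh's vector manipulation runs afoul of.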
>>> I put a hack equivalent to MatReset() into your branch (patch attached,
>>> in case you want to take a look at it),
>>> but it generates the same error in MatCreateColmap that you reported
>>> earlier. It's odd that it occurs on the second nonlinear iteration. I'll
>>> have to dig a bit deeper to see what's going on.
>>>
>>> Dmitry.
>>>
>>>
>>> On Mon Feb 23 2015 at 10:03:33 AM David Knezevic <
>>> david.knezevic at akselos.com> wrote:
>>>
>>>> Hi Dmitry,
>>>>
>>>> OK, good to hear we're seeing the same behavior for the example.
>>>>
>>>> Regarding this comment:
>>>>
>>>>
>>>> libMesh needs to change the way configure extracts PETSc information --
>>>>> configuration data were moved:
>>>>> conf --> lib/petsc-conf
>>>>> ${PETSC_ARCH}/conf --> ${PETSC_ARCH}/lib/petsc-conf
>>>>>
>>>>> At one point I started looking at m4/petsc.m4, but that got put on the
>>>>> back burner.  For now making the relevant symlinks by hand lets you
>>>>> configure and build libMesh with petsc/master.
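
(For anyone else hitting this: given the moves above, I'd guess the by-hand
fix is a symlink from the old name to the new location in each spot, i.e.
something like "ln -s lib/petsc-conf conf" in $PETSC_DIR and again under
$PETSC_ARCH -- though I haven't verified exactly which paths libMesh's m4
expects.)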
>>>>>
>>>>
>>>>
>>>> So do you suggest that the next step here is to build libMesh against
>>>> petsc/master so that we can try your PETSc pull request that implements
>>>> MatReset() to see if that gets this example working?
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> On Mon Feb 23 2015 at 9:15:44 AM David Knezevic <
>>>>> david.knezevic at akselos.com> wrote:
>>>>>
>>>>>> Hi Dmitry,
>>>>>>
>>>>>> Thanks very much for testing out the example.
>>>>>>
>>>>>> examples/systems_of_equations/ex8 works fine for me in serial, but it
>>>>>> fails if I run with more than one MPI process. Can you try it with,
>>>>>> say, 2 or 4 MPI processes?
>>>>>>
>>>>>> If we need access to MatReset in libMesh to get this to work, I'll be
>>>>>> happy to work on a libMesh pull request for that.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> David J. Knezevic | CTO
>>>>>> Akselos | 17 Bay State Road | Boston, MA | 02215
>>>>>> Phone (office): +1-857-265-2238
>>>>>> Phone (mobile): +1-617-599-4755
>>>>>> Web: http://www.akselos.com
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 23, 2015 at 10:08 AM, Dmitry Karpeyev <
>>>>>> karpeev at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> What code are you running when you encounter this error? I tried to
>>>>>>> reproduce it with examples/systems_of_equations/ex8, but it ran for me,
>>>>>>> ostensibly to completion.
>>>>>>>
>>>>>>> I have a small PETSc pull request that implements MatReset(), which
>>>>>>> passes a small PETSc test,
>>>>>>> but libMesh needs some work to be able to build against petsc/master
>>>>>>> because of some recent
>>>>>>> changes to PETSc.
>>>>>>>
>>>>>>> Dmitry.
>>>>>>>
>>>>>>> On Mon Feb 23 2015 at 7:17:06 AM David Knezevic <
>>>>>>> david.knezevic at akselos.com> wrote:
>>>>>>>
>>>>>>>> Hi Barry, hi Dmitry,
>>>>>>>>
>>>>>>>> I set the matrix type to BAIJ and back to AIJ, and the code got a bit
>>>>>>>> further, but I now run into the error pasted below (note that I'm now
>>>>>>>> using "--with-debugging=1"):
>>>>>>>>
>>>>>>>> PETSC ERROR: --------------------- Error Message
>>>>>>>> --------------------------------------------------------------
>>>>>>>> PETSC ERROR: Petsc has generated inconsistent data
>>>>>>>> PETSC ERROR: MPIAIJ Matrix was assembled but is missing garray
>>>>>>>> PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>>> for trouble shooting.
>>>>>>>> PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>>>>> PETSC ERROR: ./example-dbg on a arch-linux2-c-debug named
>>>>>>>> david-Lenovo by dknez Mon Feb 23 08:05:44 2015
>>>>>>>> PETSC ERROR: Configure options --with-shared-libraries=1
>>>>>>>> --with-debugging=1 --download-suitesparse=1 --download-parmetis=1
>>>>>>>> --download-blacs=1 --download-scalapack=1 --download-mumps=1
>>>>>>>> --download-metis --download-superlu_dist --prefix=/home/dknez/software/libmesh_install/dbg_real/petsc
>>>>>>>> --download-hypre
>>>>>>>> PETSC ERROR: #1 MatCreateColmap_MPIAIJ_Private() line 361 in
>>>>>>>> /home/dknez/software/petsc-3.5.2/src/mat/impls/aij/mpi/mpiaij.c
>>>>>>>> PETSC ERROR: #2 MatSetValues_MPIAIJ() line 538 in
>>>>>>>> /home/dknez/software/petsc-3.5.2/src/mat/impls/aij/mpi/mpiaij.c
>>>>>>>> PETSC ERROR: #3 MatSetValues() line 1136 in
>>>>>>>> /home/dknez/software/petsc-3.5.2/src/mat/interface/matrix.c
>>>>>>>> PETSC ERROR: #4 add_matrix() line 765 in
>>>>>>>> /home/dknez/software/libmesh-src/src/numerics/petsc_matrix.C
>>>>>>>> ------------------------------------------------------------
>>>>>>>> --------------
>>>>>>>>
>>>>>>>> This occurs when I try to set some entries of the matrix. Do you
>>>>>>>> have any suggestions on how I can resolve this?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Feb 22, 2015 at 10:22 PM, Dmitry Karpeyev <
>>>>>>>> dkarpeev at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun Feb 22 2015 at 9:15:22 PM Barry Smith <bsmith at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > On Feb 22, 2015, at 9:09 PM, David Knezevic <
>>>>>>>>>> david.knezevic at akselos.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi Dmitry,
>>>>>>>>>> >
>>>>>>>>>> > Thanks for the suggestion. I tried MatSetType(mat,MATMPIAIJ)
>>>>>>>>>> followed by MatXAIJSetPreallocation(...), but unfortunately this still
>>>>>>>>>> gives me the same error as before: "nnz cannot be greater than row length:
>>>>>>>>>> local row 168 value 24 rowlength 0".
>>>>>>>>>> >
>>>>>>>>>> > I gather that the idea here is that MatSetType builds a new
>>>>>>>>>> matrix object, and then I should be able to pre-allocate for that new
>>>>>>>>>> matrix however I like, right? Was I supposed to clear the matrix object
>>>>>>>>>> somehow before calling MatSetType? (I didn't do any sort of clear
>>>>>>>>>> operation.)
>>>>>>>>>>
>>>>>>>>>>   If the type doesn't change then MatSetType() won't do anything.
>>>>>>>>>> You can try setting the type to BAIJ and then setting the type back to AIJ.
>>>>>>>>>> This may/should clear out the matrix.
>>>>>>>>>>
>>>>>>>>> Ah, yes. If the type is the same as before it does quit early, but
>>>>>>>>> changing the type and then back will clear out and rebuild the matrix.
>>>>>>>>> We need something like MatReset() to do the equivalent thing.
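
For the record, the toggle that got my code further looks essentially like
this (a sketch -- d_nnz/o_nnz stand in for the newly computed per-row
counts):

    /* changing the type away and back forces PETSc to rebuild the Mat */
    ierr = MatSetType(mat, MATBAIJ);CHKERRQ(ierr);
    ierr = MatSetType(mat, MATAIJ);CHKERRQ(ierr);
    /* then preallocate for the new sparsity pattern */
    ierr = MatMPIAIJSetPreallocation(mat, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);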
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > As I said earlier, I'll make a dbg PETSc build, so hopefully
>>>>>>>>>> that will help shed some light on what's going wrong for me.
>>>>>>>>>>
>>>>>>>>> I think it's always a good idea to have a dbg build of PETSc when
>>>>>>>>> you're doing things like these.
>>>>>>>>>
>>>>>>>>> Dmitry.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    Don't bother, what I suggested won't work.
>>>>>>>>>>
>>>>>>>>>>   Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > David
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Sun, Feb 22, 2015 at 6:02 PM, Dmitry Karpeyev <
>>>>>>>>>> dkarpeev at gmail.com> wrote:
>>>>>>>>>> > David,
>>>>>>>>>> > It might be easier to just rebuild the whole matrix from
>>>>>>>>>> scratch: disassembling and resetting the preallocation would in
>>>>>>>>>> effect be doing all of that anyway.
>>>>>>>>>> > MatSetType(mat,MATMPIAIJ)
>>>>>>>>>> > or
>>>>>>>>>> > PetscObjectGetType((PetscObject)mat,&type);
>>>>>>>>>> > MatSetType(mat,type);
>>>>>>>>>> > followed by
>>>>>>>>>> > MatXAIJSetPreallocation(...);
>>>>>>>>>> > should do.
>>>>>>>>>> > Dmitry.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Sun Feb 22 2015 at 4:45:46 PM Barry Smith <
>>>>>>>>>> bsmith at mcs.anl.gov> wrote:
>>>>>>>>>> >
>>>>>>>>>> >  Do not call it for a SeqAIJ matrix, and do not call it before the
>>>>>>>>>> first time you have preallocated the matrix, put entries in it, and
>>>>>>>>>> done the MatAssemblyBegin/End().
>>>>>>>>>> >
>>>>>>>>>> >   If it still crashes you'll need to try the debugger
>>>>>>>>>> >
>>>>>>>>>> >   Barry
>>>>>>>>>> >
>>>>>>>>>> > > On Feb 22, 2015, at 4:09 PM, David Knezevic <
>>>>>>>>>> david.knezevic at akselos.com> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > Hi Barry,
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks for your help, much appreciated.
>>>>>>>>>> > >
>>>>>>>>>> > > I added a prototype for MatDisAssemble_MPIAIJ:
>>>>>>>>>> > > PETSC_INTERN PetscErrorCode MatDisAssemble_MPIAIJ(Mat);
>>>>>>>>>> > >
>>>>>>>>>> > > and I added a call to MatDisAssemble_MPIAIJ before
>>>>>>>>>> MatMPIAIJSetPreallocation. However, I get a segfault on the call to
>>>>>>>>>> MatDisAssemble_MPIAIJ. The segfault occurs in both serial and parallel.
>>>>>>>>>> > >
>>>>>>>>>> > > FYI, I'm using PETSc 3.5.2 with a non-debug build (though I
>>>>>>>>>> could rebuild PETSc in debug mode if you think that would help
>>>>>>>>>> figure out what's happening here).
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > David
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > On Sun, Feb 22, 2015 at 1:13 PM, Barry Smith <
>>>>>>>>>> bsmith at mcs.anl.gov> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > >   David,
>>>>>>>>>> > >
>>>>>>>>>> > >    This is an obscure little feature of MatMPIAIJ: each time you
>>>>>>>>>> change the sparsity pattern, you need to call MatDisAssemble_MPIAIJ(Mat
>>>>>>>>>> mat) before you call MatMPIAIJSetPreallocation. This is a private PETSc
>>>>>>>>>> function, so you need to provide your own prototype for it above the
>>>>>>>>>> function where you use it.
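
Putting that recipe together, the pattern would presumably be (a sketch only;
MatDisAssemble_MPIAIJ is private, so the details here are my assumption):

    /* our own prototype for the private PETSc function, as suggested */
    PETSC_INTERN PetscErrorCode MatDisAssemble_MPIAIJ(Mat);

    /* ... then, each time the sparsity pattern changes: */
    ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr);
    ierr = MatMPIAIJSetPreallocation(mat, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);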
>>>>>>>>>> > >
>>>>>>>>>> > >   Let us know if this resolves the problem.
>>>>>>>>>> > >
>>>>>>>>>> > >    Barry
>>>>>>>>>> > >
>>>>>>>>>> > > We never really intended that people would call
>>>>>>>>>> MatMPIAIJSetPreallocation() AFTER they had already used the matrix.
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > > On Feb 22, 2015, at 6:50 AM, David Knezevic <
>>>>>>>>>> david.knezevic at akselos.com> wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > Hi all,
>>>>>>>>>> > > >
>>>>>>>>>> > > > I've implemented a solver for a contact problem using SNES.
>>>>>>>>>> The sparsity pattern of the Jacobian matrix needs to change at each
>>>>>>>>>> nonlinear iteration (because the elements which are in contact can change),
>>>>>>>>>> so I tried to deal with this by calling MatSeqAIJSetPreallocation and
>>>>>>>>>> MatMPIAIJSetPreallocation during each iteration in order to update the
>>>>>>>>>> preallocation.
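
Concretely, what this boils down to each iteration is roughly the following
(schematic -- jac is the Jacobian Mat, and the nnz arrays are recomputed from
the current contact state):

    /* after the contact set (and hence the sparsity pattern) changes: */
    ierr = MatSeqAIJSetPreallocation(jac, 0, nnz);CHKERRQ(ierr);             /* serial   */
    ierr = MatMPIAIJSetPreallocation(jac, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr); /* parallel */

(Each call is a no-op for the other matrix type, so it should be safe to make
both.)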
>>>>>>>>>> > > >
>>>>>>>>>> > > > This seems to work fine in serial, but with two or more MPI
>>>>>>>>>> processes I run into the error "nnz cannot be greater than row length",
>>>>>>>>>> e.g.:
>>>>>>>>>> > > > nnz cannot be greater than row length: local row 528 value
>>>>>>>>>> 12 rowlength 0
>>>>>>>>>> > > >
>>>>>>>>>> > > > This error is from the call to
>>>>>>>>>> > > > MatSeqAIJSetPreallocation(b->B,o_nz,o_nnz); in
>>>>>>>>>> MatMPIAIJSetPreallocation_MPIAIJ.
>>>>>>>>>> > > >
>>>>>>>>>> > > > Any guidance on what the problem might be would be most
>>>>>>>>>> appreciated. For example, I was wondering if there is a problem with
>>>>>>>>>> calling SetPreallocation on a matrix that has already been preallocated?
>>>>>>>>>> > > >
>>>>>>>>>> > > > Some notes:
>>>>>>>>>> > > > - I'm using PETSc via libMesh
>>>>>>>>>> > > > - The code that triggers this issue is available as a PR on
>>>>>>>>>> the libMesh github repo, in case anyone is interested:
>>>>>>>>>> https://github.com/libMesh/libmesh/pull/460/
>>>>>>>>>> > > > - I can try to make a minimal pure-PETSc example that
>>>>>>>>>> reproduces this error, if that would be helpful.
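
In case it's useful, here's an untested sketch of what I have in mind for such
a reproducer (run with 2+ processes; my assumption is that the second
preallocation below hits the same check):

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       i, n, rstart, rend, *d_nnz, *o_nnz;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
      ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);

      /* first pass: diagonal-only sparsity pattern */
      ierr = MatMPIAIJSetPreallocation(A, 1, NULL, 0, NULL);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      /* second pass: pretend the contact set changed, so the new pattern
         has off-process couplings, and re-preallocate; with >1 process this
         is where I'd expect "nnz cannot be greater than row length" */
      n = rend - rstart;
      ierr = PetscMalloc2(n, &d_nnz, n, &o_nnz);CHKERRQ(ierr);
      for (i = 0; i < n; i++) { d_nnz[i] = 3; o_nnz[i] = 2; }
      ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);

      ierr = PetscFree2(d_nnz, o_nnz);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }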
>>>>>>>>>> > > >
>>>>>>>>>> > > > Many thanks,
>>>>>>>>>> > > > David
>>>>>>>>>> > > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>