[petsc-users] KSPSetUp with PETSc/MUMPS

Hong hzhang at mcs.anl.gov
Fri May 27 09:55:24 CDT 2016


Satish,
I tested your fix on ex51f.F90 (modified from
build_nullbasis_petsc_mumps.F90); it runs clean under valgrind.

Could you apply the patch to petsc-maint?

I would also like to add ex51f.F90 (contributed by Constantin)
to petsc/src/ksp/ksp/examples/tests/.

Hong


On Thu, May 26, 2016 at 5:15 PM, Hong <hzhang at mcs.anl.gov> wrote:

> Satish found a problem in the inode routines.
>
> In addition, the user code has bugs. I modified
> build_nullbasis_petsc_mumps.F90 into ex51f.F90 (attached),
> which works well with the option '-mat_no_inode'.
>
> ex51f.F90 differs from build_nullbasis_petsc_mumps.F90 in two ways:
> 1) It uses MATAIJ/MATDENSE instead of MATMPIAIJ/MATMPIDENSE;
> MATAIJ wraps MATSEQAIJ and MATMPIAIJ.
>
> 2) It replaces
>    MatConvert(x, MATMPIAIJ, MAT_REUSE_MATRIX, x, ierr)
> with
>    MatConvert(x, MATMPIAIJ, MAT_INPLACE_MATRIX, x, ierr);
> see
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatConvert.html
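Point 1 relies on the documented behavior of the generic types: MATAIJ acts as MATSEQAIJ on a single-process communicator and as MATMPIAIJ otherwise (MATDENSE follows the same rule with MATSEQDENSE/MATMPIDENSE). The C sketch below only models that selection; `resolve_aij_type` is a hypothetical helper, not a PETSc function, and it simply returns PETSc's type-name strings `"seqaij"` and `"mpiaij"`.

```c
#include <assert.h>

/* Hypothetical model (not a PETSc function) of how the generic "aij"
   type resolves to a concrete type from the communicator size. */
static const char *resolve_aij_type(int comm_size)
{
  assert(comm_size >= 1);               /* a communicator has at least one rank */
  return comm_size == 1 ? "seqaij"      /* MATSEQAIJ on one process */
                        : "mpiaij";     /* MATMPIAIJ otherwise */
}
```

In real code PETSc makes this choice itself once the matrix type is set to MATAIJ (for example with MatSetType or MatSetFromOptions), so the same source runs unchanged on one or many processes.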
>
> Hong
>
> On Thu, May 26, 2016 at 3:05 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>
>> Well, it looks like the MatGetBrowsOfAoCols_MPIAIJ() issue is primarily
>> setting some local variables with uninitialized data [that's primarily
>> set/used for parallel communication]. So valgrind flags it, but I
>> don't think it gets used later on.
>>
>> [perhaps most of the code should be skipped for a sequential run..]
>>
>> The primary issue here is MatGetRowIJ_SeqAIJ_Inode_Symmetric() called
>> by MatGetOrdering_ND().
>>
>> The workaround is to avoid the default nested-dissection (ND) ordering with:
>>    call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>>
>> But I think the following might be the fix [have to recheck]. The
>> test code works with this change [with the default ND].
>>
>> diff --git a/src/mat/impls/aij/seq/inode.c b/src/mat/impls/aij/seq/inode.c
>> index 9af404e..49f76ce 100644
>> --- a/src/mat/impls/aij/seq/inode.c
>> +++ b/src/mat/impls/aij/seq/inode.c
>> @@ -97,6 +97,7 @@ static PetscErrorCode MatGetRowIJ_SeqAIJ_Inode_Symmetric(Mat A,const PetscInt *i
>>
>>      j    = aj + ai[row] + ishift;
>>      jmax = aj + ai[row+1] + ishift;
>> +    if (j==jmax) continue; /* empty row */
>>      col  = *j++ + ishift;
>>      i2   = tvc[col];
>>      while (i2<i1 && j<jmax) { /* 1.[-xx-d-xx--] 2.[-xx-------],off-diagonal elemets */
>> @@ -125,6 +126,7 @@ static PetscErrorCode MatGetRowIJ_SeqAIJ_Inode_Symmetric(Mat A,const PetscInt *i
>>    for (i1=0,row=0; i1<nslim_row; row += ns_row[i1],i1++) {
>>      j    = aj + ai[row] + ishift;
>>      jmax = aj + ai[row+1] + ishift;
>> +    if (j==jmax) continue; /* empty row */
>>      col  = *j++ + ishift;
>>      i2   = tvc[col];
>>      while (i2<i1 && j<jmax) {
>>
>> Satish
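The effect of the patch's `if (j==jmax) continue;` guard can be seen in a standalone, PETSc-free sketch. It assumes a plain CSR layout (row pointers `ai`, column indices `aj`, no `ishift`, no inode grouping), which simplifies what inode.c actually walks, and it uses the pattern of the 6x6 matrix Satish printed further down this thread, whose rows 4 and 5 are empty. For those rows `ai[row]` equals the nonzero count (8), so dereferencing `aj + ai[row]` without the guard reads one element past the end of `aj`. This is an illustration consistent with the valgrind report at inode.c:101, not a diagnosis of it; the name `first_columns` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CSR pattern of the 6x6 matrix printed in this thread (values omitted).
   Rows 4 and 5 are empty, so ai[4] == ai[5] == ai[6] == nnz == 8. */
static const int ai[] = {0, 2, 4, 6, 8, 8, 8};
static const int aj[] = {0, 1, 0, 1, 2, 3, 4, 5};
#define NROWS 6

/* Records the first stored column of every row (-1 for an empty row) and
   returns how many rows had at least one entry. */
static int first_columns(int first_col[NROWS])
{
  int visited = 0;
  assert(first_col != NULL);
  for (int row = 0; row < NROWS; row++) {
    const int *j    = aj + ai[row];
    const int *jmax = aj + ai[row + 1];
    first_col[row] = -1;
    /* Same guard as the patch: without it, *j below would read aj[8],
       one element past the end of aj, for rows 4 and 5. */
    if (j == jmax) continue;
    first_col[row] = *j;
    visited++;
  }
  return visited;
}
```

Dropping the guard does not necessarily crash; the out-of-bounds value is simply whatever follows `aj` in memory, which fits the thread's report that the failure appears in some builds and runs but not others.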
>>
>> On Thu, 26 May 2016, Hong wrote:
>>
>> > I'll investigate this - had a day off since yesterday.
>> > Hong
>> >
>> > On Thu, May 26, 2016 at 12:04 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> > >
>> > >   Hong needs to run with this matrix and add appropriate error
>> > > checkers in the matrix routines to detect "incomplete" matrices and
>> > > likely just error out.
>> > >
>> > >    Barry
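Barry's suggestion could start with a scan like the one below, shown as a standalone C sketch rather than PETSc code: it walks the CSR row pointers and reports empty rows, such as rows 4 and 5 of the matrix printed in the next message. The name `count_empty_rows` and the plain `int` arrays are assumptions for illustration (PETSc itself uses `PetscInt`).

```c
#include <assert.h>
#include <stddef.h>

/* Counts rows with no stored entries in a CSR matrix given by its row
   pointer array ai (length nrows + 1). If first_empty is non-NULL it
   receives the index of the first empty row, or -1 if there is none. */
static int count_empty_rows(int nrows, const int *ai, int *first_empty)
{
  int empty = 0;
  assert(nrows >= 0 && ai != NULL);
  if (first_empty) *first_empty = -1;
  for (int row = 0; row < nrows; row++) {
    if (ai[row + 1] == ai[row]) {          /* no entries between the pointers */
      if (empty == 0 && first_empty) *first_empty = row;
      empty++;
    }
  }
  return empty;
}
```

A caller could then error out, as Barry proposes, or fall back to a code path that tolerates rows without entries.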
>> > >
>> > > > On May 26, 2016, at 11:23 AM, Satish Balay <balay at mcs.anl.gov> wrote:
>> > > >
>> > > > Mat Object: 1 MPI processes
>> > > >  type: mpiaij
>> > > > row 0: (0, 0.)  (1, 0.486111)
>> > > > row 1: (0, 0.486111)  (1, 0.)
>> > > > row 2: (2, 0.)  (3, 0.486111)
>> > > > row 3: (4, 0.486111)  (5, -0.486111)
>> > > > row 4:
>> > > > row 5:
>> > > >
>> > > > The matrix created is funny (empty rows at the end), so perhaps it's
>> > > > exposing bugs in the Mat code? [Is that a valid matrix for this code?]
>> > > >
>> > > > ==21091== Use of uninitialised value of size 8
>> > > > ==21091==    at 0x57CA16B: MatGetRowIJ_SeqAIJ_Inode_Symmetric (inode.c:101)
>> > > > ==21091==    by 0x57CBA1C: MatGetRowIJ_SeqAIJ_Inode (inode.c:241)
>> > > > ==21091==    by 0x537C0B5: MatGetRowIJ (matrix.c:7274)
>> > > > ==21091==    by 0x53072FD: MatGetOrdering_ND (spnd.c:18)
>> > > > ==21091==    by 0x530BC39: MatGetOrdering (sorder.c:260)
>> > > > ==21091==    by 0x530A72D: MatGetOrdering (sorder.c:202)
>> > > > ==21091==    by 0x5DDD764: PCSetUp_LU (lu.c:124)
>> > > > ==21091==    by 0x5EBFE60: PCSetUp (precon.c:968)
>> > > > ==21091==    by 0x5FDA1B3: KSPSetUp (itfunc.c:390)
>> > > > ==21091==    by 0x601C17D: kspsetup_ (itfuncf.c:252)
>> > > > ==21091==    by 0x4028B9: MAIN__ (ex1f.F90:104)
>> > > > ==21091==    by 0x403535: main (ex1f.F90:185)
>> > > >
>> > > >
>> > > > This goes away if I add:
>> > > >
>> > > >   call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>> > > >
>> > > > And then there is also:
>> > > >
>> > > > ==21275== Invalid read of size 8
>> > > > ==21275==    at 0x584DE93: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:4734)
>> > > > ==21275==    by 0x58970A8: MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable (mpimatmatmult.c:198)
>> > > > ==21275==    by 0x5894A54: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:34)
>> > > > ==21275==    by 0x539664E: MatMatMult (matrix.c:9510)
>> > > > ==21275==    by 0x53B3201: matmatmult_ (matrixf.c:1157)
>> > > > ==21275==    by 0x402FC9: MAIN__ (ex1f.F90:149)
>> > > > ==21275==    by 0x4035B9: main (ex1f.F90:186)
>> > > > ==21275==  Address 0xa3d20f0 is 0 bytes after a block of size 48 alloc'd
>> > > > ==21275==    at 0x4C2DF93: memalign (vg_replace_malloc.c:858)
>> > > > ==21275==    by 0x4FDE05E: PetscMallocAlign (mal.c:28)
>> > > > ==21275==    by 0x5240240: VecScatterCreate (vscat.c:1220)
>> > > > ==21275==    by 0x5857708: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>> > > > ==21275==    by 0x581C31E: MatAssemblyEnd_MPIAIJ (mpiaij.c:747)
>> > > > ==21275==    by 0x53680F2: MatAssemblyEnd (matrix.c:5187)
>> > > > ==21275==    by 0x53B24D2: matassemblyend_ (matrixf.c:926)
>> > > > ==21275==    by 0x40262C: MAIN__ (ex1f.F90:60)
>> > > > ==21275==    by 0x4035B9: main (ex1f.F90:186)
>> > > >
>> > > >
>> > > > Satish
>> > > >
>> > > > -----------
>> > > >
>> > > > $ diff build_nullbasis_petsc_mumps.F90 ex1f.F90
>> > > > 3,7c3
>> > > > < #include <petsc/finclude/petscsys.h>
>> > > > < #include "petsc/finclude/petscvec.h"
>> > > > < #include "petsc/finclude/petscmat.h"
>> > > > < #include "petsc/finclude/petscpc.h"
>> > > > < #include "petsc/finclude/petscksp.h"
>> > > > ---
>> > > >> #include "petsc/finclude/petsc.h"
>> > > > 40,41c36,37
>> > > > <    call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", 0, viewer, ierr)
>> > > > <    call MatLoad(mat_c, viewer)
>> > > > ---
>> > > >>   call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", FILE_MODE_READ, viewer, ierr)
>> > > >>   call MatLoad(mat_c, viewer,ierr)
>> > > > 75a72
>> > > >>   call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>> > > > 150c147
>> > > > <    call MatConvert(x, MATMPIAIJ, MAT_REUSE_MATRIX, x, ierr)
>> > > > ---
>> > > >>   call MatConvert(x, MATMPIAIJ, MAT_INPLACE_MATRIX, x, ierr)
>> > > >
>> > > >
>> > > > On Thu, 26 May 2016, Matthew Knepley wrote:
>> > > >
>> > > >> Usually this means you have an uninitialized variable that is causing
>> > > >> you to overwrite memory. Fortran is so lax in checking this; it's one
>> > > >> reason to switch to C.
>> > > >>
>> > > >>  Thanks,
>> > > >>
>> > > >>    Matt
>> > > >>
>> > > >> On Thu, May 26, 2016 at 1:46 AM, Constantin Nguyen Van <constantin.nguyen.van at openmailbox.org> wrote:
>> > > >>
>> > > >>> Thanks for all your answers.
>> > > >>> I'm sorry for the syntax mistake in MatLoad; it was introduced afterwards.
>> > > >>>
>> > > >>> I recompiled PETSc with --with-debugging=yes and ran my code again.
>> > > >>> Now I also see this strange behaviour. When I run the code without
>> > > >>> valgrind and with one proc, I get this error message:
>> > > >>>
>> > > >>> BEGIN PROC           0
>> > > >>> ITERATION           1
>> > > >>> ECHO 1
>> > > >>> ECHO 2
>> > > >>> INFOG(28):           2
>> > > >>> BASIS OK           0
>> > > >>> END PROC             0
>> > > >>> BEGIN PROC           0
>> > > >>> ITERATION           2
>> > > >>> ECHO 1
>> > > >>> ECHO 2
>> > > >>> INFOG(28):           2
>> > > >>> BASIS OK           0
>> > > >>> END PROC             0
>> > > >>> BEGIN PROC           0
>> > > >>> ITERATION           3
>> > > >>> ECHO 1
>> > > >>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> > > >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> > > >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> > > >>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > > >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> > > >>> [0]PETSC ERROR: likely location of problem given in stack below
>> > > >>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> > > >>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> > > >>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>> > > >>> [0]PETSC ERROR:       is given.
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode_Symmetric line 69 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode line 235 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ line 7099 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/interface/matrix.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering_ND line 17 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/spnd.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering line 185 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering line 185 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c
>> > > >>> [0]PETSC ERROR: [0] PCSetUp_LU line 99 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/impls/factor/lu/lu.c
>> > > >>> [0]PETSC ERROR: [0] PCSetUp line 945 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/interface/precon.c
>> > > >>> [0]PETSC ERROR: [0] KSPSetUp line 247 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/ksp/interface/itfunc.c
>> > > >>>
>> > > >>> But when I run it under valgrind, it works fine.
>> > > >>>
>> > > >>> On 2016-05-25 20:04, Barry Smith wrote:
>> > > >>>
>> > > >>>> First run with valgrind
>> > > >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > > >>>>
>> > > >>>> On May 25, 2016, at 2:35 AM, Constantin Nguyen Van <constantin.nguyen.van at openmailbox.org> wrote:
>> > > >>>>>
>> > > >>>>> Hi,
>> > > >>>>>
>> > > >>>>> I'm a new user of PETSc and I am trying to use it with MUMPS
>> > > >>>>> functionality to compute a null basis.
>> > > >>>>> I wrote a code that computes the same null basis 4 times. It
>> > > >>>>> works well when I run it on several procs, but with only one
>> > > >>>>> processor I get an error on the 2nd iteration when KSPSetUp is
>> > > >>>>> called. Furthermore, when it is run with a debug build
>> > > >>>>> (--with-debugging=yes), it works fine with one or several processors.
>> > > >>>>> Do you have any idea why it doesn't work with one processor
>> > > >>>>> and no debugging?
>> > > >>>>>
>> > > >>>>> Thanks.
>> > > >>>>> Constantin.
>> > > >>>>>
>> > > >>>>> PS: You can find the code and the files required to run it enclosed.
>> > > >>>>>
>> > > >>>>
>> > > >>
>> > > >>
>> > >
>> > >
>> >
>>
>
>