<div dir="ltr">I'll investigate this - had a day off since yesterday.<div>Hong</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 26, 2016 at 12:04 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hong needs to run with this matrix and add appropriate error checkers in the matrix routines to detect "incomplete" matrices and likely just error out.<br>
<span class="HOEnZb"><font color="#888888"><br>
Barry<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> On May 26, 2016, at 11:23 AM, Satish Balay <<a href="mailto:balay@mcs.anl.gov">balay@mcs.anl.gov</a>> wrote:<br>
><br>
> Mat Object: 1 MPI processes<br>
> type: mpiaij<br>
> row 0: (0, 0.) (1, 0.486111)<br>
> row 1: (0, 0.486111) (1, 0.)<br>
> row 2: (2, 0.) (3, 0.486111)<br>
> row 3: (4, 0.486111) (5, -0.486111)<br>
> row 4:<br>
> row 5:<br>
><br>
> The matrix created is funny (empty rows at the end) - so perhaps it's<br>
> exposing bugs in Mat code? [is that a valid matrix for this code?]<br>
><br>
> ==21091== Use of uninitialised value of size 8<br>
> ==21091== at 0x57CA16B: MatGetRowIJ_SeqAIJ_Inode_Symmetric (inode.c:101)<br>
> ==21091== by 0x57CBA1C: MatGetRowIJ_SeqAIJ_Inode (inode.c:241)<br>
> ==21091== by 0x537C0B5: MatGetRowIJ (matrix.c:7274)<br>
> ==21091== by 0x53072FD: MatGetOrdering_ND (spnd.c:18)<br>
> ==21091== by 0x530BC39: MatGetOrdering (sorder.c:260)<br>
> ==21091== by 0x530A72D: MatGetOrdering (sorder.c:202)<br>
> ==21091== by 0x5DDD764: PCSetUp_LU (lu.c:124)<br>
> ==21091== by 0x5EBFE60: PCSetUp (precon.c:968)<br>
> ==21091== by 0x5FDA1B3: KSPSetUp (itfunc.c:390)<br>
> ==21091== by 0x601C17D: kspsetup_ (itfuncf.c:252)<br>
> ==21091== by 0x4028B9: MAIN__ (ex1f.F90:104)<br>
> ==21091== by 0x403535: main (ex1f.F90:185)<br>
><br>
><br>
> This goes away if I add:<br>
><br>
> call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)<br>
><br>
> And then there is also:<br>
><br>
> ==21275== Invalid read of size 8<br>
> ==21275== at 0x584DE93: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:4734)<br>
> ==21275== by 0x58970A8: MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable (mpimatmatmult.c:198)<br>
> ==21275== by 0x5894A54: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:34)<br>
> ==21275== by 0x539664E: MatMatMult (matrix.c:9510)<br>
> ==21275== by 0x53B3201: matmatmult_ (matrixf.c:1157)<br>
> ==21275== by 0x402FC9: MAIN__ (ex1f.F90:149)<br>
> ==21275== by 0x4035B9: main (ex1f.F90:186)<br>
> ==21275== Address 0xa3d20f0 is 0 bytes after a block of size 48 alloc'd<br>
> ==21275== at 0x4C2DF93: memalign (vg_replace_malloc.c:858)<br>
> ==21275== by 0x4FDE05E: PetscMallocAlign (mal.c:28)<br>
> ==21275== by 0x5240240: VecScatterCreate (vscat.c:1220)<br>
> ==21275== by 0x5857708: MatSetUpMultiply_MPIAIJ (mmaij.c:116)<br>
> ==21275== by 0x581C31E: MatAssemblyEnd_MPIAIJ (mpiaij.c:747)<br>
> ==21275== by 0x53680F2: MatAssemblyEnd (matrix.c:5187)<br>
> ==21275== by 0x53B24D2: matassemblyend_ (matrixf.c:926)<br>
> ==21275== by 0x40262C: MAIN__ (ex1f.F90:60)<br>
> ==21275== by 0x4035B9: main (ex1f.F90:186)<br>
><br>
><br>
> Satish<br>
><br>
> -----------<br>
><br>
> $ diff build_nullbasis_petsc_mumps.F90 ex1f.F90<br>
> 3,7c3<br>
> < #include <petsc/finclude/petscsys.h><br>
> < #include "petsc/finclude/petscvec.h"<br>
> < #include "petsc/finclude/petscmat.h"<br>
> < #include "petsc/finclude/petscpc.h"<br>
> < #include "petsc/finclude/petscksp.h"<br>
> ---<br>
>> #include "petsc/finclude/petsc.h"<br>
> 40,41c36,37<br>
> < call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", 0, viewer, ierr)<br>
> < call MatLoad(mat_c, viewer)<br>
> ---<br>
>> call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", FILE_MODE_READ, viewer, ierr)<br>
>> call MatLoad(mat_c, viewer,ierr)<br>
> 75a72<br>
>> call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)<br>
> 150c147<br>
> < call MatConvert(x, MATMPIAIJ, MAT_REUSE_MATRIX, x, ierr)<br>
> ---<br>
>> call MatConvert(x, MATMPIAIJ, MAT_INPLACE_MATRIX, x, ierr)<br>
><br>
><br>
> On Thu, 26 May 2016, Matthew Knepley wrote:<br>
><br>
>> Usually this means you have an uninitialized variable that is causing you<br>
>> to overwrite memory. Fortran<br>
>> is so lax in checking this; it's one reason to switch to C.<br>
>><br>
>> Thanks,<br>
>><br>
>> Matt<br>
>><br>
>> On Thu, May 26, 2016 at 1:46 AM, Constantin Nguyen Van <<br>
>> <a href="mailto:constantin.nguyen.van@openmailbox.org">constantin.nguyen.van@openmailbox.org</a>> wrote:<br>
>><br>
>>> Thanks for all your answers.<br>
>>> I'm sorry for the syntax mistake in MatLoad; it was introduced afterwards.<br>
>>><br>
>>> I recompiled PETSc with --with-debugging=yes and ran my code again.<br>
>>> Now I also see this strange behaviour: when I run the code without<br>
>>> valgrind and with one proc, I get this error message:<br>
>>><br>
>>> BEGIN PROC 0<br>
>>> ITERATION 1<br>
>>> ECHO 1<br>
>>> ECHO 2<br>
>>> INFOG(28): 2<br>
>>> BASIS OK 0<br>
>>> END PROC 0<br>
>>> BEGIN PROC 0<br>
>>> ITERATION 2<br>
>>> ECHO 1<br>
>>> ECHO 2<br>
>>> INFOG(28): 2<br>
>>> BASIS OK 0<br>
>>> END PROC 0<br>
>>> BEGIN PROC 0<br>
>>> ITERATION 3<br>
>>> ECHO 1<br>
>>> [0]PETSC ERROR:<br>
>>> ------------------------------------------------------------------------<br>
>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,<br>
>>> probably memory access out of range<br>
>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger<br>
>>> [0]PETSC ERROR: or see<br>
>>> <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
>>> [0]PETSC ERROR: or try <a href="http://valgrind.org" rel="noreferrer" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac OS<br>
>>> X to find memory corruption errors<br>
>>> [0]PETSC ERROR: likely location of problem given in stack below<br>
>>> [0]PETSC ERROR: --------------------- Stack Frames<br>
>>> ------------------------------------<br>
>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not<br>
>>> available,<br>
>>> [0]PETSC ERROR: INSTEAD the line number of the start of the function<br>
>>> [0]PETSC ERROR: is given.<br>
>>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode_Symmetric line 69<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c<br>
>>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode line 235<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c<br>
>>> [0]PETSC ERROR: [0] MatGetRowIJ line 7099<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/interface/matrix.c<br>
>>> [0]PETSC ERROR: [0] MatGetOrdering_ND line 17<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/spnd.c<br>
>>> [0]PETSC ERROR: [0] MatGetOrdering line 185<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c<br>
>>> [0]PETSC ERROR: [0] MatGetOrdering line 185<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c<br>
>>> [0]PETSC ERROR: [0] PCSetUp_LU line 99<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/impls/factor/lu/lu.c<br>
>>> [0]PETSC ERROR: [0] PCSetUp line 945<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/interface/precon.c<br>
>>> [0]PETSC ERROR: [0] KSPSetUp line 247<br>
>>> /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/ksp/interface/itfunc.c<br>
>>><br>
>>> But when I run it with valgrind, it works fine.<br>
>>><br>
>>> On 2016-05-25 20:04, Barry Smith wrote:<br>
>>><br>
>>>> First run with valgrind<br>
>>>> <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
>>>><br>
>>>> On May 25, 2016, at 2:35 AM, Constantin Nguyen Van<br>
>>>>> <<a href="mailto:constantin.nguyen.van@openmailbox.org">constantin.nguyen.van@openmailbox.org</a>> wrote:<br>
>>>>><br>
>>>>> Hi,<br>
>>>>><br>
>>>>> I'm a new user of PETSc and I am trying to use it with MUMPS<br>
>>>>> functionalities to compute a nullbasis.<br>
>>>>> I wrote a code where I compute the same nullbasis 4 times. It<br>
>>>>> works well when I run it with several procs, but with only one<br>
>>>>> processor I get an error on the 2nd iteration when KSPSetUp is<br>
>>>>> called. Furthermore, when PETSc is built with debugging (<br>
>>>>> --with-debugging=yes), it works fine with one or several processors.<br>
>>>>> Do you have any idea why it doesn't work with one processor<br>
>>>>> and without debugging?<br>
>>>>><br>
>>>>> Thanks.<br>
>>>>> Constantin.<br>
>>>>><br>
>>>>> PS: The code and the files required to run it are attached.<br>
>>>>><br>
>>>><br>
>><br>
>><br>
<br>
</div></div></blockquote></div><br></div>