[petsc-users] Error in creation of MPISBAIJ matrix with DMCreateMatrix

Xiao, Jianjun (IKET) jianjun.xiao at kit.edu
Thu Jan 23 05:55:14 CST 2014


Dear Barry,

There is actually no restriction on my use of the cluster.

It turns out that the version of Open MPI on our Linux cluster was quite old (1.4.3). I updated Open MPI to 1.6.5, and now the code works fine without any errors.

As Jed mentioned, "PETSc does not depend on any specific versions (or implementations) of MPI. http://lists.mcs.anl.gov/pipermail/petsc-users/2011-January/007635.html "

I was wondering whether you think the MPI version could be the reason for my errors?

Thanks.

Best regards
JJ 
________________________________________
From: Barry Smith [bsmith at mcs.anl.gov]
Sent: Thursday, January 23, 2014 1:28 AM
To: Xiao, Jianjun (IKET)
Cc: petsc-users at mcs.anl.gov; jedbrown at mcs.anl.gov
Subject: Re: [petsc-users] Error in creation of MPISBAIJ matrix with DMCreateMatrix

Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end

   This is a very different problem.  Something on your cluster is deciding to kill your job. Perhaps you are not supposed to be running there? The matrix is very small, so it is not likely due to using too much memory. You need to talk directly with the system administrator for that system and show him/her the message above. It is not a bug in PETSc.


  Barry

I ran with 50-60 processes under valgrind and it ran fine for all cases.
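
To illustrate the point that signal 15 (SIGTERM) is delivered to a process from outside rather than raised by its own code, here is a minimal Python sketch. It is not PETSc or MPI code: the child process merely stands in for one rank of the job, and the parent stands in for whatever is ending the job (a batch system or a launcher, for example). The function name terminate_child is made up for this illustration, and the sketch assumes a POSIX system such as Linux.

```python
import signal
import subprocess
import sys


def terminate_child():
    """Start an idle child process, send it SIGTERM, and return its exit status."""
    # The child stands in for one rank of a parallel job; it only sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # The parent plays the role of the external agent that ends the job.
    child.terminate()      # on POSIX this sends SIGTERM to the child
    return child.wait()    # a negative status means "killed by that signal"


if __name__ == "__main__":
    status = terminate_child()
    print("SIGTERM is signal number", int(signal.SIGTERM))
    print("child exit status:", status)
```

The child never misbehaves here; it is ended only because another process asked for it, which is why the message alone does not point at a bug in the code that received the signal.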


On Jan 22, 2014, at 5:04 PM, Xiao, Jianjun (IKET) <jianjun.xiao at kit.edu> wrote:

> I changed the code as you suggested, and I updated the nightly tarball.
>
> Unfortunately, I still got the error below with certain numbers of processors, say 22, 55, 61, 62 ..., on the Linux cluster.  I think the numbers might be different on your machine. Could you please try other numbers, say 50 to 63?
>
> It seems the errors could be from my side. Could you please give me some idea on how to debug such a problem?
>
> Thank you.
>
> [60]PETSC ERROR: ------------------------------------------------------------------------
> [60]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
> [60]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [60]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [60]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [60]PETSC ERROR: likely location of problem given in stack below
> [60]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [60]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [60]PETSC ERROR:       INSTEAD the line number of the start of the function
> [60]PETSC ERROR:       is given.
> [60]PETSC ERROR: [60] MatAssemblyBegin_MPISBAIJ line 483 /home/xiao/Local/petsc-dev-debug/src/mat/impls/sbaij/mpi/mpisbaij.c
> [60]PETSC ERROR: [60] MatAssemblyBegin line 4854 /home/xiao/Local/petsc-dev-debug/src/mat/interface/matrix.c
> [60]PETSC ERROR: [60] DMCreateMatrix_DA_3d_MPISBAIJ line 1679 /home/xiao/Local/petsc-dev-debug/src/dm/impls/da/fdda.c
> [60]PETSC ERROR: [60] DMCreateMatrix_DA line 625 /home/xiao/Local/petsc-dev-debug/src/dm/impls/da/fdda.c
> [60]PETSC ERROR: [60] DMCreateMatrix line 961 /home/xiao/Local/petsc-dev-debug/src/dm/interface/dm.c
> [60]PETSC ERROR: --------------------- Error Message ------------------------------------
> [60]PETSC ERROR: Signal received!
> [60]PETSC ERROR: ------------------------------------------------------------------------
> [60]PETSC ERROR: Petsc Development GIT revision: v3.4.3-2332-g54f71ec  GIT Date: 2014-01-20 14:12:11 -0700
> [60]PETSC ERROR: See docs/changes/index.html for recent updates.
> [60]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [60]PETSC ERROR: See docs/index.html for manual pages.
> [60]PETSC ERROR: ------------------------------------------------------------------------
> [60]PETSC ERROR: ./ex44f on a linux-gnu named cluster08 by xiao Wed Jan 22 23:51:36 2014
> [60]PETSC ERROR: Libraries linked from /home/xiao/Local/petsc-dev-debug/linux-gnu/lib
> [60]PETSC ERROR: Configure run at Wed Jan 22 23:45:50 2014
> [60]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-debugging=1
> [60]PETSC ERROR: ------------------------------------------------------------------------
>
> JJ
> ________________________________________
> From: Barry Smith [bsmith at mcs.anl.gov]
> Sent: Wednesday, January 22, 2014 9:29 PM
> To: Xiao, Jianjun (IKET)
> Cc: petsc-users at mcs.anl.gov; jedbrown at mcs.anl.gov
> Subject: Re: [petsc-users] Error in creation of MPISBAIJ matrix with DMCreateMatrix
>
>  There was an error in your code. Because you did not use implicit none in the main program, the compiler did not tell you about it. With implicit none, it reports:
>      CALL DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,               &
>                                                           1
> Error: Symbol 'dmda_boundary_none' at (1) has no IMPLICIT type
> ex1f.F90:14.26:
>
>     &    DMDA_STENCIL_BOX,-1002,-3,-3,PETSC_DECIDE,PETSC_DECIDE,&
>                          1
> Error: Symbol 'dmda_stencil_box' at (1) has no IMPLICIT type
>
>
>   I fixed the code by adding use petscdmda and implicit none, and it ran without error on those numbers of processes on both a Mac OS and a Linux system.
>
>     program main   !   Solves the linear system  J x = f
>      use petscksp; use petscdm; use petscdmda
>      implicit none
> #include <finclude/petscdef.h>
>
>   Please let me know if you still have a problem.
>
>   Barry
>
>
> On Jan 22, 2014, at 5:15 AM, Xiao, Jianjun (IKET) <jianjun.xiao at kit.edu> wrote:
>
>> Dear Barry,
>>
>> I modified ex44f.F90 and ran the case on my 64-processor cluster. Please find the file in the attachment.
>>
>> I tried various numbers of processors. It seems that for 1, 2, 3, 4, 5, 6, 7, 8, 16, 32 and 64 processors it works fine. For numbers such as 20, 30, 31, 60, 61 and 62, I got the error below. This time, I got an error even for MATMPIBAIJ. I did not try all the numbers between 1 and 64. If you need more information, please let me know.
>>
>> Thank you for your help.
>>
>> Best regards
>> JJ
>>
>>
>> For MATMPISBAIJ, I got an error like this:
>>
>> [55]PETSC ERROR: [55] MatAssemblyBegin_MPISBAIJ line 483 src/mat/impls/sbaij/mpi/mpisbaij.c
>> [55]PETSC ERROR: [55] MatAssemblyBegin line 4865 src/mat/interface/matrix.c
>> [55]PETSC ERROR: [55] DMCreateMatrix_DA_3d_MPISBAIJ line 1694 src/dm/impls/da/fdda.c
>> [55]PETSC ERROR: [55] DMCreateMatrix_DA line 626 src/dm/impls/da/fdda.c
>> [55]PETSC ERROR: [55] DMCreateMatrix line 1002 src/dm/interface/dm.c
>> [55]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [55]PETSC ERROR: Signal received!
>> [55]PETSC ERROR: ------------------------------------------------------------------------
>> [55]PETSC ERROR: Petsc Development GIT revision: f7404d5510646a3c64be49fff6ce547efef07b3d  GIT Date: 2013-11-27 00:12:54 +0100
>> [55]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [55]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [55]PETSC ERROR: See docs/index.html for manual pages.
>> [55]PETSC ERROR: ------------------------------------------------------------------------
>> [55]PETSC ERROR: ./ex44f on a linux-gnu named cluster07 by xiao Wed Jan 22 11:54:32 2014
>> [55]PETSC ERROR: Libraries linked from /home/xiao/Local/petsc-dev-debug/linux-gnu/lib
>> [55]PETSC ERROR: Configure run at Tue Jan 14 15:27:57 2014
>> [55]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-debugging=1
>> [55]PETSC ERROR: ------------------------------------------------------------------------
>> [55]PETSC ERROR: User provided function() line 0 in  unknown file
>>
>>
>> For MATMPIBAIJ, I got an error like this:
>>
>> [44]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [44]PETSC ERROR: Argument out of range!
>> [44]PETSC ERROR: Trying to set preallocation for row 6678 less than first local row 6714!
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: Petsc Development GIT revision: f7404d5510646a3c64be49fff6ce547efef07b3d  GIT Date: 2013-11-27 00:12:54 +0100
>> [44]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [44]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [44]PETSC ERROR: See docs/index.html for manual pages.
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: ./ex44f on a linux-gnu named cluster06 by xiao Wed Jan 22 11:52:08 2014
>> [44]PETSC ERROR: Libraries linked from /home/xiao/Local/petsc-dev-debug/linux-gnu/lib
>> [44]PETSC ERROR: Configure run at Tue Jan 14 15:27:57 2014
>> [44]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-debugging=1
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: DMCreateMatrix_DA_3d_MPIBAIJ() line 1508 in src/dm/impls/da/fdda.c
>> [44]PETSC ERROR: DMCreateMatrix_DA() line 771 in src/dm/impls/da/fdda.c
>> [44]PETSC ERROR: DMCreateMatrix() line 1007 in src/dm/interface/dm.c
>> [44]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [44]PETSC ERROR: Corrupt argument:
>> see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind!
>> [44]PETSC ERROR: Invalid Pointer to Object: Parameter # 1!
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: Petsc Development GIT revision: f7404d5510646a3c64be49fff6ce547efef07b3d  GIT Date: 2013-11-27 00:12:54 +0100
>> [44]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [44]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [44]PETSC ERROR: See docs/index.html for manual pages.
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: ./ex44f on a linux-gnu named cluster06 by xiao Wed Jan 22 11:52:08 2014
>> [44]PETSC ERROR: Libraries linked from /home/xiao/Local/petsc-dev-debug/linux-gnu/lib
>> [44]PETSC ERROR: Configure run at Tue Jan 14 15:27:57 2014
>> [44]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-debugging=1
>> [44]PETSC ERROR: ------------------------------------------------------------------------
>> [44]PETSC ERROR: MatDestroy() line 1029 in src/mat/interface/matrix.c
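
The "row 6678 less than first local row 6714" message above reflects the fact that each process owns a contiguous block of matrix rows and may only preallocate rows inside its own block. The following Python sketch is a toy model of that range check, not PETSc code; the function names and the example local sizes are made up for illustration. One way such a message can arise is when the row indices a process computes do not match the row range it was assigned.

```python
from itertools import accumulate


def ownership_ranges(local_rows):
    """Return (first_row, end_row) per rank for contiguous row ownership."""
    ends = list(accumulate(local_rows))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def check_preallocation_row(rank, row, ranges):
    """Raise IndexError if `row` lies outside the rows owned by `rank`."""
    first, end = ranges[rank]
    if row < first:
        raise IndexError(f"row {row} less than first local row {first}")
    if row >= end:
        raise IndexError(f"row {row} not below end of local rows {end}")


if __name__ == "__main__":
    # Hypothetical local sizes for three ranks; not taken from the thread.
    ranges = ownership_ranges([4, 3, 3])
    print(ranges)
    check_preallocation_row(1, 5, ranges)      # row 5 is inside rank 1's block
    try:
        check_preallocation_row(1, 2, ranges)  # row 2 belongs to rank 0
    except IndexError as err:
        print("error:", err)
```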
>>
>> ________________________________________
>> From: Barry Smith [bsmith at mcs.anl.gov]
>> Sent: Monday, January 20, 2014 8:37 PM
>> To: Xiao, Jianjun (IKET)
>> Cc: petsc-users at mcs.anl.gov; jedbrown at mcs.anl.gov
>> Subject: Re: [petsc-users] Error in creation of MPISBAIJ matrix with DMCreateMatrix
>>
>>  Thanks for reporting the problem. This is our error. Could you please send us the code that generates the error so we can reproduce the problem, determine the cause, and fix it?
>>
>>  Barry
>>
>> We need to know the exact values of imax, jmax, kmax, etc. to reproduce the problem.
>>
>>
>>
>> On Jan 20, 2014, at 9:33 AM, Xiao, Jianjun (IKET) <jianjun.xiao at kit.edu> wrote:
>>
>>> Dear developers,
>>>
>>> I am using petsc-dev. I tried to create an MPISBAIJ matrix as shown below, and it seems that the matrix creation is sensitive to the number of processors.
>>>
>>>    CALL DMDACreate3d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,            &
>>>   &    DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,                        &
>>>   &    DMDA_STENCIL_BOX,-imax,-jmax,-kmax,PETSC_DECIDE,PETSC_DECIDE,&
>>>   &    PETSC_DECIDE,1,1,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,       &
>>>   &    PETSC_NULL_INTEGER,da,ierr)
>>>
>>>     CALL DMSetMatType(da,MATMPISBAIJ,ierr)
>>>     CALL DMCreateMatrix(da,mat,ierr)
>>>
>>> A cluster with 64 processors was used for the testing.
>>>
>>> When the number of processors is 1, 2, 3, 4, 5, 6, 7, 8, 16, 32 or 64, the code always works well.
>>>
>>> For some other numbers, the code is not stable: sometimes the matrix is created successfully, and sometimes the creation fails.
>>>
>>> When the number of processors is 20, 33, 63 or some other relatively large number, the code always gives the error below.
>>>
>>>
>>> mpirun: [25]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> mpirun: [25]PETSC ERROR: Argument out of range!
>>> mpirun: [25]PETSC ERROR: New nonzero at (198,7038) caused a malloc!
>>> mpirun: [25]PETSC ERROR: ------------------------------------------------------------------------
>>> mpirun: [25]PETSC ERROR: Petsc Development GIT revision: f7404d5510646a3c64be49fff6ce547efef07b3d GIT Date: 2013-11-27 00:12:54 +0100
>>> mpirun: [25]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> mpirun: [25]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> mpirun: [25]PETSC ERROR: See docs/index.html for manual pages.
>>> mpirun: [25]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-debugging=1
>>> mpirun: [25]PETSC ERROR: ------------------------------------------------------------------------
>>> mpirun: [25]PETSC ERROR: MatSetValuesBlocked_SeqBAIJ() line 1836 in src/mat/impls/baij/seq/baij.c
>>> mpirun: [25]PETSC ERROR: MatSetValuesBlocked_MPISBAIJ() line 339 in src/mat/impls/sbaij/mpi/mpisbaij.c
>>> mpirun: [25]PETSC ERROR: MatSetValuesBlocked() line 1658 in src/mat/interface/matrix.c
>>> mpirun: [25]PETSC ERROR: DMCreateMatrix_DA_3d_MPISBAIJ() line 1780 in src/dm/impls/da/fdda.c
>>> mpirun: [25]PETSC ERROR: DMCreateMatrix_DA() line 777 in src/dm/impls/da/fdda.c
>>> mpirun: [25]PETSC ERROR: DMCreateMatrix() line 1007 in src/dm/interface/dm.c
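
The "New nonzero at (198,7038) caused a malloc!" message above is the kind of error raised when a value is inserted at a location for which no space was preallocated. The Python sketch below is only a toy model of that idea, not PETSc code; the class PreallocatedMatrix and its methods are hypothetical names introduced for illustration.

```python
class PreallocatedMatrix:
    """Toy sparse matrix whose rows have a fixed number of preallocated slots."""

    def __init__(self, slots_per_row):
        self.slots = list(slots_per_row)           # preallocated slots per row
        self.rows = [dict() for _ in self.slots]   # column index -> value

    def set_value(self, i, j, value):
        """Store value at (i, j); refuse to grow a row that is already full."""
        row = self.rows[i]
        if j not in row and len(row) >= self.slots[i]:
            # A real library would have to allocate more memory here.
            raise RuntimeError(f"New nonzero at ({i},{j}) caused a malloc")
        row[j] = value


if __name__ == "__main__":
    mat = PreallocatedMatrix([2, 1])   # row 0 has two slots, row 1 has one
    mat.set_value(0, 0, 1.0)
    mat.set_value(0, 1, 2.0)
    mat.set_value(0, 1, 3.0)           # overwriting an existing entry is fine
    try:
        mat.set_value(0, 5, 4.0)       # third distinct column in a 2-slot row
    except RuntimeError as err:
        print("error:", err)
```

In this model the error depends only on whether the preallocated count for a row matches the entries later inserted into it, so a preallocation computed inconsistently for some process counts would show up as exactly this kind of failure.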
>>>
>>> Then I changed the matrix format to MATMPIBAIJ. DMCreateMatrix worked fine for any number of processors.
>>>
>>>     CALL DMSetMatType(da,MATMPIBAIJ,ierr)
>>>     CALL DMCreateMatrix(da,gfmat,ierr)
>>>
>>> Could you please let me know how I can fix this problem? If you need more information, please let me know. Thank you.
>>>
>>> JJ
>>
>> <ex44f.F90>
>
> <ex44f.F90>


