[petsc-users] SuperLU_dist issue in 3.7.4: failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix

Kong, Fande fande.kong at inl.gov
Mon Oct 24 09:24:10 CDT 2016


On Mon, Oct 24, 2016 at 8:07 AM, Kong, Fande <fande.kong at inl.gov> wrote:

>
>
> On Sun, Oct 23, 2016 at 3:56 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>    Thanks Satish,
>>
>>       I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant
>> (in next for testing)
>>
>>     Fande,
>>
>>         This will also make MatMPIAIJSetPreallocation() work properly
>> with multiple calls (you will not need a MatReset()).
>>
>

Does this work for MPIAIJ only? The full set of preallocation functions is:
MatSeqAIJSetPreallocation(), MatMPIAIJSetPreallocation(),
MatSeqBAIJSetPreallocation(), MatMPIBAIJSetPreallocation(),
MatSeqSBAIJSetPreallocation(), MatMPISBAIJSetPreallocation(), and
MatXAIJSetPreallocation().

We have to use a different function for each matrix type. Could we have a
unified interface for all of them?
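The question combines two ideas: one preallocation entry point that dispatches on the matrix type, and preallocation that can safely be called more than once on the same matrix. The toy C sketch below illustrates both under stated assumptions. `ToyMat`, `toy_set_preallocation()` and every other name in it are hypothetical and are not PETSc's data structures or API. A single entry point dispatches through a per-type function pointer, types without an implementation simply ignore the call, and the AIJ-style routine frees the storage from any earlier call before reallocating. That last step is a simplified analogue of letting repeated MatMPIAIJSetPreallocation() calls work without a reset.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types for illustration only; not PETSc's Mat. */
typedef enum { TOY_AIJ, TOY_BAIJ } ToyMatType;

typedef struct ToyMat ToyMat;
typedef int (*PreallocFn)(ToyMat *, int nrows, const int nnz[]);

struct ToyMat {
  ToyMatType type;
  int        nrows;
  int       *rowstart; /* prefix sums of the per-row nonzero counts */
  PreallocFn prealloc; /* type-specific implementation, may be NULL */
};

/* AIJ-style preallocation. Re-entrant: storage from an earlier call is
   released first, so a second call with a new pattern neither leaks nor
   reuses stale sizes. Returns 0 on success, 1 if allocation fails. */
static int toy_aij_prealloc(ToyMat *A, int nrows, const int nnz[])
{
  free(A->rowstart);
  A->rowstart = NULL;
  A->nrows    = 0;
  A->rowstart = malloc((size_t)(nrows + 1) * sizeof *A->rowstart);
  if (!A->rowstart) return 1;
  A->rowstart[0] = 0;
  for (int i = 0; i < nrows; ++i) A->rowstart[i + 1] = A->rowstart[i] + nnz[i];
  A->nrows = nrows;
  return 0;
}

/* Unified entry point: dispatch on the matrix type. Types that do not
   provide an implementation ignore the call and report success. */
int toy_set_preallocation(ToyMat *A, int nrows, const int nnz[])
{
  assert(A != NULL && nrows >= 0);
  return A->prealloc ? A->prealloc(A, nrows, nnz) : 0;
}

/* In this toy only TOY_AIJ has a preallocation routine. */
ToyMat toy_create(ToyMatType type)
{
  ToyMat A = { type, 0, NULL, type == TOY_AIJ ? toy_aij_prealloc : NULL };
  return A;
}

void toy_destroy(ToyMat *A)
{
  free(A->rowstart);
  A->rowstart = NULL;
  A->nrows    = 0;
}
```

Calling `toy_set_preallocation()` twice on the same `TOY_AIJ` matrix leaves only the second layout in place, while the same call on a `TOY_BAIJ` matrix is a no-op in this toy.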

Fande,


>
>>    Barry
>>
>
> Thanks, Barry.
>
> Fande,
>
>
>>
>>
>> > On Oct 21, 2016, at 6:48 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>> >
>> > On Fri, 21 Oct 2016, Barry Smith wrote:
>> >
>> >>
>> >>  valgrind first
>> >
>> > balay at asterix /home/balay/download-pine/x/superlu_dist_test
>> > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small
>> > First MatLoad!
>> > Mat Object: 2 MPI processes
>> >  type: mpiaij
>> > row 0: (0, 4.)  (1, -1.)  (6, -1.)
>> > row 1: (0, -1.)  (1, 4.)  (2, -1.)  (7, -1.)
>> > row 2: (1, -1.)  (2, 4.)  (3, -1.)  (8, -1.)
>> > row 3: (2, -1.)  (3, 4.)  (4, -1.)  (9, -1.)
>> > row 4: (3, -1.)  (4, 4.)  (5, -1.)  (10, -1.)
>> > row 5: (4, -1.)  (5, 4.)  (11, -1.)
>> > row 6: (0, -1.)  (6, 4.)  (7, -1.)  (12, -1.)
>> > row 7: (1, -1.)  (6, -1.)  (7, 4.)  (8, -1.)  (13, -1.)
>> > row 8: (2, -1.)  (7, -1.)  (8, 4.)  (9, -1.)  (14, -1.)
>> > row 9: (3, -1.)  (8, -1.)  (9, 4.)  (10, -1.)  (15, -1.)
>> > row 10: (4, -1.)  (9, -1.)  (10, 4.)  (11, -1.)  (16, -1.)
>> > row 11: (5, -1.)  (10, -1.)  (11, 4.)  (17, -1.)
>> > row 12: (6, -1.)  (12, 4.)  (13, -1.)  (18, -1.)
>> > row 13: (7, -1.)  (12, -1.)  (13, 4.)  (14, -1.)  (19, -1.)
>> > row 14: (8, -1.)  (13, -1.)  (14, 4.)  (15, -1.)  (20, -1.)
>> > row 15: (9, -1.)  (14, -1.)  (15, 4.)  (16, -1.)  (21, -1.)
>> > row 16: (10, -1.)  (15, -1.)  (16, 4.)  (17, -1.)  (22, -1.)
>> > row 17: (11, -1.)  (16, -1.)  (17, 4.)  (23, -1.)
>> > row 18: (12, -1.)  (18, 4.)  (19, -1.)  (24, -1.)
>> > row 19: (13, -1.)  (18, -1.)  (19, 4.)  (20, -1.)  (25, -1.)
>> > row 20: (14, -1.)  (19, -1.)  (20, 4.)  (21, -1.)  (26, -1.)
>> > row 21: (15, -1.)  (20, -1.)  (21, 4.)  (22, -1.)  (27, -1.)
>> > row 22: (16, -1.)  (21, -1.)  (22, 4.)  (23, -1.)  (28, -1.)
>> > row 23: (17, -1.)  (22, -1.)  (23, 4.)  (29, -1.)
>> > row 24: (18, -1.)  (24, 4.)  (25, -1.)  (30, -1.)
>> > row 25: (19, -1.)  (24, -1.)  (25, 4.)  (26, -1.)  (31, -1.)
>> > row 26: (20, -1.)  (25, -1.)  (26, 4.)  (27, -1.)  (32, -1.)
>> > row 27: (21, -1.)  (26, -1.)  (27, 4.)  (28, -1.)  (33, -1.)
>> > row 28: (22, -1.)  (27, -1.)  (28, 4.)  (29, -1.)  (34, -1.)
>> > row 29: (23, -1.)  (28, -1.)  (29, 4.)  (35, -1.)
>> > row 30: (24, -1.)  (30, 4.)  (31, -1.)
>> > row 31: (25, -1.)  (30, -1.)  (31, 4.)  (32, -1.)
>> > row 32: (26, -1.)  (31, -1.)  (32, 4.)  (33, -1.)
>> > row 33: (27, -1.)  (32, -1.)  (33, 4.)  (34, -1.)
>> > row 34: (28, -1.)  (33, -1.)  (34, 4.)  (35, -1.)
>> > row 35: (29, -1.)  (34, -1.)  (35, 4.)
>> > Second MatLoad!
>> > Mat Object: 2 MPI processes
>> >  type: mpiaij
>> > ==4592== Invalid read of size 4
>> > ==4592==    at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402)
>> > ==4592==    by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440)
>> > ==4592==    by 0x53373D7: MatView (matrix.c:989)
>> > ==4592==    by 0x40107E: main (ex16.c:30)
>> > ==4592==  Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd
>> > ==4592==    at 0x4C2FF83: memalign (vg_replace_malloc.c:858)
>> > ==4592==    by 0x4FD121A: PetscMallocAlign (mal.c:28)
>> > ==4592==    by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41)
>> > ==4592==    by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747)
>> > ==4592==    by 0x536B299: MatAssemblyEnd (matrix.c:5298)
>> > ==4592==    by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032)
>> > ==4592==    by 0x5337FEA: MatLoad (matrix.c:1101)
>> > ==4592==    by 0x400D9F: main (ex16.c:22)
>> > ==4592==
>> > ==4591== Invalid read of size 4
>> > ==4591==    at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402)
>> > ==4591==    by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440)
>> > ==4591==    by 0x53373D7: MatView (matrix.c:989)
>> > ==4591==    by 0x40107E: main (ex16.c:30)
>> > ==4591==  Address 0xa482958 is 24 bytes before a block of size 7 alloc'd
>> > ==4591==    at 0x4C2FF83: memalign (vg_replace_malloc.c:858)
>> > ==4591==    by 0x4FD121A: PetscMallocAlign (mal.c:28)
>> > ==4591==    by 0x4F31FB5: PetscStrallocpy (str.c:197)
>> > ==4591==    by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253)
>> > ==4591==    by 0x4EF96E2: PetscClassIdRegister (plog.c:2053)
>> > ==4591==    by 0x51FA018: VecInitializePackage (dlregisvec.c:165)
>> > ==4591==    by 0x51F6DE9: VecCreate (veccreate.c:35)
>> > ==4591==    by 0x51C49F0: VecCreateSeq (vseqcr.c:37)
>> > ==4591==    by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104)
>> > ==4591==    by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747)
>> > ==4591==    by 0x536B299: MatAssemblyEnd (matrix.c:5298)
>> > ==4591==    by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032)
>> > ==4591==    by 0x5337FEA: MatLoad (matrix.c:1101)
>> > ==4591==    by 0x400D9F: main (ex16.c:22)
>> > ==4591==
>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [0]PETSC ERROR: Argument out of range
>> > [0]PETSC ERROR: Column too large: col 96 max 35
>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000
>> > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016
>> > [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu
>> > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c
>> > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c
>> > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c
>> > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c
>> > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c
>> > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c
>> > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c
>> > [0]PETSC ERROR: PETSc Option Table entries:
>> > [0]PETSC ERROR: -display :0.0
>> > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small
>> > [0]PETSC ERROR: -malloc_dump
>> > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>> > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0
>> > [cli_0]: aborting job:
>> > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0
>> > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016
>> > ==4591==    at 0x4C2FF83: memalign (vg_replace_malloc.c:858)
>> > ==4591==    by 0x4FD121A: PetscMallocAlign (mal.c:28)
>> > ==4591==    by 0x52F3B14: MatCreate (gcreate.c:84)
>> > ==4591==    by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371)
>> > ==4591==    by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440)
>> > ==4591==    by 0x53373D7: MatView (matrix.c:989)
>> > ==4591==    by 0x40107E: main (ex16.c:30)
>> > ==4591==
>> >
>> > ===================================================================================
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > =   PID 4591 RUNNING AT asterix
>> > =   EXIT CODE: 63
>> > =   CLEANING UP REMAINING PROCESSES
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > ===================================================================================
>> > balay at asterix /home/balay/download-pine/x/superlu_dist_test
>> > $
>>
>>
>
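The matrix printed after "First MatLoad!" in Satish's run has the structure of the standard 5-point Laplacian on a 6x6 grid with natural (row-by-row) ordering: 36 rows, 4 on the diagonal, and -1 for each neighbour above, below, left and right. The short C sketch below regenerates that pattern row by row. The grid size and the names `GRID`, `laplacian_row()` and `laplacian_nnz()` are assumptions inferred from the printed output, not taken from ex16.c or the data file. It also makes the error concrete: with 36 columns, valid column indices run from 0 to 35, so "Column too large: col 96 max 35" means an out-of-range index appeared after the second MatLoad().

```c
#include <assert.h>

#define GRID  6             /* assumed: 6x6 grid, inferred from the printed rows */
#define NROWS (GRID * GRID) /* 36 rows, so column indices 0..35 */

/* Fill cols[]/vals[] with the nonzeros of one row of the 5-point
   Laplacian (natural ordering), in increasing column order.
   Returns the number of nonzeros in that row (3, 4 or 5). */
int laplacian_row(int row, int cols[5], double vals[5])
{
  assert(row >= 0 && row < NROWS);
  int i = row / GRID, j = row % GRID, n = 0;
  if (i > 0)        { cols[n] = row - GRID; vals[n++] = -1.0; } /* grid row above */
  if (j > 0)        { cols[n] = row - 1;    vals[n++] = -1.0; } /* left neighbour */
  cols[n] = row; vals[n++] = 4.0;                               /* diagonal */
  if (j < GRID - 1) { cols[n] = row + 1;    vals[n++] = -1.0; } /* right neighbour */
  if (i < GRID - 1) { cols[n] = row + GRID; vals[n++] = -1.0; } /* grid row below */
  return n;
}

/* Total number of nonzeros over all NROWS rows. */
int laplacian_nnz(void)
{
  int cols[5], total = 0;
  double vals[5];
  for (int r = 0; r < NROWS; ++r) total += laplacian_row(r, cols, vals);
  return total;
}
```

For example, row 7 yields columns 1, 6, 7, 8 and 13, matching the `row 7:` line in the first MatView output, and no row ever produces a column above 35.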