[petsc-users] superlu_dist segfault

Zhang, Hong hzhang at mcs.anl.gov
Wed Oct 28 20:10:38 CDT 2020


Marius,
I tested your code with petsc-release on my Mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switched to your data file and ran out of memory, likely due to the dense matrices B and X. My laptop reported the error "Your system has run out of application memory".

The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement!
After replacing B and X with matrices of size 42549 by nrhs (nrhs <= 4000), the code ran well with np=2. Note the error message you got:
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range

The modified code I used is attached.
Hong
________________________________
From: Marius Buerkle <mbuerkle at web.de>
Sent: Tuesday, October 27, 2020 10:01 PM
To: Zhang, Hong <hzhang at mcs.anl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Sherry Li <xiaoye at nersc.gov>
Subject: Aw: Re: [petsc-users] superlu_dist segfault

Hi,

I recompiled PETSc with the debug option; now I get a segfault at a different position:

[23]PETSC ERROR: ------------------------------------------------------------------------
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[23]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[23]PETSC ERROR: likely location of problem given in stack below
[23]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[23]PETSC ERROR:       INSTEAD the line number of the start of the function
[23]PETSC ERROR:       is given.
[23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c
[23]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[23]PETSC ERROR: Signal received

I made a small reproducer. The matrix is a bit too big to attach directly to the email, so I put it in the cloud:
https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw

Best,
Marius


Sent: Tuesday, October 27, 2020 11:11 PM
From: "Zhang, Hong" <hzhang at mcs.anl.gov>
To: "Marius Buerkle" <mbuerkle at web.de>, "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>, "Sherry Li" <xiaoye at nersc.gov>
Subject: Re: [petsc-users] superlu_dist segfault
Marius,
It fails at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c:
    if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex))))     ABORT("Malloc fails for lsum[].");

We do not know what it means. You may use a debugger to check the values of the variables involved.
I'm cc'ing Sherry (superlu_dist developer). Alternatively, you may send us a short stand-alone code that reproduces the error, and we can help investigate it.
Hong


________________________________
From: petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Marius Buerkle <mbuerkle at web.de>
Sent: Tuesday, October 27, 2020 8:46 AM
To: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: [petsc-users] superlu_dist segfault

Hi,

When using MatMatSolve with superlu_dist I get a segmentation fault:

Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c

The matrix is not particularly big. I am using the PETSc release branch, and superlu_dist is v6.3.0, I think.

Best,
Marius
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: superlu_test.c
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20201029/1ec62bfd/attachment.c>

