[petsc-users] Crash caused by strange error in KSPSetUp

Junchao Zhang jczhang at mcs.anl.gov
Fri Feb 14 09:47:04 CST 2020


Which PETSc version are you using? In aij.c on the master branch, I saw that
Barry recently added a check that catches overflow of the number of nonzeros:
ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr);  However, since you mentioned
that using 64-bit indices did not solve the problem, overflow might not be
the cause.  You should try the master branch if feasible. Also, vary the
number of MPI ranks to see whether the error stack changes.
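
For illustration only, the kind of check described above can be sketched in
plain C. This is not PETSc source code: the helper names checked_int_cast and
total_nonzeros are made up for this sketch, and PetscIntCast itself may differ
in detail. The idea is to accumulate the nonzero count in a 64-bit integer and
refuse to narrow it to int when it does not fit, instead of letting it wrap
to a negative allocation size.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not a PETSc function): narrow a 64-bit count to int.
   Returns 0 on success, 1 if the value does not fit in an int. */
static int checked_int_cast(int64_t a, int *b)
{
  *b = 0;
  if (a > INT_MAX || a < INT_MIN) return 1; /* would overflow: report it */
  *b = (int)a;
  return 0;
}

/* Hypothetical helper: total nonzeros for nrows rows with a fixed number of
   nonzeros per row, computed in 64-bit arithmetic. */
static int64_t total_nonzeros(int64_t nrows, int nnz_per_row)
{
  return nrows * (int64_t)nnz_per_row;
}
```

On a platform where int is 32 bits, total_nonzeros(400000000, 7) is 2.8e9,
which exceeds INT_MAX, so checked_int_cast returns 1 rather than passing a
wrapped value on to the allocator.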

--Junchao Zhang


On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> No luck - exactly the same error after including the
> --with-64-bit-indicies=yes --download-mpich=yes options
>
> ==8674== Argument 'size' of function memalign has a fishy (possibly
> negative) value: -17152036540
> ==8674==    at 0x4C320A6: memalign (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==8674==    by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char
> const*, char const*, void**) (mal.c:28)
> ==8674==    by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char
> const*, char const*, void**) (mtr.c:188)
> ==8674==    by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595)
> ==8674==    by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539)
> ==8674==    by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*)
> (fdda.c:1085)
> ==8674==    by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759)
> ==8674==    by 0x58A2BF2: DMCreateMatrix (dm.c:956)
> ==8674==    by 0x5E377B3: KSPSetUp (itfunc.c:262)
> ==8674==    by 0x409FFC: PetscAdLemTaras3D::solveModel(bool)
> (PetscAdLemTaras3D.hxx:255)
> ==8674==    by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool)
> (AdLem3D.hxx:551)
> ==8674==    by 0x41BD17: main (PetscAdLemMain.cxx:344)
> ==8674==
> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>>
>>    Richard,
>>
>>      It is likely that for these problems some of the integers become too
>> large for the int variable to hold them, thus they overflow and become
>> negative.
>>
>>      You should make a new PETSC_ARCH configuration of PETSc that uses
>> the configure option --with-64-bit-indices. This will change PETSc to use
>> 64-bit integers for its indices, which will not overflow.
>>
>>      Good luck and let us know how it works out
>>
>>     Barry
>>
>>      Probably the code is built with an older version of PETSc; the later
>> versions should produce a more useful error message.
>>
>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >
>> > Hi Everyone,
>> > I am experimenting with the Simul@trophy tool (
>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to
>> simulate brain atrophy based on segmented MRI data. I am not the author. I
>> have this running on most of a dataset of about 50 scans, but experience
>> crashes with several that I am trying to track down. However I am out of
>> ideas. The problem images are slightly bigger than some of the successful
>> ones, but not substantially so, and I have experimented on machines with
>> sufficient RAM. The error happens very quickly, as part of setup - see the
>> valgrind report below. I haven't managed to get the sgcheck tool to work
>> yet. I can only guess that the ksp object is somehow becoming corrupted
>> during the setup process, but the array sizes that I can track (which
>> derive from image sizes), appear correct at every point I can check. Any
>> suggestions as to how I can check what might go wrong in the setup of the
>> ksp object?
>> > Thank you.
>> >
>> > valgrind tells me:
>> >
>> > ==18175== Argument 'size' of function memalign has a fishy (possibly
>> negative) value: -17152038144
>> > ==18175==    at 0x4C320A6: memalign (in
>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>> > ==18175==    by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char
>> const*, char const*, void**) (mal.c:28)
>> > ==18175==    by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595)
>> > ==18175==    by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539)
>> > ==18175==    by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*)
>> (fdda.c:1085)
>> > ==18175==    by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**)
>> (fdda.c:759)
>> > ==18175==    by 0x58BBD29: DMCreateMatrix (dm.c:956)
>> > ==18175==    by 0x5E509D5: KSPSetUp (itfunc.c:262)
>> > ==18175==    by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool)
>> (PetscAdLemTaras3D.hxx:269)
>> > ==18175==    by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool)
>> (AdLem3D.hxx:552)
>> > ==18175==    by 0x41C25C: main (PetscAdLemMain.cxx:349)
>> > ==18175==
>> >
>> > --
>> > --
>> > A/Prof Richard Beare
>> > Imaging and Bioinformatics, Peninsula Clinical School
>> > orcid.org/0000-0002-7530-5664
>> > Richard.Beare at monash.edu
>> > +61 3 9788 1724
>> >
>> >
>> >
>> > Geospatial Research:
>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis
>>
>>
>