[petsc-users] Crash caused by strange error in KSPSetUp

Junchao Zhang jczhang at mcs.anl.gov
Sun Feb 16 23:37:02 CST 2020


Richard,
  I managed to get the Simul@trophy code built. Could you tell me how to
run your test? I want to see if I can reproduce the error. Thanks.

--Junchao Zhang


On Fri, Feb 14, 2020 at 8:34 PM Richard Beare <richard.beare at monash.edu>
wrote:

> It doesn't compile out of the box with PETSc master.
>
> Singularity def file attached.
>
> On Sat, 15 Feb 2020 at 08:03, Richard Beare <richard.beare at monash.edu>
> wrote:
>
>> I will see if I can build with master. The docs for Simul@trophy say
>> PETSc 3.6.3.1.
>>
>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
>>
>>> Which PETSc version do you use? In aij.c of the master branch, I saw that
>>> Barry recently added a useful check to catch overflow in the number of
>>> nonzeros: ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr);  However, since you
>>> mentioned that using 64-bit indices did not solve the problem, overflow
>>> might not be the cause. You should try the master branch if feasible. Also,
>>> vary the number of MPI ranks to see if the error stack changes.
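
The check mentioned above can be illustrated with a minimal standalone sketch: accumulate the nonzero count in a 64-bit integer, then narrow it to 32 bits only after verifying that it fits. The names checked_int32_cast and total_nonzeros are hypothetical and chosen for illustration. This is not PETSc's implementation of PetscIntCast, only the same idea under the assumption of a 32-bit index type.

```c
#include <stdint.h>

/* Hypothetical stand-in for a checked narrowing cast (not PETSc's code).
   Returns 0 and stores the value on success, 1 if it does not fit. */
static int checked_int32_cast(int64_t a, int32_t *b)
{
    if (a > INT32_MAX || a < INT32_MIN) {
        return 1;                 /* value would overflow a 32-bit int */
    }
    *b = (int32_t)a;
    return 0;
}

/* Sum per-row nonzero counts in 64 bits, then narrow with a check,
   so an oversized total is reported instead of silently wrapping. */
static int total_nonzeros(const int32_t *nnz_per_row, int64_t nrows, int32_t *nz)
{
    int64_t nz64 = 0;
    for (int64_t i = 0; i < nrows; i++) {
        nz64 += nnz_per_row[i];
    }
    return checked_int32_cast(nz64, nz);
}
```

With a check of this kind, a total that exceeds the 32-bit range produces an error return at the point of the cast, rather than a negative size reaching the allocator later.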
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> No luck - exactly the same error after including the
>>>> --with-64-bit-indicies=yes --download-mpich=yes options
>>>>
>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly
>>>> negative) value: -17152036540
>>>> ==8674==    at 0x4C320A6: memalign (in
>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==8674==    by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char
>>>> const*, char const*, void**) (mal.c:28)
>>>> ==8674==    by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char
>>>> const*, char const*, void**) (mtr.c:188)
>>>> ==8674==    by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595)
>>>> ==8674==    by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539)
>>>> ==8674==    by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*)
>>>> (fdda.c:1085)
>>>> ==8674==    by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**)
>>>> (fdda.c:759)
>>>> ==8674==    by 0x58A2BF2: DMCreateMatrix (dm.c:956)
>>>> ==8674==    by 0x5E377B3: KSPSetUp (itfunc.c:262)
>>>> ==8674==    by 0x409FFC: PetscAdLemTaras3D::solveModel(bool)
>>>> (PetscAdLemTaras3D.hxx:255)
>>>> ==8674==    by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool)
>>>> (AdLem3D.hxx:551)
>>>> ==8674==    by 0x41BD17: main (PetscAdLemMain.cxx:344)
>>>> ==8674==
>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. <bsmith at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>>
>>>>>    Richard,
>>>>>
>>>>>      It is likely that for these problems some of the integers become
>>>>> too large for the int variable to hold them, thus they overflow and become
>>>>> negative.
>>>>>
>>>>>      You should make a new PETSC_ARCH configuration of PETSc that uses
>>>>> the configure option --with-64-bit-indices; this will change PETSc to use
>>>>> 64-bit integers, which will not overflow.
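
To make the overflow described above concrete, here is a small standalone sketch of how a count that exceeds the 32-bit range turns into a negative allocation size. The helper names and the example count are assumptions for illustration only. They are not taken from PETSc or from the failing run, and the sketch does not try to reproduce the exact value valgrind reports.

```c
#include <stdint.h>

/* Reinterpret the low 32 bits of a 64-bit count as a signed 32-bit value,
   mimicking what ends up stored when the true count does not fit in an int. */
static int64_t wrap_to_int32(int64_t v)
{
    uint32_t low = (uint32_t)v;   /* well-defined: reduces modulo 2^32 */
    return (low >= 0x80000000u) ? (int64_t)low - 4294967296LL
                                : (int64_t)low;
}

/* Bytes requested for nz_true entries of entry_size bytes each, when the
   count has first passed through a 32-bit signed integer. */
static int64_t request_bytes(int64_t nz_true, int64_t entry_size)
{
    return wrap_to_int32(nz_true) * entry_size;
}
```

For a hypothetical matrix with 3,024,000,000 nonzeros, wrap_to_int32 gives -1,270,967,296, and request_bytes with 8-byte entries gives -10,167,738,368. That is a large negative size of the same general kind as the one memalign complains about.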
>>>>>
>>>>>      Good luck and let us know how it works out
>>>>>
>>>>>     Barry
>>>>>
>>>>>      Probably the code is built with an older version of PETSc; the
>>>>> later versions should produce a more useful error message.
>>>>>
>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users <
>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>> >
>>>>> > Hi Everyone,
>>>>> > I am experimenting with the Simul@trophy tool (
>>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses PETSc to
>>>>> simulate brain atrophy based on segmented MRI data. I am not the author. I
>>>>> have this running on most of a dataset of about 50 scans, but experience
>>>>> crashes with several that I am trying to track down. However I am out of
>>>>> ideas. The problem images are slightly bigger than some of the successful
>>>>> ones, but not substantially so, and I have experimented on machines with
>>>>> sufficient RAM. The error happens very quickly, as part of setup - see the
>>>>> valgrind report below. I haven't managed to get the sgcheck tool to work
>>>>> yet. I can only guess that the ksp object is somehow becoming corrupted
>>>>> during the setup process, but the array sizes that I can track (which
>>>>> derive from image sizes), appear correct at every point I can check. Any
>>>>> suggestions as to how I can check what might go wrong in the setup of the
>>>>> ksp object?
>>>>> > Thank you.
>>>>> >
>>>>> > valgrind tells me:
>>>>> >
>>>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly
>>>>> negative) value: -17152038144
>>>>> > ==18175==    at 0x4C320A6: memalign (in
>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>> > ==18175==    by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char
>>>>> const*, char const*, void**) (mal.c:28)
>>>>> > ==18175==    by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ
>>>>> (aij.c:3595)
>>>>> > ==18175==    by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539)
>>>>> > ==18175==    by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*,
>>>>> _p_Mat*) (fdda.c:1085)
>>>>> > ==18175==    by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**)
>>>>> (fdda.c:759)
>>>>> > ==18175==    by 0x58BBD29: DMCreateMatrix (dm.c:956)
>>>>> > ==18175==    by 0x5E509D5: KSPSetUp (itfunc.c:262)
>>>>> > ==18175==    by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool)
>>>>> (PetscAdLemTaras3D.hxx:269)
>>>>> > ==18175==    by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool)
>>>>> (AdLem3D.hxx:552)
>>>>> > ==18175==    by 0x41C25C: main (PetscAdLemMain.cxx:349)
>>>>> > ==18175==
>>>>> >
>>>>> > --
>>>>> > --
>>>>> > A/Prof Richard Beare
>>>>> > Imaging and Bioinformatics, Peninsula Clinical School
>>>>> > orcid.org/0000-0002-7530-5664
>>>>> > Richard.Beare at monash.edu
>>>>> > +61 3 9788 1724
>>>>> >
>>>>> >
>>>>> >
>>>>> > Geospatial Research:
>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis
>>>>>
>>>>>
>>>>
>>>
>>
>
>