[petsc-users] Crash caused by strange error in KSPSetUp
Richard Beare
richard.beare at monash.edu
Mon Feb 17 14:16:04 CST 2020
Awesome - thanks for that. I will check it out. I will also look at what
needs to be done to bring Simul@trophy to a more recent version of PETSc.
On Tue, 18 Feb 2020 at 03:19, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
> Hi, Richard,
> I tested the case you sent over and found that it did fail due to a 32-bit
> overflow in the number of nonzeros, and with a PETSc build using 64-bit
> indices it passed. You had a typo when you reported that
> --with-64-bit-indicies=yes failed. It should be --with-64-bit-indices=yes.
> You can go with a PETSc build with 64-bit indices, or you can run in
> parallel with multiple MPI ranks so that each rank has fewer nonzeros, which
> is also faster (but you need to make sure that the code is correctly
> parallelized).
> Barry's recent fix (ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr);) would
> print a more useful error message in this case. Barry, should we backport it
> to 3.6.3?
>
> --Junchao Zhang
>
>
> On Sun, Feb 16, 2020 at 11:37 PM Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> Richard,
>> I managed to get Simul@trophy built. Could you tell me how
>> to run your test? I want to see if I can reproduce the error. Thanks
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Feb 14, 2020 at 8:34 PM Richard Beare <richard.beare at monash.edu>
>> wrote:
>>
>>> It doesn't compile out of the box with master.
>>>
>>> Singularity definition file attached.
>>>
>>> On Sat, 15 Feb 2020 at 08:03, Richard Beare <richard.beare at monash.edu>
>>> wrote:
>>>
>>>> I will see if I can build with master. The docs for Simul@trophy say
>>>> 3.6.3.1.
>>>>
>>>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang <jczhang at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> Which PETSc version do you use? In aij.c on the master branch, I saw that
>>>>> Barry recently added a useful check to catch overflow of the number of
>>>>> nonzeros: ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But since you
>>>>> mentioned that using 64-bit indices did not solve the problem, overflow
>>>>> might not be the cause. You should try the master branch if feasible.
>>>>> Also, vary the number of MPI ranks to see if the error stack changes.
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users <
>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>
>>>>>> No luck - exactly the same error after including the
>>>>>> --with-64-bit-indicies=yes --download-mpich=yes options
>>>>>>
>>>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152036540
>>>>>> ==8674== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28)
>>>>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:188)
>>>>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595)
>>>>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539)
>>>>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085)
>>>>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759)
>>>>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956)
>>>>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262)
>>>>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:255)
>>>>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:551)
>>>>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344)
>>>>>> ==8674==
>>>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Richard,
>>>>>>>
>>>>>>> It is likely that for these problems some of the integers
>>>>>>> become too large for an int variable to hold, so they overflow and
>>>>>>> become negative.
>>>>>>>
>>>>>>> You should make a new PETSC_ARCH configuration of PETSc that
>>>>>>> uses the configure option --with-64-bit-indices; this changes PETSc to
>>>>>>> use 64-bit integers for indices, which will not overflow at these sizes.
>>>>>>>
>>>>>>> Good luck and let us know how it works out
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> The code is probably built with an older version of PETSc; later
>>>>>>> versions should produce a more useful error message.
>>>>>>>
>>>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users <
>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>> >
>>>>>>> > Hi Everyone,
>>>>>>> > I am experimenting with the Simul@trophy tool
>>>>>>> > (https://github.com/Inria-Asclepios/simul-atrophy), which uses PETSc
>>>>>>> > to simulate brain atrophy based on segmented MRI data. I am not the
>>>>>>> > author. I have it running on most of a dataset of about 50 scans, but
>>>>>>> > it crashes on several, and I am trying to track down why. However, I
>>>>>>> > am out of ideas. The problem images are slightly bigger than some of
>>>>>>> > the successful ones, but not substantially so, and I have experimented
>>>>>>> > on machines with sufficient RAM. The error happens very quickly, as
>>>>>>> > part of setup; see the valgrind report below. I haven't managed to get
>>>>>>> > the sgcheck tool to work yet. I can only guess that the KSP object is
>>>>>>> > somehow becoming corrupted during the setup process, but the array
>>>>>>> > sizes that I can track (which derive from image sizes) appear correct
>>>>>>> > at every point I can check. Any suggestions as to how I can check what
>>>>>>> > might go wrong in the setup of the KSP object?
>>>>>>> > Thank you.
>>>>>>> >
>>>>>>> > valgrind tells me:
>>>>>>> >
>>>>>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152038144
>>>>>>> > ==18175== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28)
>>>>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595)
>>>>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539)
>>>>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085)
>>>>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759)
>>>>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956)
>>>>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262)
>>>>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:269)
>>>>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:552)
>>>>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349)
>>>>>>> > ==18175==
>>>>>>> >
>>>>>>> > --
>>>>>>> > --
>>>>>>> > A/Prof Richard Beare
>>>>>>> > Imaging and Bioinformatics, Peninsula Clinical School
>>>>>>> > orcid.org/0000-0002-7530-5664
>>>>>>> > Richard.Beare at monash.edu
>>>>>>> > +61 3 9788 1724
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Geospatial Research:
>>>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>