[petsc-users] Optimized run crashes on one machine but not another
Matthew Knepley
knepley at gmail.com
Wed Aug 28 16:02:31 CDT 2013
On Wed, Aug 28, 2013 at 3:32 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
> Hi Matt,
>
> I just ran git clone https://bitbucket.org/petsc/petsc and built
> the debug build. The code still crashes now with a slightly
> different back trace. It looks like a request for a large (wrong)
> amount of memory, which could come from some uninitialized value
> I have lying around. I will look into this some more.
>
It would really help if you could track this down in the debugger. I am not
seeing the crash here. You would think the compiler would report the
uninitialized variable.
Thanks,
Matt
> Attached is the configure.log file for my current build.
>
> -
> Garnet
>
>
>
> On Wed, Aug 28, 2013 at 1:08 PM, Matthew Knepley <knepley at gmail.com>wrote:
>
>> On Wed, Aug 28, 2013 at 3:04 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>>
>>> Hi Matt,
>>>
>>> I just built the 3.4.2 release in the hope that it will work. The code was
>>> working fine with the 'next'
>>> branch until a recent update last night. I updated my laptop and
>>> desktop about half an hour
>>> apart, which led to crashes in one build but not the other. Hence, I
>>> moved to the
>>> 3.4.2 release.
>>>
>>> I will rebuild using the current 'next' and let you know if there are
>>> any problems.
>>>
>>
>> Can you send configure.log? I built against OpenMPI and it looks like I
>> get a similar error
>> which is not there with MPICH. Trying to confirm now.
>>
>> Matt
>>
>>
>>> Thanks.
>>>
>>> -
>>> Garnet
>>>
>>>
>>>
>>> On Wed, Aug 28, 2013 at 12:51 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>
>>>> On Wed, Aug 28, 2013 at 1:58 PM, Garnet Vaz <garnet.vaz at gmail.com>wrote:
>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Attached is a folder containing the code and a sample mesh.
>>>>>
>>>>
>>>> I have built and run it here with the 'next' branch from today, and it
>>>> does not crash.
>>>> What branch are you using?
>>>>
>>>> Matt
>>>>
>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> -
>>>>> Garnet
>>>>>
>>>>>
>>>>> On Wed, Aug 28, 2013 at 11:43 AM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>>
>>>>>> On Wed, Aug 28, 2013 at 12:52 PM, Garnet Vaz <garnet.vaz at gmail.com>wrote:
>>>>>>
>>>>>>> Thanks, Jed. I did as you suggested, and the code now crashes on both
>>>>>>> builds. I have installed the 3.4.2 release.
>>>>>>>
>>>>>>> The problem now seems to come from DMPlexDistribute(). I have two
>>>>>>> versions to load the mesh. One creates a mesh using Triangle
>>>>>>> from PETSc and the other loads a mesh using
>>>>>>> DMPlexCreateFromCellList().
>>>>>>>
>>>>>>> Is the following piece of code for creating a mesh using Triangle
>>>>>>> right?
>>>>>>>
>>>>>>
>>>>>> Okay, something is really very wrong here. It is calling
>>>>>> EnlargePartition(), but for
>>>>>> that path to be taken, you have to trip an earlier exception. It
>>>>>> should not be possible
>>>>>> to call it. So I think you have memory corruption somewhere.
>>>>>>
>>>>>> Can you send a sample code we can run?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>> ierr = DMPlexCreateBoxMesh(comm,2,interpolate,&user->dm);CHKERRQ(ierr);
>>>>>>> if (user->dm) {
>>>>>>>   DM refinedMesh     = NULL;
>>>>>>>   DM distributedMesh = NULL;
>>>>>>>   ierr = DMPlexSetRefinementLimit(user->dm,refinementLimit);CHKERRQ(ierr);
>>>>>>>   ierr = DMRefine(user->dm,PETSC_COMM_WORLD,&refinedMesh);CHKERRQ(ierr);
>>>>>>>   if (refinedMesh) {
>>>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>>>     user->dm = refinedMesh;
>>>>>>>   }
>>>>>>>   ierr = DMPlexDistribute(user->dm,"chaco",1,&distributedMesh);CHKERRQ(ierr);
>>>>>>>   if (distributedMesh) {
>>>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>>>     user->dm = distributedMesh;
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> Using gdb, the code gives a SEGV during distribution. The backtrace
>>>>>>> when the fault
>>>>>>> occurs points to an invalid pointer for ISGetIndices(). Attached is
>>>>>>> a screenshot of the
>>>>>>> gdb backtrace.
>>>>>>> Do I need to set up some index set here?
>>>>>>>
>>>>>>> The same error occurs when trying to distribute a mesh using
>>>>>>> DMPlexCreateFromCellList().
>>>>>>>
>>>>>>> Thanks for the help.
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> Garnet
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 28, 2013 at 6:38 AM, Jed Brown <jedbrown at mcs.anl.gov>wrote:
>>>>>>>
>>>>>>>> Garnet Vaz <garnet.vaz at gmail.com> writes:
>>>>>>>>
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > I just rebuilt PETSc on both my laptop and my desktop.
>>>>>>>> > On both machines the output of >grep GIT configure.log
>>>>>>>> > Defined "VERSION_GIT" to
>>>>>>>> > ""d8f7425765acda418e23a679c25fd616d9da8153""
>>>>>>>> > Defined "VERSION_DATE_GIT" to ""2013-08-27 10:05:35
>>>>>>>> -0500""
>>>>>>>>
>>>>>>>> Thanks for the report. Matt just merged a bunch of DMPlex-related
>>>>>>>> branches (about 60 commits in total). Can you 'git pull && make'
>>>>>>>> to let
>>>>>>>> us know if the problem is still there? (It may not fix the issue,
>>>>>>>> but
>>>>>>>> at least we'll be debugging current code.)
>>>>>>>>
>>>>>>>> When dealing with debug vs. optimized issues, it's useful to
>>>>>>>> configure
>>>>>>>> --with-debugging=0 COPTFLAGS='-O2 -g'. This allows valgrind to
>>>>>>>> include
>>>>>>>> line numbers, but it (usually!) does not affect whether the error
>>>>>>>> occurs.
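[Archive note: concretely, the advice above amounts to something like the
following, where the application name and options file are placeholders:]

```shell
# Optimized PETSc build that still carries line numbers for valgrind/gdb
./configure --with-debugging=0 COPTFLAGS='-O2 -g'
make

# --track-origins=yes makes valgrind report where an uninitialized
# value was created, not just where it is first used
valgrind --track-origins=yes ./myapp -options_file run.opts
```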
>>>>>>>>
>>>>>>>> > My code runs on both machines in the debug build without causing
>>>>>>>> > any problems. When I try to run the optimized build, the code
>>>>>>>> crashes
>>>>>>>> > with a SEGV fault on my laptop but not on the desktop. I have
>>>>>>>> built
>>>>>>>> > PETSc using the same configure options.
>>>>>>>> >
>>>>>>>> > I have attached the outputs of valgrind for both my
>>>>>>>> laptop/desktop for
>>>>>>>> > both the debug/opt builds. How can I figure out what differences
>>>>>>>> are
>>>>>>>> > causing the errors in one case and not the other?
>>>>>>>>
>>>>>>>> It looks like an uninitialized variable. Debug mode often ends up
>>>>>>>> initializing local variables, whereas optimized builds leave junk in them.
>>>>>>>> Stack allocation alignment/padding is also often different.
>>>>>>>> Unfortunately, valgrind is less powerful for debugging stack
>>>>>>>> corruption,
>>>>>>>> so the uninitialized warning is usually the best you get.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Garnet
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Garnet
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Garnet
>>>
>>
>>
>>
>>
>
>
>
> --
> Regards,
> Garnet
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener