[petsc-users] Optimized run crashes on one machine but not another
Matthew Knepley
knepley at gmail.com
Wed Aug 28 15:08:44 CDT 2013
On Wed, Aug 28, 2013 at 3:04 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
> Hi Matt,
>
> I just built the 3.4.2 release in the hope that it will work. It was
> working fine for the 'next' branch until a recent update last night. I
> updated my laptop and desktop about half an hour apart, which led to
> crashes in one build but not in the other. Hence, I moved to the 3.4.2
> release.
>
> I will rebuild using the current 'next' and let you know if there are any
> problems.
>
Can you send configure.log? I built against OpenMPI, and it looks like I get
a similar error which is not there with MPICH. Trying to confirm now.
Matt
> Thanks.
>
> -
> Garnet
>
>
>
> On Wed, Aug 28, 2013 at 12:51 PM, Matthew Knepley <knepley at gmail.com>wrote:
>
>> On Wed, Aug 28, 2013 at 1:58 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>>
>>> Hi Matt,
>>>
>>> Attached is a folder containing the code and a sample mesh.
>>>
>>
>> I have built and run it here with the 'next' branch from today, and it
>> does not crash.
>> What branch are you using?
>>
>> Matt
>>
>>
>>> Thanks for the help.
>>>
>>> -
>>> Garnet
>>>
>>>
>>> On Wed, Aug 28, 2013 at 11:43 AM, Matthew Knepley <knepley at gmail.com>wrote:
>>>
>>>> On Wed, Aug 28, 2013 at 12:52 PM, Garnet Vaz <garnet.vaz at gmail.com>wrote:
>>>>
>>>>> Thanks, Jed. I did as you suggested, and the code now crashes on both
>>>>> builds. I have now installed the 3.4.2 release.
>>>>>
>>>>> The problem now seems to come from DMPlexDistribute(). I have two ways
>>>>> of loading the mesh: one creates a mesh using Triangle through PETSc,
>>>>> and the other loads a mesh using DMPlexCreateFromCellList().
>>>>>
>>>>> Is the following piece of code for creating a mesh using Triangle
>>>>> right?
>>>>>
>>>>
>>>> Okay, something is really very wrong here. It is calling
>>>> EnlargePartition(), but for that path to be taken, you have to trip an
>>>> earlier exception. It should not be possible to call it, so I think you
>>>> have memory corruption somewhere.
>>>>
>>>> Can you send a sample code we can run?
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>>> ierr = DMPlexCreateBoxMesh(comm, 2, interpolate, &user->dm);CHKERRQ(ierr);
>>>>> if (user->dm) {
>>>>>   DM refinedMesh     = NULL;
>>>>>   DM distributedMesh = NULL;
>>>>>   /* Refine the generated mesh */
>>>>>   ierr = DMPlexSetRefinementLimit(user->dm, refinementLimit);CHKERRQ(ierr);
>>>>>   ierr = DMRefine(user->dm, PETSC_COMM_WORLD, &refinedMesh);CHKERRQ(ierr);
>>>>>   if (refinedMesh) {
>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>     user->dm = refinedMesh;
>>>>>   }
>>>>>   /* Distribute the mesh over the processes */
>>>>>   ierr = DMPlexDistribute(user->dm, "chaco", 1, &distributedMesh);CHKERRQ(ierr);
>>>>>   if (distributedMesh) {
>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>     user->dm = distributedMesh;
>>>>>   }
>>>>> }
>>>>>
>>>>> Using gdb, the code gives a SEGV during distribution. The backtrace
>>>>> when the fault
>>>>> occurs points to an invalid pointer for ISGetIndices(). Attached is a
>>>>> screenshot of the
>>>>> gdb backtrace.
>>>>> Do I need to set up some index set here?
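>>>>> For context, the usual user-side IS access pattern is just the
>>>>> get/restore pair (a generic sketch, not code from this failure path):
>>>>>
>>>>>     const PetscInt *indices;
>>>>>     ierr = ISGetIndices(is, &indices);CHKERRQ(ierr);
>>>>>     /* ... read the indices here ... */
>>>>>     ierr = ISRestoreIndices(is, &indices);CHKERRQ(ierr);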
>>>>>
>>>>> The same error occurs when trying to distribute a mesh using
>>>>> DMPlexCreateFromCellList().
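>>>>>
>>>>> For reference, the cell-list path is set up along these lines (a rough
>>>>> sketch with placeholder arrays; the exact DMPlexCreateFromCellList()
>>>>> argument list may differ slightly between PETSc versions):
>>>>>
>>>>>     /* cells: numCells x 3 vertex indices, coords: numVertices x 2 doubles */
>>>>>     ierr = DMPlexCreateFromCellList(comm, 2, numCells, numVertices, 3,
>>>>>                                     interpolate, cells, 2, coords,
>>>>>                                     &user->dm);CHKERRQ(ierr);
>>>>>     ierr = DMPlexDistribute(user->dm, "chaco", 1, &distributedMesh);CHKERRQ(ierr);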
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>>
>>>>> -
>>>>> Garnet
>>>>>
>>>>>
>>>>> On Wed, Aug 28, 2013 at 6:38 AM, Jed Brown <jedbrown at mcs.anl.gov>wrote:
>>>>>
>>>>>> Garnet Vaz <garnet.vaz at gmail.com> writes:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I just rebuilt PETSc on both my laptop and my desktop.
>>>>>> > On both machines, the output of 'grep GIT configure.log' is:
>>>>>> > Defined "VERSION_GIT" to
>>>>>> > ""d8f7425765acda418e23a679c25fd616d9da8153""
>>>>>> > Defined "VERSION_DATE_GIT" to ""2013-08-27 10:05:35 -0500""
>>>>>>
>>>>>> Thanks for the report. Matt just merged a bunch of DMPlex-related
>>>>>> branches (about 60 commits in total). Can you 'git pull && make' and
>>>>>> let us know if the problem is still there? (It may not fix the issue,
>>>>>> but at least we'll be debugging current code.)
>>>>>>
>>>>>> When dealing with debug vs. optimized issues, it's useful to configure
>>>>>> --with-debugging=0 COPTFLAGS='-O2 -g'. This allows valgrind to
>>>>>> include
>>>>>> line numbers, but it (usually!) does not affect whether the error
>>>>>> occurs.
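>>>>>>
>>>>>> For example, something along these lines (any other configure options
>>>>>> you normally use stay the same):
>>>>>>
>>>>>>     ./configure --with-debugging=0 COPTFLAGS='-O2 -g'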
>>>>>>
>>>>>> > My code runs on both machines in the debug build without causing
>>>>>> > any problems. When I try to run the optimized build, the code
>>>>>> crashes
>>>>>> > with a SEGV fault on my laptop but not on the desktop. I have built
>>>>>> > PETSc using the same configure options.
>>>>>> >
>>>>>> > I have attached the outputs of valgrind for both my laptop/desktop
>>>>>> for
>>>>>> > both the debug/opt builds. How can I figure out what differences are
>>>>>> > causing the errors in one case and not the other?
>>>>>>
>>>>>> It looks like an uninitialized variable. Debug mode often ends up
>>>>>> initializing local variables, whereas optimized builds leave junk in
>>>>>> them. Stack allocation alignment/padding is also often different.
>>>>>> Unfortunately, valgrind is less powerful for debugging stack
>>>>>> corruption, so the uninitialized warning is usually the best you get.
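>>>>>>
>>>>>> If you run it under valgrind again, origin tracking can at least show
>>>>>> where an uninitialized value was first created (a generic invocation;
>>>>>> adjust for your own executable and options):
>>>>>>
>>>>>>     valgrind --track-origins=yes ./your_app -your_options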
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Garnet
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Garnet
>>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
>
> --
> Regards,
> Garnet
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener