[petsc-users] Optimized run crashes on one machine but not another

Garnet Vaz garnet.vaz at gmail.com
Wed Aug 28 15:04:05 CDT 2013


Hi Matt,

I just built the 3.4.2 release in the hope that it will work. The 'next'
branch was working fine until an update last night. I updated my laptop and
desktop about half an hour apart, and one build then crashed while the other
did not. Hence, I moved to the 3.4.2 release.

I will rebuild using the current 'next' and let you know if there are any
problems.


Thanks.

-
Garnet



On Wed, Aug 28, 2013 at 12:51 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Aug 28, 2013 at 1:58 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>
>> Hi Matt,
>>
>> Attached is a folder containing the code and a sample mesh.
>>
>
> I have built and run it here with the 'next' branch from today, and it
> does not crash.
> What branch are you using?
>
>     Matt
>
>
>> Thanks for the help.
>>
>> -
>> Garnet
>>
>>
>> On Wed, Aug 28, 2013 at 11:43 AM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Wed, Aug 28, 2013 at 12:52 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>>>
>>>> Thanks Jed. I did as you told and the code finally crashes on both
>>>> builds. I installed the 3.4.2 release now.
>>>>
>>>> The problem now seems to come from DMPlexDistribute(). I have two
>>>> versions to load the mesh. One creates a mesh using Triangle
>>>> from PETSc and the other loads a mesh using DMPlexCreateFromCellList().
>>>>
>>>> Is the following piece of code for creating a mesh using Triangle right?
>>>>
>>>
>>> Okay, something is really very wrong here. It is calling
>>> EnlargePartition(), but for that path to be taken, you have to trip an
>>> earlier exception. It should not be possible to call it. So I think you
>>> have memory corruption somewhere.
>>>
>>> Can you send a sample code we can run?
>>>
>>>   Thanks,
>>>
>>>       Matt
>>>
>>>
>>>>   ierr = DMPlexCreateBoxMesh(comm,2,interpolate,&user->dm);CHKERRQ(ierr);
>>>>   if (user->dm) {
>>>>     DM        refinedMesh     = NULL;
>>>>     DM        distributedMesh = NULL;
>>>>     ierr = DMPlexSetRefinementLimit(user->dm,refinementLimit);CHKERRQ(ierr);
>>>>     ierr = DMRefine(user->dm,PETSC_COMM_WORLD,&refinedMesh);CHKERRQ(ierr);
>>>>     if (refinedMesh) {
>>>>       ierr     = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>       user->dm = refinedMesh;
>>>>     }
>>>>     ierr = DMPlexDistribute(user->dm,"chaco",1,&distributedMesh);CHKERRQ(ierr);
>>>>     if (distributedMesh) {
>>>>       ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>       user->dm  = distributedMesh;
>>>>     }
>>>>   }
>>>>
>>>> Using gdb, the code gives a SEGV during distribution. The backtrace when
>>>> the fault occurs points to an invalid pointer for ISGetIndices().
>>>> Attached is a screenshot of the gdb backtrace.
>>>> Do I need to set up some index set here?
>>>>
>>>> The same error occurs when trying to distribute a mesh using
>>>> DMPlexCreateFromCellList().
>>>>
>>>> Thanks for the help.
>>>>
>>>>
>>>> -
>>>> Garnet
>>>>
>>>>
>>>> On Wed, Aug 28, 2013 at 6:38 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>
>>>>> Garnet Vaz <garnet.vaz at gmail.com> writes:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I just rebuilt PETSc on both my laptop and my desktop.
>>>>> > On both machines the output of 'grep GIT configure.log' is:
>>>>> >         Defined "VERSION_GIT" to ""d8f7425765acda418e23a679c25fd616d9da8153""
>>>>> >         Defined "VERSION_DATE_GIT" to ""2013-08-27 10:05:35 -0500""
>>>>>
>>>>> Thanks for the report.  Matt just merged a bunch of DMPlex-related
>>>>> branches (about 60 commits in total).  Can you 'git pull && make' to
>>>>> let us know if the problem is still there?  (It may not fix the issue,
>>>>> but at least we'll be debugging current code.)
>>>>>
>>>>> When dealing with debug vs. optimized issues, it's useful to configure
>>>>> --with-debugging=0 COPTFLAGS='-O2 -g'.  This allows valgrind to include
>>>>> line numbers, but it (usually!) does not affect whether the error occurs.
>>>>>
>>>>> > My code runs on both machines in the debug build without causing
>>>>> > any problems. When I try to run the optimized build, the code crashes
>>>>> > with a SEGV fault on my laptop but not on the desktop. I have built
>>>>> > PETSc using the same configure options.
>>>>> >
>>>>> > I have attached the outputs of valgrind for both my laptop/desktop for
>>>>> > both the debug/opt builds. How can I figure out what differences are
>>>>> > causing the errors in one case and not the other?
>>>>>
>>>>> It looks like an uninitialized variable.  Debug mode often ends up
>>>>> initializing local variables, whereas optimized leaves junk in them.
>>>>> Stack allocation alignment/padding is also often different.
>>>>> Unfortunately, valgrind is less powerful for debugging stack
>>>>> corruption, so the uninitialized warning is usually the best you get.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Garnet
>>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>>
>> --
>> Regards,
>> Garnet
>>
>



-- 
Regards,
Garnet