[petsc-users] Optimized run crashes on one machine but not another

Matthew Knepley knepley at gmail.com
Wed Aug 28 15:08:44 CDT 2013


On Wed, Aug 28, 2013 at 3:04 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:

> Hi Matt,
>
> I just built the 3.4.2 release in the hope that it will work. The 'next'
> branch was working fine until a recent update last night. I updated my
> laptop and desktop about half an hour apart, which caused crashes in one
> build but not in the other. Hence, I moved to the 3.4.2 release.
>
> I will rebuild using the current 'next' and let you know if there are any
> problems.
>

Can you send configure.log? I built against OpenMPI and it looks like I get
a similar error which is not there with MPICH. Trying to confirm now.

  Matt


> Thanks.
>
> -
> Garnet
>
>
>
> On Wed, Aug 28, 2013 at 12:51 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Aug 28, 2013 at 1:58 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>>
>>> Hi Matt,
>>>
>>> Attached is a folder containing the code and a sample mesh.
>>>
>>
>> I have built and run it here with the 'next' branch from today, and it
>> does not crash.
>> What branch are you using?
>>
>>     Matt
>>
>>
>>> Thanks for the help.
>>>
>>> -
>>> Garnet
>>>
>>>
>>> On Wed, Aug 28, 2013 at 11:43 AM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>>> On Wed, Aug 28, 2013 at 12:52 PM, Garnet Vaz <garnet.vaz at gmail.com> wrote:
>>>>
>>>>> Thanks, Jed. I did as you suggested, and the code now crashes on both
>>>>> builds. I have installed the 3.4.2 release.
>>>>>
>>>>> The problem now seems to come from DMPlexDistribute(). I have two ways
>>>>> of loading the mesh: one creates a mesh using Triangle through PETSc,
>>>>> and the other loads a mesh using DMPlexCreateFromCellList().
>>>>>
>>>>> Is the following piece of code for creating a mesh using Triangle
>>>>> right?
>>>>>
>>>>
>>>> Okay, something is really very wrong here. It is calling
>>>> EnlargePartition(), but for that path to be taken, you have to trip an
>>>> earlier exception. It should not be possible to call it. So I think you
>>>> have memory corruption somewhere.
>>>>
>>>> Can you send a sample code we can run?
>>>>
>>>>   Thanks,
>>>>
>>>>       Matt
>>>>
>>>>
>>>>>   ierr = DMPlexCreateBoxMesh(comm,2,interpolate,&user->dm);CHKERRQ(ierr);
>>>>>   if (user->dm) {
>>>>>     DM refinedMesh     = NULL;
>>>>>     DM distributedMesh = NULL;
>>>>>     ierr = DMPlexSetRefinementLimit(user->dm,refinementLimit);CHKERRQ(ierr);
>>>>>     ierr = DMRefine(user->dm,PETSC_COMM_WORLD,&refinedMesh);CHKERRQ(ierr);
>>>>>     if (refinedMesh) {
>>>>>       ierr     = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>       user->dm = refinedMesh;
>>>>>     }
>>>>>     ierr = DMPlexDistribute(user->dm,"chaco",1,&distributedMesh);CHKERRQ(ierr);
>>>>>     if (distributedMesh) {
>>>>>       ierr     = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>       user->dm = distributedMesh;
>>>>>     }
>>>>>   }
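>>>>>
>>>>> In case it helps to reproduce, the box-mesh path boils down to a
>>>>> standalone program along these lines (just a sketch wrapping the same
>>>>> calls as above in a main, with the interpolate flag and refinement
>>>>> limit hard-coded to example values instead of taken from my options):
>>>>>
>>>>> static char help[] = "Box mesh -> refine -> distribute, stripped down.\n";
>>>>>
>>>>> #include <petscdmplex.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>   DM             dm, refinedMesh = NULL, distributedMesh = NULL;
>>>>>   PetscBool      interpolate     = PETSC_TRUE;
>>>>>   PetscReal      refinementLimit = 0.01;  /* example value */
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
>>>>>   /* Same sequence as the snippet above: box mesh -> refine -> distribute */
>>>>>   ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 2, interpolate, &dm);CHKERRQ(ierr);
>>>>>   ierr = DMPlexSetRefinementLimit(dm, refinementLimit);CHKERRQ(ierr);
>>>>>   ierr = DMRefine(dm, PETSC_COMM_WORLD, &refinedMesh);CHKERRQ(ierr);
>>>>>   if (refinedMesh) {
>>>>>     ierr = DMDestroy(&dm);CHKERRQ(ierr);
>>>>>     dm   = refinedMesh;
>>>>>   }
>>>>>   /* The SEGV shows up inside this call */
>>>>>   ierr = DMPlexDistribute(dm, "chaco", 1, &distributedMesh);CHKERRQ(ierr);
>>>>>   if (distributedMesh) {
>>>>>     ierr = DMDestroy(&dm);CHKERRQ(ierr);
>>>>>     dm   = distributedMesh;
>>>>>   }
>>>>>   ierr = DMDestroy(&dm);CHKERRQ(ierr);
>>>>>   ierr = PetscFinalize();
>>>>>   return 0;
>>>>> }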
>>>>>
>>>>> Using gdb, the code gives a SEGV during distribution. The backtrace
>>>>> when the fault occurs points to an invalid pointer for ISGetIndices().
>>>>> Attached is a screenshot of the gdb backtrace.
>>>>> Do I need to set up some index set here?
>>>>>
>>>>> The same error occurs when trying to distribute a mesh created with
>>>>> DMPlexCreateFromCellList().
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>>
>>>>> -
>>>>> Garnet
>>>>>
>>>>>
>>>>> On Wed, Aug 28, 2013 at 6:38 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>>
>>>>>> Garnet Vaz <garnet.vaz at gmail.com> writes:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I just rebuilt PETSc on both my laptop and my desktop.
>>>>>> > On both machines, the output of 'grep GIT configure.log' is:
>>>>>> >         Defined "VERSION_GIT" to ""d8f7425765acda418e23a679c25fd616d9da8153""
>>>>>> >         Defined "VERSION_DATE_GIT" to ""2013-08-27 10:05:35 -0500""
>>>>>>
>>>>>> Thanks for the report.  Matt just merged a bunch of DMPlex-related
>>>>>> branches (about 60 commits in total).  Can you 'git pull && make' to
>>>>>> let us know if the problem is still there?  (It may not fix the issue,
>>>>>> but at least we'll be debugging current code.)
>>>>>>
>>>>>> When dealing with debug vs. optimized issues, it's useful to configure
>>>>>> --with-debugging=0 COPTFLAGS='-O2 -g'.  This allows valgrind to include
>>>>>> line numbers, but it (usually!) does not affect whether the error
>>>>>> occurs.
>>>>>>
>>>>>> > My code runs on both machines in the debug build without causing
>>>>>> > any problems. When I try to run the optimized build, the code
>>>>>> > crashes with a SEGV fault on my laptop but not on the desktop. I
>>>>>> > have built PETSc using the same configure options.
>>>>>> >
>>>>>> > I have attached the outputs of valgrind for both my laptop/desktop
>>>>>> > for both the debug/opt builds. How can I figure out what differences
>>>>>> > are causing the errors in one case and not the other?
>>>>>>
>>>>>> It looks like an uninitialized variable.  Debug mode often ends up
>>>>>> initializing local variables, whereas optimized builds leave junk in them.
>>>>>> Stack allocation alignment/padding is also often different.
>>>>>> Unfortunately, valgrind is less powerful for debugging stack
>>>>>> corruption,
>>>>>> so the uninitialized warning is usually the best you get.
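>>>>>>
>>>>>> To make that concrete, here is a contrived example (nothing to do with
>>>>>> your code, just the general pattern) showing both failure modes:
>>>>>>
>>>>>> #include <stdio.h>
>>>>>>
>>>>>> /* Both bugs below are undefined behavior.  A debug build often happens
>>>>>>    to leave zeros in 'a' and pads the stack frame; an optimized build
>>>>>>    leaves junk there and packs the locals tightly. */
>>>>>> static double bad_scale(void)
>>>>>> {
>>>>>>   double a;                    /* never initialized                    */
>>>>>>   double buf[2] = {1.0, 2.0};
>>>>>>
>>>>>>   buf[2] = 3.0;                /* one past the end: stack corruption
>>>>>>                                   valgrind usually cannot pin down     */
>>>>>>   return a*buf[0];
>>>>>> }
>>>>>>
>>>>>> int main(void)
>>>>>> {
>>>>>>   printf("%g\n", bad_scale());
>>>>>>   return 0;
>>>>>> }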
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Garnet
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Garnet
>>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
>
> --
> Regards,
> Garnet
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener