[petsc-users] Big discrepancy between machines

Dave May dave.mayhem23 at gmail.com
Thu Dec 17 04:19:29 CST 2015


On 17 December 2015 at 11:00, Timothée Nicolas <timothee.nicolas at gmail.com>
wrote:

> Hi,
>
> So, valgrind is OK, at least on the local machine. (On the cluster,
> Helios, it produces strange results even for the simplest PETSc program,
> PetscInitialize followed by PetscFinalize; I will try to figure this out
> with their technical team.) I have also tried with exactly the same
> version (3.6.0) on both machines, and it does not change the behavior.
>
> So now I would like to know how to get a grip on what goes in and out of
> the SNES and the KSP internal to the SNES. That is, I would like to inspect
> manually the vector which enters the SNES in the first place (should be
> zero I believe), what is being fed to the KSP, and the vector which comes
> out of it, etc. if possible at each iteration of the SNES. I want to
> actually *see* these vectors, and compute their norms by hand. The trouble
> is, it is really hard to understand why the newton residuals are not
> reduced since the KSP converges so nicely. This does not make any sense to
> me, so I want to know what happens to the vectors. But on the SNES list of
> routines, I did not find the tools that would allow me to do that (and
> messing around with the C code is too hard for me, it would take me weeks).
> Does someone have a hint ?
>

The only sane way to do this is to write a custom monitor for your SNES
object.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESMonitorSet.html

Inside your monitor, you have access to the SNES and everything it defines,
e.g. the current solution, non-linear residual, KSP etc. See these pages

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetKSP.html

Then you can pull apart the residual and compute specific norms (or plot
the residual).

Hopefully you can access everything you need to perform your analysis.
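
As a rough illustration, a monitor along these lines could look like the
sketch below. The names InspectMonitor and AttachInspectMonitor are made up
for this example; the PETSc calls are the ones from the manual pages linked
above, plus VecNorm, KSPGetIterationNumber and KSPGetResidualNorm. It is
written in C with the older ierr/CHKERRQ error-checking style (newer releases
prefer PetscCall), and it assumes a Newton-type SNES so that the inner KSP
has been solved by the time iteration 1 is monitored. The same routines are
available from Fortran.

```c
#include <petscsnes.h>

/* Hypothetical monitor: PETSc calls it once per SNES iteration.
   It recomputes the norms by hand so they can be compared with
   the value the SNES itself reports (fnorm). */
PetscErrorCode InspectMonitor(SNES snes, PetscInt it, PetscReal fnorm, void *ctx)
{
  Vec            x, r;
  KSP            ksp;
  PetscInt       kspits;
  PetscReal      xnorm, rnorm, ksprnorm;
  MPI_Comm       comm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  comm = PetscObjectComm((PetscObject)snes);

  /* Current solution and nonlinear residual F(x) held by the SNES */
  ierr = SNESGetSolution(snes, &x);CHKERRQ(ierr);
  ierr = SNESGetFunction(snes, &r, NULL, NULL);CHKERRQ(ierr);
  ierr = VecNorm(x, NORM_2, &xnorm);CHKERRQ(ierr);
  ierr = VecNorm(r, NORM_2, &rnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(comm, "[inspect] it %d: |x| = %.12e  |F(x)| = %.12e  (SNES reports %.12e)\n",
                     (int)it, (double)xnorm, (double)rnorm, (double)fnorm);CHKERRQ(ierr);

  /* At iteration 0 no linear solve has happened yet, so only query
     the inner KSP from iteration 1 onwards. */
  if (it > 0) {
    ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
    ierr = KSPGetIterationNumber(ksp, &kspits);CHKERRQ(ierr);
    ierr = KSPGetResidualNorm(ksp, &ksprnorm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm, "[inspect]   last KSP solve: %d its, final residual norm %.12e\n",
                       (int)kspits, (double)ksprnorm);CHKERRQ(ierr);
  }

  /* To actually look at the entries, uncomment:
     ierr = VecView(r, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); */
  PetscFunctionReturn(0);
}

/* Hypothetical helper: register the monitor before calling SNESSolve. */
PetscErrorCode AttachInspectMonitor(SNES snes)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESMonitorSet(snes, InspectMonitor, NULL, NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```

Note that the norm reported by KSPGetResidualNorm is whatever norm the KSP
used for its convergence test (often the preconditioned one), so it is not
necessarily comparable to the SNES function norm. If you need the vectors
inside the linear solve itself, a KSP monitor set with KSPMonitorSet on the
KSP returned by SNESGetKSP follows the same pattern.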

Cheers,
  Dave


>
> Thx
>
> Timothee
>
>
>
>
> 2015-12-15 14:20 GMT+09:00 Matthew Knepley <knepley at gmail.com>:
>
>> On Mon, Dec 14, 2015 at 11:06 PM, Timothée Nicolas <
>> timothee.nicolas at gmail.com> wrote:
>>
>>> There is a difference in valgrind indeed between the two. It seems to be
>>> clean on my desktop Mac OS X but not on the cluster. I'll try to see what's
>>> causing this. I still don't understand well what's causing memory leaks in
>>> the case where all PETSc objects are freed correctly (as can be checked
>>> with -log_summary).
>>>
>>> Also, I have tried running either
>>>
>>> valgrind ./my_code -option1 -option2...
>>>
>>> or
>>>
>>> valgrind mpiexec -n 1 ./my_code -option1 -option2...
>>>
>>
>> Note here you would need --trace-children=yes for valgrind.
>>
>>   Matt
>>
>>
>>> It seems the second is the correct way to proceed right ? This gives
>>> very different behaviour for valgrind.
>>>
>>> Timothee
>>>
>>>
>>>
>>> 2015-12-14 17:38 GMT+09:00 Timothée Nicolas <timothee.nicolas at gmail.com>
>>> :
>>>
>>>> OK, I'll try that, thx
>>>>
>>>> 2015-12-14 17:38 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>>
>>>>> You have the configure line, so it should be relatively
>>>>> straightforward to configure / build petsc in your home directory.
>>>>>
>>>>>
>>>>> On 14 December 2015 at 09:34, Timothée Nicolas <
>>>>> timothee.nicolas at gmail.com> wrote:
>>>>>
>>>>>> OK, The problem is that I don't think I can change this easily as far
>>>>>> as the cluster is concerned. I obtain access to petsc by loading the petsc
>>>>>> module, and even if I have a few choices, I don't see any debug builds...
>>>>>>
>>>>>> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>>>>> timothee.nicolas at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hum, OK. I use FORTRAN by the way. Is your comment still valid ?
>>>>>>>>
>>>>>>>
>>>>>>> No. Fortran compilers init variables to zero.
>>>>>>> In this case, I would run a debug build on your OSX machine through
>>>>>>> valgrind and make sure it is clean.
>>>>>>>
>>>>>>> The other obvious thing to check is what happens if you use exactly
>>>>>>> the same petsc builds on both machines. I see 3.6.1 and 3.6.0 are
>>>>>>> being used.
>>>>>>>
>>>>>>> For all this type of checking, I would definitely use debug builds
>>>>>>> on both machines. Your cluster build is using the highest level of
>>>>>>> optimization...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I'll check anyway, but I thought I had been careful about this sort
>>>>>>>> of things.
>>>>>>>>
>>>>>>>> Also, I thought the problem on Mac OS X may have been due to the
>>>>>>>> fact I used the version with debugging on, so I reran configure with
>>>>>>>> --with-debugging=no, which did not change anything.
>>>>>>>>
>>>>>>>> Thx
>>>>>>>>
>>>>>>>> Timothee
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>>>>>>
>>>>>>>>> One suggestion is you have some uninitialized variables in your
>>>>>>>>> pcshell. Despite your arch being called "debug", your configure options
>>>>>>>>> indicate you have turned debugging off.
>>>>>>>>>
>>>>>>>>> C standard doesn't prescribe how uninit variables should be
>>>>>>>>> treated - the behavior is labelled as undefined. As a result, different
>>>>>>>>> compilers on different archs with the same optimization flags can and will
>>>>>>>>> treat uninit variables differently. I find OSX c compilers tend to set them
>>>>>>>>> to zero.
>>>>>>>>>
>>>>>>>>> I suggest compiling a debug build on both machines and trying your
>>>>>>>>> test again. Also, consider running the debug builds through valgrind.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>   Dave
>>>>>>>>>
>>>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>>>>>>> timothee.nicolas at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have noticed I have a VERY big difference in behaviour between
>>>>>>>>>> two machines in my problem, solved with SNES. I can't explain it, because I
>>>>>>>>>> have tested my operators which give the same result. I also checked that
>>>>>>>>>> the vectors fed to the SNES are the same. The problem happens only with my
>>>>>>>>>> shell preconditioner. When I don't use it, and simply solve using -snes_mf,
>>>>>>>>>> I don't see any more than the usual 3-4 changing digits at the end of the
>>>>>>>>>> residuals. However, when I use my pcshell, the results are completely
>>>>>>>>>> different between the two machines.
>>>>>>>>>>
>>>>>>>>>> I have attached output_SuperComputer.txt and
>>>>>>>>>> output_DesktopComputer.txt, which correspond to the output from the exact
>>>>>>>>>> same code and options (and of course same input data file !). More precisely
>>>>>>>>>>
>>>>>>>>>> output_SuperComputer.txt : output on a supercomputer called
>>>>>>>>>> Helios, sorry I don't know the exact specs.
>>>>>>>>>> In this case, the SNES norms are reduced successively:
>>>>>>>>>> 0 SNES Function norm 4.867111712420e-03
>>>>>>>>>> 1 SNES Function norm 5.632325929998e-08
>>>>>>>>>> 2 SNES Function norm 7.427800084502e-15
>>>>>>>>>>
>>>>>>>>>> output_DesktopComputer.txt : output on a Mac OS X Yosemite 3.4
>>>>>>>>>> GHz Intel Core i5 16GB 1600 MHz DDR3. (the same happens on another laptop
>>>>>>>>>> with Mac OS X Mavericks).
>>>>>>>>>> In this case, I obtain the following for the SNES norms:
>>>>>>>>>> 0 SNES Function norm 4.867111713544e-03
>>>>>>>>>> 1 SNES Function norm 1.560094052222e-03
>>>>>>>>>> 2 SNES Function norm 1.552118650943e-03
>>>>>>>>>> 3 SNES Function norm 1.552106297094e-03
>>>>>>>>>> 4 SNES Function norm 1.552106277949e-03
>>>>>>>>>> which I can't explain, because otherwise the KSP residuals (with
>>>>>>>>>> the same operator, which I checked) behave well.
>>>>>>>>>>
>>>>>>>>>> As you can see, the first time the preconditioner is applied
>>>>>>>>>> (DB_, DP_, Drho_ and PS_ solves), the two outputs coincide (except for the
>>>>>>>>>> few last digits, up to 9 actually, which is more than I would expect), and
>>>>>>>>>> everything starts to diverge at the first print of the main KSP (the one
>>>>>>>>>> stemming from the SNES) residual norms.
>>>>>>>>>>
>>>>>>>>>> Do you have an idea what may cause such a strange behaviour ?
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>>
>>>>>>>>>> Timothee
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>