[petsc-users] Big discrepancy between machines
Matthew Knepley
knepley at gmail.com
Mon Dec 14 23:20:08 CST 2015
On Mon, Dec 14, 2015 at 11:06 PM, Timothée Nicolas <
timothee.nicolas at gmail.com> wrote:
> There is indeed a difference in valgrind between the two. It seems to be
> clean on my desktop Mac OS X but not on the cluster. I'll try to see what's
> causing this. I still don't really understand what's causing memory leaks in
> the case where all PETSc objects are freed correctly (as can be checked
> with -log_summary).
>
> Also, I have tried running either
>
> valgrind ./my_code -option1 -option2...
>
> or
>
> valgrind mpiexec -n 1 ./my_code -option1 -option2...
>
Note here you would need --trace-children=yes for valgrind.
Matt
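
For reference, here is a minimal dry-run sketch of the two ways valgrind and mpiexec can be combined. It only prints the command lines and runs nothing. The executable name and options are the placeholders from the message above. The extra memcheck flags in the second form (-q, --num-callers, --log-file) are standard valgrind options chosen here for illustration; they are not taken from this thread.

```shell
# Dry run: build and print the command lines instead of executing them.
app="./my_code"            # placeholder executable name from the thread
opts="-option1 -option2"   # placeholder options from the thread

# Form A: valgrind wraps the MPI launcher. Without --trace-children=yes,
# valgrind would only check mpiexec itself, not the application it spawns.
cmd_a="valgrind --trace-children=yes mpiexec -n 1 $app $opts"

# Form B: the launcher starts valgrind, so only the application is checked.
# %p in --log-file is expanded by valgrind to the process ID.
cmd_b="mpiexec -n 1 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p $app $opts"

echo "$cmd_a"
echo "$cmd_b"
```

To run one of them for real, paste the printed line into the shell. Form B is usually less noisy, since the launcher itself is not instrumented.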
> It seems the second is the correct way to proceed right ? This gives very
> different behaviour for valgrind.
>
> Timothee
>
>
>
> 2015-12-14 17:38 GMT+09:00 Timothée Nicolas <timothee.nicolas at gmail.com>:
>
>> OK, I'll try that, thx
>>
>> 2015-12-14 17:38 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>
>>> You have the configure line, so it should be relatively straightforward
>>> to configure and build PETSc in your home directory.
>>>
>>>
>>> On 14 December 2015 at 09:34, Timothée Nicolas <
>>> timothee.nicolas at gmail.com> wrote:
>>>
>>>> OK, the problem is that I don't think I can change this easily as far
>>>> as the cluster is concerned. I get access to PETSc by loading the petsc
>>>> module, and even though I have a few choices, I don't see any debug builds...
>>>>
>>>> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>>
>>>>>
>>>>>
>>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>>> timothee.nicolas at gmail.com> wrote:
>>>>>
>>>>>> Hum, OK. I use FORTRAN by the way. Is your comment still valid ?
>>>>>>
>>>>>
>>>>> Probably not. Fortran compilers often initialize variables to zero,
>>>>> although the standard does not guarantee this.
>>>>> In this case, I would run a debug build on your OSX machine through
>>>>> valgrind and make sure it is clean.
>>>>>
>>>>> The other obvious thing to check is what happens if you use exactly the
>>>>> same PETSc builds on both machines. I see 3.6.1 and 3.6.0 are being used.
>>>>>
>>>>> For all this type of checking, I would definitely use debug builds on
>>>>> both machines. Your cluster build is using the highest level of
>>>>> optimization...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> I'll check anyway, but I thought I had been careful about this sort
>>>>>> of things.
>>>>>>
>>>>>> Also, I thought the problem on Mac OS X may have been due to the fact
>>>>>> I used the version with debugging on, so I reran configure with
>>>>>> --with-debugging=no, which did not change anything.
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>> Timothee
>>>>>>
>>>>>>
>>>>>> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>>>>
>>>>>>> One possibility is that you have some uninitialized variables in your
>>>>>>> pcshell. Despite your arch being called "debug", your configure options
>>>>>>> indicate you have turned debugging off.
>>>>>>>
>>>>>>> The C standard doesn't prescribe how uninitialized variables should be
>>>>>>> treated - the behavior is labelled as undefined. As a result, different
>>>>>>> compilers on different archs with the same optimization flags can and will
>>>>>>> treat uninitialized variables differently. I find OS X C compilers tend to
>>>>>>> set them to zero.
>>>>>>>
>>>>>>> I suggest compiling a debug build on both machines and trying your
>>>>>>> test again. Also, consider running the debug builds through valgrind.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dave
>>>>>>>
>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>>>>> timothee.nicolas at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have noticed I have a VERY big difference in behaviour between
>>>>>>>> two machines in my problem, solved with SNES. I can't explain it, because I
>>>>>>>> have tested my operators which give the same result. I also checked that
>>>>>>>> the vectors fed to the SNES are the same. The problem happens only with my
>>>>>>>> shell preconditioner. When I don't use it, and simply solve using -snes_mf,
>>>>>>>> I don't see any more than the usual 3-4 changing digits at the end of the
>>>>>>>> residuals. However, when I use my pcshell, the results are completely
>>>>>>>> different between the two machines.
>>>>>>>>
>>>>>>>> I have attached output_SuperComputer.txt and
>>>>>>>> output_DesktopComputer.txt, which correspond to the output from the exact
>>>>>>>> same code and options (and of course same input data file !). More precisely
>>>>>>>>
>>>>>>>> output_SuperComputer.txt : output on a supercomputer called Helios,
>>>>>>>> sorry I don't know the exact specs.
>>>>>>>> In this case, the SNES norms are reduced successively:
>>>>>>>> 0 SNES Function norm 4.867111712420e-03
>>>>>>>> 1 SNES Function norm 5.632325929998e-08
>>>>>>>> 2 SNES Function norm 7.427800084502e-15
>>>>>>>>
>>>>>>>> output_DesktopComputer.txt : output on a Mac OS X Yosemite 3.4 GHz
>>>>>>>> Intel Core i5 16GB 1600 MHz DDR3 (the same happens on another laptop with
>>>>>>>> Mac OS X Mavericks).
>>>>>>>> In this case, I obtain the following for the SNES norms:
>>>>>>>> 0 SNES Function norm 4.867111713544e-03
>>>>>>>> 1 SNES Function norm 1.560094052222e-03
>>>>>>>> 2 SNES Function norm 1.552118650943e-03
>>>>>>>> 3 SNES Function norm 1.552106297094e-03
>>>>>>>> 4 SNES Function norm 1.552106277949e-03
>>>>>>>> which I can't explain, because otherwise the KSP residual (with the
>>>>>>>> same operator, which I checked) behaves well.
>>>>>>>>
>>>>>>>> As you can see, the first time the preconditioner is applied (DB_,
>>>>>>>> DP_, Drho_ and PS_ solves), the two outputs coincide (except for the last
>>>>>>>> few digits, up to 9 actually, which is more than I would expect), and
>>>>>>>> everything starts to diverge at the first print of the main KSP (the one
>>>>>>>> stemming from the SNES) residual norms.
>>>>>>>>
>>>>>>>> Do you have an idea what may cause such a strange behaviour ?
>>>>>>>>
>>>>>>>> Best
>>>>>>>>
>>>>>>>> Timothee
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener