[petsc-users] Big discrepancy between machines

Dave May dave.mayhem23 at gmail.com
Mon Dec 14 02:26:01 CST 2015


On Monday, 14 December 2015, Timothée Nicolas <timothee.nicolas at gmail.com>
wrote:

> Hum, OK. I use FORTRAN, by the way. Is your comment still valid?
>

No. Fortran compilers typically initialize variables to zero.
In this case, I would run a debug build on your OS X machine through
valgrind and make sure it is clean.
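
For example, something along these lines (the executable name and options
below are placeholders; adjust them to your case):

  mpiexec -n 2 valgrind --tool=memcheck -q --track-origins=yes \
      ./your_app -your_options

Any reads of uninitialized memory should then show up in the valgrind
report with a backtrace.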

Another obvious thing to check is what happens if you use exactly the same
PETSc builds on both machines. I see 3.6.1 and 3.6.0 are being used.

For all of this checking, I would definitely use debug builds on both
machines. Your cluster build is using the highest level of optimization...
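
If it helps, a debug build is roughly a reconfigure like the following on
both machines (keep your other configure options as they are; the bracketed
part is a placeholder):

  ./configure --with-debugging=1 [your other configure options]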



> I'll check anyway, but I thought I had been careful about this sort of
> thing.
>
> Also, I thought the problem on Mac OS X may have been due to the fact that
> I used the version with debugging on, so I reran configure with
> --with-debugging=no, which did not change anything.
>
> Thx
>
> Timothee
>
>
> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>
>> One suggestion is that you have some uninitialized variables in your
>> pcshell. Despite your arch being called "debug", your configure options
>> indicate you have turned debugging off.
>>
>> The C standard doesn't prescribe how uninitialized variables should be
>> treated - the behaviour is undefined. As a result, different compilers on
>> different archs with the same optimization flags can and will treat
>> uninitialized variables differently. I find OS X C compilers tend to set
>> them to zero.
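>>
>> As a purely illustrative sketch (not taken from your code, with a made-up
>> variable name), the kind of thing I mean is:
>>
>>   #include <stdio.h>
>>
>>   int main(void)
>>   {
>>     double alpha;          /* never assigned: its value is indeterminate */
>>     printf("%g\n", alpha); /* may print 0 with one compiler/arch and
>>                               garbage with another */
>>     return 0;
>>   }
>>
>> An uninitialized value like this inside your shell preconditioner's apply
>> routine would explain results that agree on one machine and differ on the
>> other.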
>>
>> I suggest compiling a debug build on both machines and trying your
>> test again. Also, consider running the debug builds through valgrind.
>>
>> Thanks,
>>   Dave
>>
>> On Monday, 14 December 2015, Timothée Nicolas <timothee.nicolas at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have noticed a VERY big difference in behaviour between two machines
>>> in my problem, solved with SNES. I can't explain it, because I have
>>> tested my operators, which give the same result. I also checked that the
>>> vectors fed to the SNES are the same. The problem happens only with my
>>> shell preconditioner. When I don't use it, and simply solve using
>>> -snes_mf, I don't see any more than the usual 3-4 changing digits at the
>>> end of the residuals. However, when I use my pcshell, the results are
>>> completely different between the two machines.
>>>
>>> I have attached output_SuperComputer.txt and output_DesktopComputer.txt,
>>> which correspond to the output from the exact same code and options (and
>>> of course the same input data file!). More precisely:
>>>
>>> output_SuperComputer.txt: output on a supercomputer called Helios;
>>> sorry, I don't know the exact specs.
>>> In this case, the SNES norms are reduced successively:
>>> 0 SNES Function norm 4.867111712420e-03
>>> 1 SNES Function norm 5.632325929998e-08
>>> 2 SNES Function norm 7.427800084502e-15
>>>
>>> output_DesktopComputer.txt: output on a Mac OS X Yosemite 3.4 GHz Intel
>>> Core i5 with 16 GB of 1600 MHz DDR3 (the same happens on another laptop
>>> with Mac OS X Mavericks).
>>> In this case, I obtain the following SNES norms:
>>> 0 SNES Function norm 4.867111713544e-03
>>> 1 SNES Function norm 1.560094052222e-03
>>> 2 SNES Function norm 1.552118650943e-03
>>> 3 SNES Function norm 1.552106297094e-03
>>> 4 SNES Function norm 1.552106277949e-03
>>> which I can't explain, because otherwise the KSP residuals (with the
>>> same operator, which I checked) behave well.
>>>
>>> As you can see, the first time the preconditioner is applied (DB_, DP_,
>>> Drho_ and PS_ solves), the two outputs coincide (except for the last few
>>> digits, up to 9 actually, which is more than I would expect), and
>>> everything starts to diverge at the first print of the residual norms of
>>> the main KSP (the one stemming from the SNES).
>>>
>>> Do you have an idea of what may cause such strange behaviour?
>>>
>>> Best
>>>
>>> Timothee
>>>
>>
>