[petsc-users] Big discrepancy between machines

Dave May dave.mayhem23 at gmail.com
Mon Dec 14 02:38:18 CST 2015


You have the configure line, so it should be relatively straightforward to
configure and build PETSc in your home directory.


On 14 December 2015 at 09:34, Timothée Nicolas <timothee.nicolas at gmail.com>
wrote:

> OK, the problem is that I don't think I can change this easily on the
> cluster. I get access to PETSc by loading the petsc module, and even
> though I have a few choices, I don't see any debug builds...
>
> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>
>>
>>
>> On Monday, 14 December 2015, Timothée Nicolas <timothee.nicolas at gmail.com>
>> wrote:
>>
>>> Hmm, OK. I use Fortran, by the way. Is your comment still valid?
>>>
>>
>> No. Fortran compilers init variables to zero.
>> In this case, I would run a debug build on your OSX machine through
>> valgrind and make sure it is clean.
>>
>> The other obvious thing to check is what happens if you use exactly the
>> same PETSc build on both machines. I see 3.6.1 and 3.6.0 are being used.
>>
>> For all this type of checking, I would definitely use debug builds on
>> both machines. Your cluster build is using the highest level of
>> optimization...
>>
>>
>>
>>
>>> I'll check anyway, but I thought I had been careful about this sort of
>>> thing.
>>>
>>> Also, I thought the problem on Mac OS X might have been due to the fact
>>> that I used the debugging-enabled build, so I reran configure with
>>> --with-debugging=no, which did not change anything.
>>>
>>> Thx
>>>
>>> Timothee
>>>
>>>
>>> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
>>>
>>>> One possibility is that you have some uninitialized variables in your
>>>> pcshell. Despite your arch being called "debug", your configure options
>>>> indicate that you have turned debugging off.
>>>>
>>>> The C standard doesn't prescribe how uninitialized variables should be
>>>> treated; the behavior is undefined. As a result, different compilers on
>>>> different archs with the same optimization flags can and will treat
>>>> uninitialized variables differently. I find OS X C compilers tend to set
>>>> them to zero.
>>>>
>>>> I suggest compiling a debug build on both machines and trying your
>>>> test again. Also, consider running the debug builds through valgrind.
>>>>
>>>> Thanks,
>>>>   Dave
>>>>
>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>> timothee.nicolas at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have noticed a VERY big difference in behaviour between two
>>>>> machines for my problem, which I solve with SNES. I can't explain it,
>>>>> because I have tested my operators, which give the same results. I also
>>>>> checked that the vectors fed to the SNES are the same. The problem
>>>>> happens only with my shell preconditioner. When I don't use it and simply
>>>>> solve with -snes_mf, I see no more than the usual 3-4 differing digits at
>>>>> the end of the residuals. However, when I use my pcshell, the results are
>>>>> completely different between the two machines.
>>>>>
>>>>> I have attached output_SuperComputer.txt and
>>>>> output_DesktopComputer.txt, which correspond to the output from the exact
>>>>> same code and options (and of course the same input data file!). More
>>>>> precisely:
>>>>>
>>>>> output_SuperComputer.txt: output on a supercomputer called Helios,
>>>>> sorry I don't know the exact specs.
>>>>> In this case, the SNES norms are reduced successively:
>>>>> 0 SNES Function norm 4.867111712420e-03
>>>>> 1 SNES Function norm 5.632325929998e-08
>>>>> 2 SNES Function norm 7.427800084502e-15
>>>>>
>>>>> output_DesktopComputer.txt: output on a Mac OS X Yosemite machine,
>>>>> 3.4 GHz Intel Core i5, 16 GB 1600 MHz DDR3 (the same happens on another
>>>>> laptop with Mac OS X Mavericks).
>>>>> In this case, I obtain the following SNES norms:
>>>>> 0 SNES Function norm 4.867111713544e-03
>>>>> 1 SNES Function norm 1.560094052222e-03
>>>>> 2 SNES Function norm 1.552118650943e-03
>>>>> 3 SNES Function norm 1.552106297094e-03
>>>>> 4 SNES Function norm 1.552106277949e-03
>>>>> which I can't explain, because otherwise the KSP residuals (with the
>>>>> same operator, which I checked) behave well.
>>>>>
>>>>> As you can see, the first time the preconditioner is applied (DB_,
>>>>> DP_, Drho_ and PS_ solves), the two outputs coincide (except for the
>>>>> last few digits, up to 9 actually, which is more than I would expect),
>>>>> and everything starts to diverge at the first print of the residual
>>>>> norms of the main KSP (the one stemming from the SNES).
>>>>>
>>>>> Do you have any idea what may cause such strange behaviour?
>>>>>
>>>>> Best
>>>>>
>>>>> Timothee
>>>>>
>>>>
>>>
>