[petsc-users] Big discrepancy between machines

Thu Dec 17 10:45:48 CST 2015

> On Dec 17, 2015, at 7:47 AM, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> 
> It works very smoothly for the SNES :-), but KSPGetSolution keeps returning a zero vector... KSPGetResidualNorm gives me the norm alright, but I would like to actually see the vectors. Is KSPGetSolution the wrong routine ?

  If you are using GMRES, which is the default, it actually DOES NOT have a representation of the solution at each step. Yes, that seems odd, but it only computes the solution vector at a restart or when the iteration ends. This is why KSPGetSolution doesn't provide anything useful. You can use KSPBuildSolution() to have it construct the "current" solution whenever you need it.

  Barry

> 
> Thx
> 
> Timothée
> 
> 2015-12-17 19:19 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
> 
> 
> On 17 December 2015 at 11:00, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> Hi,
> 
> So, valgrind is OK (at least on the local machine. Actually on the cluster helios, it produces strange results even for the simplest petsc program PetscInitialize followed by PetscFinalize, I will try to figure this out with their technical team), and I have also tried with exactly the same versions (3.6.0) and it does not change the behavior.
> 
> So now I would like to know how to get a grip on what goes in and out of the SNES and the KSP internal to the SNES. That is, I would like to inspect manually the vector which enters the SNES in the first place (should be zero I believe), what is being fed to the KSP, and the vector which comes out of it, etc., if possible at each iteration of the SNES. I want to actually see these vectors, and compute their norm by hand. The trouble is, it is really hard to understand why the Newton residuals are not reduced since the KSP converges so nicely. This does not make any sense to me, so I want to know what happens to the vectors. But on the SNES list of routines, I did not find the tools that would allow me to do that (and messing around with the C code is too hard for me, it would take me weeks). Does someone have a hint?
> 
> The only sane way to do this is to write a custom monitor for your SNES object.
> 
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESMonitorSet.html
> 
> Inside your monitor, you have access to the SNES, and everything it defines, e.g. the current solution, non-linear residual, KSP etc. See these pages
> 
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html
> 
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html
> 
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetKSP.html
> 
> Then you can pull apart the residual and compute specific norms (or plot the residual).
> 
> Hopefully you can access everything you need to perform your analysis.
> 
> Cheers,
>   Dave
>  
> 
> Thx
> 
> Timothee
> 
> 
>  
> 
> 2015-12-15 14:20 GMT+09:00 Matthew Knepley <knepley at gmail.com>:
> On Mon, Dec 14, 2015 at 11:06 PM, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> There is a difference in valgrind indeed between the two. It seems to be clean on my desktop Mac OS X but not on the cluster. I'll try to see what's causing this. I still don't understand well what's causing memory leaks in the case where all PETSc objects are freed correctly (as can be checked with -log_summary).
> 
> Also, I have tried running either 
> 
> valgrind ./my_code -option1 -option2...
> 
> or 
> 
> valgrind mpiexec -n 1 ./my_code -option1 -option2...
> 
> Note here you would need --trace-children=yes for valgrind.
> 
>   Matt
>  
> It seems the second is the correct way to proceed right ? This gives very different behaviour for valgrind.
> 
> Timothee
> 
> 
> 
> 2015-12-14 17:38 GMT+09:00 Timothée Nicolas <timothee.nicolas at gmail.com>:
> OK, I'll try that, thx
> 
> 2015-12-14 17:38 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
> You have the configure line, so it should be relatively straightforward to configure / build petsc in your home directory.
> 
> 
> On 14 December 2015 at 09:34, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> OK, The problem is that I don't think I can change this easily as far as the cluster is concerned. I obtain access to petsc by loading the petsc module, and even if I have a few choices, I don't see any debug builds...
> 
> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
> 
> 
> On Monday, 14 December 2015, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> Hum, OK. I use FORTRAN by the way. Is your comment still valid ? 
> 
> No. Fortran compilers init variables to zero.
> In this case, I would run a debug build on your OSX machine through valgrind and make sure it is clean. 
> 
> The other obvious thing to check is what happens if you use exactly the same petsc builds on both machines. I see 3.6.1 and 3.6.0 are being used.
> 
> For all this type of checking, I would definitely use debug builds on both machines. Your cluster build is using the highest level of optimization...
> 
> 
>  
> I'll check anyway, but I thought I had been careful about this sort of thing.
> 
> Also, I thought the problem on Mac OS X may have been due to the fact I used the version with debugging on, so I reran configure with --with-debugging=no, which did not change anything.
> 
> Thx
> 
> Timothee
> 
> 
> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhem23 at gmail.com>:
> One suggestion is you have some uninitialized variables in your pcshell. Despite your arch being called "debug", your configure options indicate you have turned debugging off.
> 
> The C standard doesn't prescribe how uninit variables should be treated - the behavior is labelled as undefined. As a result, different compilers on different archs with the same optimization flags can and will treat uninit variables differently. I find OS X C compilers tend to set them to zero.
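
  One defensive pattern for the C case described above, sketched under assumptions: give every field of the preconditioner context a known value at creation, so that nothing depends on whatever happened to be in memory. The struct ShellPCContext, its fields and the shell_ctx_* helpers are hypothetical names for illustration; they are not taken from the code discussed in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context for a shell preconditioner; the field names are
 * illustrative only. */
typedef struct {
    int     n_applied; /* number of times the preconditioner was applied */
    double  shift;     /* some scalar parameter */
    double *work;      /* scratch buffer, allocated later if needed */
} ShellPCContext;

/* Allocate a context with every member set to zero/NULL.
 * Returns NULL if the allocation fails. */
ShellPCContext *shell_ctx_create(void)
{
    ShellPCContext *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) return NULL;
    *ctx = (ShellPCContext){0}; /* without this, the members are indeterminate */
    return ctx;
}

/* Count one application of the preconditioner. */
void shell_ctx_record_apply(ShellPCContext *ctx)
{
    assert(ctx != NULL);
    ctx->n_applied++;
}

/* Release the scratch buffer (if any) and the context itself. */
void shell_ctx_destroy(ShellPCContext *ctx)
{
    if (ctx == NULL) return;
    free(ctx->work); /* free(NULL) is a no-op */
    free(ctx);
}
```

  With this pattern a freshly created context behaves the same on every machine and at every optimization level, which removes one possible source of platform-dependent results.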
> 
> I suggest compiling a debug build on both machines and trying your test again. Also, consider running the debug builds through valgrind. 
> 
> Thanks,
>   Dave 
> 
> On Monday, 14 December 2015, Timothée Nicolas <timothee.nicolas at gmail.com> wrote:
> Hi,
> 
> I have noticed I have a VERY big difference in behaviour between two machines in my problem, solved with SNES. I can't explain it, because I have tested my operators, which give the same result. I also checked that the vectors fed to the SNES are the same. The problem happens only with my shell preconditioner. When I don't use it, and simply solve using -snes_mf, I don't see any more than the usual 3-4 changing digits at the end of the residuals. However, when I use my pcshell, the results are completely different between the two machines.
> 
> I have attached output_SuperComputer.txt and output_DesktopComputer.txt, which correspond to the output from the exact same code and options (and of course same input data file !). More precisely
> 
> output_SuperComputer.txt : output on a supercomputer called Helios, sorry I don't know the exact specs.
> In this case, the SNES norms are reduced successively:
> 0 SNES Function norm 4.867111712420e-03
> 1 SNES Function norm 5.632325929998e-08
> 2 SNES Function norm 7.427800084502e-15
> 
> output_DesktopComputer.txt : output on a Mac OS X Yosemite 3.4 GHz Intel Core i5 16GB 1600 MHz DDR3 (the same happens on another laptop with Mac OS X Mavericks).
> In this case, I obtain the following for the SNES norms:
> 0 SNES Function norm 4.867111713544e-03
> 1 SNES Function norm 1.560094052222e-03
> 2 SNES Function norm 1.552118650943e-03
> 3 SNES Function norm 1.552106297094e-03
> 4 SNES Function norm 1.552106277949e-03
> which I can't explain, because otherwise the KSP residuals (with the same operator, which I checked) behave well.
> 
> As you can see, the first time the preconditioner is applied (DB_, DP_, Drho_ and PS_ solves), the two outputs coincide (except for the few last digits, up to 9 actually, which is more than I would expect), and everything starts to diverge at the first print of the main KSP (the one stemming from the SNES) residual norms.
> 
> Do you have an idea what may cause such a strange behaviour ?
> 
> Best
> 
> Timothee
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> 
>