On Sat, Sep 10, 2011 at 6:38 AM, Jed Brown <jedbrown@mcs.anl.gov> wrote:

On Sat, Sep 10, 2011 at 13:09, Tabrez Ali <stali@geology.wisc.edu> wrote:

Hello

I am running an application using PETSc 3.2 RC on a poor man's cluster at my home (for testing only). It has two nodes running different versions of Debian (and different versions of gcc/gfortran), but both have the same MPICH2 1.4 and PETSc 3.2 installed.

They also do not share a file system, but I make sure that the input file and executable paths are exactly the same on both machines. After compiling the code separately on the two nodes, I launch the parallel program from node 1 using mpiexec -f hosts -n 2 .... (Hydra process manager).
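For reference, the hosts file just lists the two machines, one per line, and the launch looks roughly like this (node1, node2, and ./myapp are placeholders for the actual hostnames and executable):

  $ cat hosts
  node1
  node2
  $ mpiexec -f hosts -n 2 ./myapp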

With PETSc 3.1 the application runs fine, both with CG and GMRES, and correct output is generated on both nodes.

With PETSc 3.2 the application runs fine with CG.

But whenever I use GMRES in 3.2 I get an error (listed below) during KSPSolve.

Can you reproduce this with any of the examples? For example

  cd src/ksp/ksp/examples/tutorials
  make ex2
  mpiexec -f hosts -n 2 ./ex2 -ksp_type gmres

or, to use your matrix, run (any version that works, including 3.1) with -ksp_view_binary and then

  cd src/ksp/ksp/examples/tutorials
  make ex10
  mpiexec -f hosts -n 2 ./ex10 -f binaryoutput -ksp_type gmres

If these work, there might be memory corruption somewhere in your code causing this.
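To produce the binaryoutput file, add -ksp_view_binary to a run of your own code, roughly as follows (./myapp again stands in for your executable and its usual arguments):

  mpiexec -f hosts -n 2 ./myapp -ksp_type gmres -ksp_view_binary

This should write the matrix and right-hand side to a file named binaryoutput in the working directory, which ex10 then reads with -f binaryoutput. As a quick check for memory corruption, you can also run the same command under valgrind, e.g. mpiexec -f hosts -n 2 valgrind -q ./myapp ...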
You can also run with -start_in_debugger and check what is in the "alpha" array on each process.

This can happen if a NaN is produced as well. You can easily check by launching the debugger and printing the value.
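If you prefer to check in code rather than in the debugger, here is a minimal sketch; the names b, x, and ksp are placeholders for whatever your application already uses, and the helper is meant to be dropped in around the existing KSPSolve() call:

  #include <petscksp.h>
  #include <stdio.h>

  /* Print the 2-norm of a vector and whether it is NaN (a NaN is the only
     value that compares unequal to itself). */
  static PetscErrorCode CheckVecForNaN(Vec v, const char *label)
  {
    PetscReal      nrm;
    PetscMPIInt    rank;
    PetscErrorCode ierr;

    ierr = VecNorm(v, NORM_2, &nrm);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
    printf("[rank %d] ||%s|| = %g  (NaN: %s)\n", (int)rank, label,
           (double)nrm, (nrm != nrm) ? "yes" : "no");
    return 0;
  }

  /* Usage around the existing solve:
       ierr = CheckVecForNaN(b, "rhs");CHKERRQ(ierr);
       ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
       ierr = CheckVecForNaN(x, "solution");CHKERRQ(ierr);
  */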

   Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
  -- Norbert Wiener