On Fri, Aug 26, 2011 at 8:37 AM, Dominik Szczerba <span dir="ltr">&lt;<a href="mailto:dominik@itis.ethz.ch">dominik@itis.ethz.ch</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">> When you run in the debugger and break after it has obviously hung, are all<br>
> processes stopped at the same place?<br>
<br>
</div>Of course not, they are stuck at barriers elsewhere. Thanks for the<br>
valuable question.<br>
<div class="im"><br>
> If you see an error condition, you can<br>
> run<br>
> CHKMEMQ;<br>
> MPI_Barrier(((PetscObject)A)->comm);<br>
> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);<br>
> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);<br>
> If it hangs, check where every process is stuck.<br>
<br>
</div>I obviously seem to be missing some barriers. But why would I need<br>
MPI_Barrier(((PetscObject)A)->comm) and not just<br>
MPI_Barrier(PETSC_COMM_WORLD)? Would that only force a barrier for<br>
A-related traffic?</blockquote><div><br></div><div>The idea here is the following:</div><div><br></div><div>  1) We would like to isolate the mismatched synchronization.</div><div><br></div><div>  2) We can place barriers in the code to delimit the section that contains the offending code,</div>
<div>      and also to rule out bugs in MatAssembly as a possible source of the problem.</div><div><br></div><div>  3) Do you have any MPI code you wrote yourself in here?</div><div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<font color="#888888"><br>
Dominik<br>
</font></blockquote></div><br>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener<br>