On Tue, Feb 7, 2012 at 8:52 PM, Derek Gaston <span dir="ltr"><<a href="mailto:friedmud@gmail.com">friedmud@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Mon, Feb 6, 2012 at 11:20 PM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_quote"><div><div><br></div></div><div>Hmm, progress semantics of MPI should ensure completion. Stalling the process with gdb should not change anything (assuming you weren't actually making changes with gdb). Can you run with MPICH2?</div>
</div></div>
</blockquote></div><br><div>Ok - an update on this. I recompiled my whole stack with mvapich2... and it is still hanging in the same place:</div><div><br></div><div><div>#0 0x00002b336a732f40 in PMI_Get_rank () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#1 0x00002b336a6bf453 in MPIDI_CH3I_MRAILI_Cq_poll () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div><div>#2 0x00002b336a675818 in MPIDI_CH3I_read_progress () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#3 0x00002b336a67485b in MPIDI_CH3I_Progress () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div><div>#4 0x00002b336a6bea96 in MPIC_Wait () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#5 0x00002b336a6be9db in MPIC_Sendrecv () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div><div>#6 0x00002b336a6be8aa in MPIC_Sendrecv_ft () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#7 0x00002b336a652db1 in MPIR_Allgather_intra_MV2 () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div><div>#8 0x00002b336a652965 in MPIR_Allgather_MV2 () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#9 0x00002b336a651846 in MPIR_Allgather_impl () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div><div>#10 0x00002b336a6517b1 in PMPI_Allgather () from /apps/local/mvapich2/1.7/intel-12.1.1/opt/lib/libmpich.so.3</div>
<div>#11 0x00000000004a1f23 in PetscLayoutSetUp ()</div><div>#12 0x000000000054e469 in MatMPIAIJSetPreallocation_MPIAIJ ()</div><div>#13 0x000000000055584a in MatCreateMPIAIJ ()</div></div><div><br></div><div>It's been hung there for about 35 minutes.</div>
<div><br></div><div>This particular job has ~100 million DoFs with 512 MPI processes. Any ideas?</div></blockquote><div><br></div><div>Same question: are you sure every process is there? I will bet $10 that at least one is missing.</div>
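<div><br></div><div>One way to check is to sample a backtrace from every local process of the application and see which ranks never reached the Allgather. A minimal sketch (not from this thread): it assumes gdb and pgrep are available on the compute node, and "my_app" is a placeholder for the actual executable name.</div>

```shell
# Hedged sketch: dump a backtrace from each local process of the job.
# Assumptions: gdb and pgrep exist on the node; "my_app" is a placeholder
# for the real binary name; ptrace permission on the processes is granted.
APP="${APP:-my_app}"
report=""
for pid in $(pgrep -x "$APP"); do
    report="${report}=== pid ${pid} ===
$(gdb --batch -p "$pid" -ex 'thread apply all bt' 2>/dev/null)
"
done
# Any rank whose backtrace does not show MPIR_Allgather is a candidate
# for the "missing" process that stalls the collective.
printf '%s' "$report"
```

<div>Repeating this on each node (or using a parallel stack-trace tool if one is installed) narrows the hang down to the rank that never entered the call.</div>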
<div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888"><div>Derek</div>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>