Hello all,

I'm running some largish finite element calculations at the moment (50 million to 400 million DoFs on up to 10,000 processors) using a code based on PETSc (obviously!), and while most of the simulations are working well, every now and again I seem to run into a hang in the setup phase of the simulation.
I've attached GDB several times, and it always seems to be hanging in PetscLayoutSetUp() during matrix creation. Here is the top of a stack trace showing what I mean:

#0  0x00002aac9d86cef2 in opal_progress () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libopen-pal.so.0
#1  0x00002aac9d16a0c4 in ompi_request_default_wait_all () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#2  0x00002aac9d1da9ee in ompi_coll_tuned_sendrecv_actual () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#3  0x00002aac9d1e2716 in ompi_coll_tuned_allgather_intra_bruck () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#4  0x00002aac9d1db439 in ompi_coll_tuned_allgather_intra_dec_fixed () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#5  0x00002aac9d1827e6 in PMPI_Allgather () from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#6  0x0000000000508184 in PetscLayoutSetUp ()
#7  0x00000000005b9f39 in MatMPIAIJSetPreallocation_MPIAIJ ()
#8  0x00000000005c1317 in MatCreateMPIAIJ ()

As you can see, I'm currently using OpenMPI (even though I do have access to other MPI implementations) along with the Intel compilers (this is a mostly C++ code). The problem doesn't show up on any of our smaller problems (we run tons of jobs all the time in the 10,000-5,000,000 DoF range on 1-3,000 processors) and only seems to come up on these larger runs.
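For reference, the matrix creation that produces this trace looks roughly like the sketch below (PETSc 3.1 API; the local sizes and preallocation counts shown are placeholders, not what our application actually passes). As far as I can tell from the trace, the Allgather inside PetscLayoutSetUp() is the one that collects each rank's local size to build the ownership ranges.

    /* Minimal sketch of the call chain in the stack trace above.
     * Sizes and preallocation counts are placeholders. */
    #include <petscmat.h>

    PetscErrorCode create_system_matrix(MPI_Comm comm, PetscInt n_local, Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      /* MatCreateMPIAIJ -> MatMPIAIJSetPreallocation_MPIAIJ -> PetscLayoutSetUp,
       * which ends up in the MPI_Allgather where the job hangs. */
      ierr = MatCreateMPIAIJ(comm,
                             n_local, n_local,                  /* local rows/cols      */
                             PETSC_DETERMINE, PETSC_DETERMINE,  /* global sizes         */
                             30, PETSC_NULL,                    /* diagonal preallocation     */
                             10, PETSC_NULL,                    /* off-diagonal preallocation */
                             A); CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }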
I'm starting to suspect that it's an OpenMPI issue. Has anyone seen anything like this before?

Here are some specs for my current environment:

PETSc 3.1-p8 (I know, I know....)
OpenMPI 1.4.4
Intel compilers 12.1.1
Modified Red Hat with a 2.6.18 kernel
QDR InfiniBand

Thanks for any help!

Derek