[petsc-users] Hang at PetscLayoutSetUp()

Derek Gaston friedmud at gmail.com
Mon Feb 6 23:06:29 CST 2012


Hello all,

I'm running some largish finite element calculations at the moment (50
million to 400 million DoFs on up to 10,000 processors) using a code based
on PETSc (obviously!) and while most of the simulations are working well,
every now and again I seem to run into a hang in the setup phase of the
simulation.

I've attached GDB several times and it always seems to be hanging
in PetscLayoutSetUp() during matrix creation.  Here is the top of a stack
trace showing what I mean:

#0  0x00002aac9d86cef2 in opal_progress () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libopen-pal.so.0
#1  0x00002aac9d16a0c4 in ompi_request_default_wait_all () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#2  0x00002aac9d1da9ee in ompi_coll_tuned_sendrecv_actual () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#3  0x00002aac9d1e2716 in ompi_coll_tuned_allgather_intra_bruck () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#4  0x00002aac9d1db439 in ompi_coll_tuned_allgather_intra_dec_fixed () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#5  0x00002aac9d1827e6 in PMPI_Allgather () from
/apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
#6  0x0000000000508184 in PetscLayoutSetUp ()
#7  0x00000000005b9f39 in MatMPIAIJSetPreallocation_MPIAIJ ()
#8  0x00000000005c1317 in MatCreateMPIAIJ ()
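
For context on frames #5 and #6: PetscLayoutSetUp() reaches a collective MPI_Allgather, in which every rank contributes its local row count and every rank then builds the table of ownership ranges as a running sum. Because the call is collective, no rank can leave it until every rank in the communicator has taken part. The following is a minimal Python sketch of that pattern, not PETSc's actual code: the function names are hypothetical, the even split is an assumption about how local sizes are chosen when the caller does not set them, and the allgather is simulated by an ordinary list.

```python
from itertools import accumulate


def split_ownership(global_size, nranks):
    """Assumed default split: the first (global_size % nranks) ranks get one extra row."""
    base, extra = divmod(global_size, nranks)
    return [base + (1 if rank < extra else 0) for rank in range(nranks)]


def ownership_ranges(gathered_local_sizes):
    """Prefix sum over the gathered sizes.

    ranges[p] is the first global row owned by rank p and ranges[-1] is the
    global size. In a real run, gathered_local_sizes would be the result of
    the allgather, which is the call that blocks in the trace above.
    """
    return [0] + list(accumulate(gathered_local_sizes))


# Simulated "allgather": on a real machine each entry would come from a different rank.
local_sizes = split_ownership(10, 4)
ranges = ownership_ranges(local_sizes)
print(local_sizes, ranges)
```

The arithmetic itself is trivial, which suggests that a hang here is more likely to come from the communication step (or from a rank that never arrives at it) than from the layout computation.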

As you can see, I'm currently using Open MPI (even though I do have access
to others) along with the Intel compiler (this is a mostly C++ code).  This
problem doesn't show up on any smaller problems (we run TONS of runs
all the time in the 10,000-5,000,000 DoF range on 1-3000 procs) and only
seems to come up on these larger runs.

I'm starting to suspect that it's an Open MPI issue.  Has anyone seen
anything like this before?

Here are some specs for my current environment:

PETSc 3.1-p8 (I know, I know....)
Open MPI 1.4.4
Intel compilers 12.1.1
Modified Red Hat with 2.6.18 kernel
QDR InfiniBand

Thanks for any help!

Derek