[petsc-users] Hang at PetscLayoutSetUp()

Jed Brown jedbrown at mcs.anl.gov
Mon Feb 6 23:27:05 CST 2012


On Tue, Feb 7, 2012 at 08:06, Derek Gaston <friedmud at gmail.com> wrote:

> Hello all,
>
> I'm running some largish finite element calculations at the moment (50
> million to 400 million DoFs on up to 10,000 processors) using a code based
> on PETSc (obviously!), and while most of the simulations are working well,
> every now and again I seem to run into a hang in the setup phase of the
> simulation.
>
> I've attached GDB several times, and it always seems to be hanging
> in PetscLayoutSetUp() during matrix creation.  Here is the top of a stack
> trace showing what I mean:
>
> #0  0x00002aac9d86cef2 in opal_progress () from
> /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libopen-pal.so.0
> #1  0x00002aac9d16a0c4 in ompi_request_default_wait_all () from
> /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
> #2  0x00002aac9d1da9ee in ompi_coll_tuned_sendrecv_actual () from
> /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
> #3  0x00002aac9d1e2716 in ompi_coll_tuned_allgather_intra_bruck () from
> /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
> #4  0x00002aac9d1db439 in ompi_coll_tuned_allgather_intra_dec_fixed ()
> from /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
> #5  0x00002aac9d1827e6 in PMPI_Allgather () from
> /apps/local/openmpi/1.4.4/intel-12.1.1/opt/lib/libmpi.so.0
> #6  0x0000000000508184 in PetscLayoutSetUp ()
> #7  0x00000000005b9f39 in MatMPIAIJSetPreallocation_MPIAIJ ()
> #8  0x00000000005c1317 in MatCreateMPIAIJ ()
>

Are _all_ the processes making it here?
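MPI_Allgather is collective over the communicator, so if even one rank
never reaches PetscLayoutSetUp (it took a different branch, or it is stuck
in some earlier call), every rank that did get there will sit in the
allgather forever. A minimal sketch of that failure mode, with purely
hypothetical names and not taken from your code:

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    int rank, size, local = 1, *all;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    all = (int *)malloc((size_t)size * sizeof(int));

    /* Hypothetical bug: rank 0 takes a branch that skips the collective. */
    if (rank != 0) {
      /* Everyone except rank 0 blocks here forever; in gdb this looks
         exactly like a hang inside an MPI_Allgather. */
      MPI_Allgather(&local, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);
    }

    printf("[%d] past the allgather\n", rank);
    free(all);
    MPI_Finalize();
    return 0;
  }

Attaching gdb to a handful of ranks (not just one) and comparing the stacks
will tell you whether every process is in the same MPI_Allgather or whether
some rank wandered off somewhere else entirely.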


>
> As you can see, I'm currently using OpenMPI (even though I do have access
> to others) along with the Intel compiler (this is a mostly C++ code).  The
> problem doesn't show up at smaller scales (we run TONS of runs all the
> time in the 10,000-5,000,000 DoF range on 1-3000 procs) and it only
> seems to come up on these larger runs.
>
> I'm starting to suspect that it's an OpenMPI issue.  Has anyone seen
> anything like this before?
>
> Here are some specs for my current environment:
>
> PETSc 3.1-p8 (I know, I know....)
> OpenMPI 1.4.4
> Intel compilers 12.1.1
> Modified Redhat with 2.6.18 Kernel
> QDR Infiniband
>
> Thanks for any help!
>
> Derek
>
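For reference, the call at the bottom of that trace is collective on the
matrix communicator: PetscLayoutSetUp gathers the local sizes from every
rank to build the ownership ranges, which is the MPI_Allgather your stacks
are sitting in. Roughly, with the PETSc 3.1 API (the sizes below are
placeholders, not your application's values):

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscErrorCode ierr;
    PetscInt       local_rows = 100, local_cols = 100;  /* placeholder local sizes */
    PetscInt       d_nz = 30, o_nz = 10;                /* placeholder preallocation */

    ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); CHKERRQ(ierr);

    /* Collective on PETSC_COMM_WORLD: every rank in the communicator must
       reach this call, or the ranks that do will hang in the layout setup. */
    ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,
                           local_rows, local_cols,            /* local sizes  */
                           PETSC_DETERMINE, PETSC_DETERMINE,  /* global sizes */
                           d_nz, PETSC_NULL,                  /* diagonal-block preallocation */
                           o_nz, PETSC_NULL,                  /* off-diagonal-block preallocation */
                           &A); CHKERRQ(ierr);

    ierr = MatDestroy(A); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }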