[petsc-dev] mvapich and petsc-dev
Ethan Coon
ecoon at lanl.gov
Thu Apr 14 11:04:02 CDT 2011
I'm grasping at straws a bit here because I'm completely stymied, so
please bear with me.
I'm running a program in two environments -- on local workstations with
mpich2 and on a supercomputer with mvapich.
On the workstation, the program runs in every case I've tested, from 8
processes (the number of cores) up to 64 processes (multiple processes
per core).
On the supercomputer, it runs on 16 cores (one full node). With 64
cores, it seg-faults and dumps core many timesteps into the
simulation.
Using a debugger and a debug-enabled petsc-dev (but with no access to
debugging symbols in the mvapich installation), I've examined the core.
It appears to crash during VecScatterBegin_1, within a
DMDALocalToLocal() where xin == xout. The Vec I pass in as both input
and output appears normal.
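For reference, the call pattern in question looks roughly like the
sketch below -- a DMDA local-to-local scatter with the same Vec as
input and output. This is my own minimal reconstruction (written
against the current PETSc API, so names like DMSetUp may differ from
the 2011 petsc-dev in the report; the grid sizes and variable names are
hypothetical), not code from the actual application:

```c
/* Hedged sketch of the crashing call pattern: a DMDA
 * local-to-local scatter with xin == xout. All sizes and
 * names here are illustrative, not from the original run. */
#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM  da;
  Vec xlocal;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* 2D structured grid; star stencil, stencil width 1 */
  DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
               DMDA_STENCIL_STAR, 64, 64, PETSC_DECIDE, PETSC_DECIDE,
               1, 1, NULL, NULL, &da);
  DMSetFromOptions(da);
  DMSetUp(da);
  DMCreateLocalVector(da, &xlocal);

  /* The pattern from the report: same Vec as input and output.
   * Internally this starts a VecScatter, which is where the
   * 64-core mvapich run dies inside MPID_IsendContig. */
  DMDALocalToLocalBegin(da, xlocal, INSERT_VALUES, xlocal);
  DMDALocalToLocalEnd(da, xlocal, INSERT_VALUES, xlocal);

  VecDestroy(&xlocal);
  DMDestroy(&da);
  PetscFinalize();
  return 0;
}
```

Using the same Vec for input and output is explicitly supported by
DMDALocalToLocalBegin/End, so the aliasing itself shouldn't be the bug.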
The stack looks something like:
MPIR_HBT_lookup, FP=7fff1010f740
PMPI_Attr_get, FP=7fff1010f780
PetscCommDuplicate, FP=7fff1010f7d0
PetscViewerASCIIGetStdout, FP=7fff1010f800
PETSC_VIEWER_STDOUT_, FP=7fff1010f820
PetscDefaultSignalHandler, FP=7fff1010fa70
PetscSignalHandler_Private, FP=7fff1010fa90
**** Signal Stack Frame ******************
MPID_IsendContig, FP=7fff1010ff20
MPID_IsendDatatype, FP=7fff1010ffa0
PMPI_Start, FP=7fff1010fff0
VecScatterBegin_1, FP=7fff10110080
VecScatterBegin, FP=7fff101100e0
DMDALocalToLocalBegin, FP=7fff10110120
dmdalocaltolocalbegin_, FP=7fff10110160
Has anyone run into anything like this before? I have no idea how to
proceed, and I doubt this is a PETSc problem, but I figured you guys
might have enough experience with these kinds of issues to know where
to look from here...
Thanks,
Ethan
--
------------------------------------
Ethan Coon
Post-Doctoral Researcher
Applied Mathematics - T-5
Los Alamos National Laboratory
505-665-8289
http://www.ldeo.columbia.edu/~ecoon/
------------------------------------