[petsc-dev] mvapich and petsc-dev

Barry Smith bsmith at mcs.anl.gov
Thu Apr 14 11:11:19 CDT 2011


   Ethan,

   First, valgrind the heck out of the code on a system where you can, and make sure it is completely clean.

   On the bad system, see whether the code still crashes when it is built with no optimization turned on.

   Does it always crash at the same place, or at a seemingly random place? If it is the same place, can you write some kind of restart file so that it crashes soon after you start instead of after many time-steps?
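
   A rough sketch of the restart-file idea, assuming nothing about the actual application: the step counter and state are saved every few steps, and a later run resumes from the last saved step, so a crash late in the run can be reproduced within a few steps of start-up. All names here (save_checkpoint, load_checkpoint, advance, run) are hypothetical, and the state is a plain Python list for illustration; in a PETSc code the state would more likely be written with VecView() and read back with VecLoad() on a binary viewer.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, step, state):
    """Write the step number and state to `path` via a temp file and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    # Rename last, so a crash mid-write never leaves a truncated restart file.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (step, state), or (0, None) when no restart file exists yet."""
    if not os.path.exists(path):
        return 0, None
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["step"], data["state"]


def advance(state):
    """Hypothetical stand-in for one time step of the real simulation."""
    return [x + 1.0 for x in state]


def run(path, n_steps, every):
    """Run up to n_steps, resuming from `path`, saving every `every` steps."""
    step, state = load_checkpoint(path)
    if state is None:
        state = [0.0, 0.0, 0.0]  # hypothetical initial condition
    while step < n_steps:
        state = advance(state)
        step += 1
        if step % every == 0:
            save_checkpoint(path, step, state)
    return step, state
```

   With a restart file written shortly before the step where the crash occurs, each debugging run only has to repeat the last few steps.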

   My guess is that the problem is a combination of mvapich and possibly the hardware.  Maybe bug the systems people about upgrades on the system?

     Barry



On Apr 14, 2011, at 11:04 AM, Ethan Coon wrote:

> I'm grasping at straws a bit here, because I'm completely stymied, so
> please bear with me.
> 
> 
> I'm running a program in two locations -- on local workstations with
> mpich2 and on a supercomputer with mvapich. 
> 
> On the workstation, the program runs, in all cases I've tested,
> including 8 processes (the number of cores), and up to 64 processes
> (multiple procs per core). 
> 
> On the supercomputer, it runs on 16 cores (one full node).  With 64
> cores, it seg-faults and core dumps many timesteps into the
> simulation.
> 
> Using a debugger and a debug-enabled petsc-dev, but with no access to
> debugging symbols in the mvapich installation, I've looked at the core.
> It appears to dump during VecScatterBegin_1 (within a DMDALocalToLocal()
> with xin = xout).  The Vec I pass in as both input and output appears
> normal.
> 
> The stack looks something like:
> 
>     MPIR_HBT_lookup,   FP=7fff1010f740
>     PMPI_Attr_get,     FP=7fff1010f780
> PetscCommDuplicate, FP=7fff1010f7d0
> PetscViewerASCIIGetStdout, FP=7fff1010f800
> PETSC_VIEWER_STDOUT_, FP=7fff1010f820
> PetscDefaultSignalHandler, FP=7fff1010fa70
> PetscSignalHandler_Private, FP=7fff1010fa90
> **** Signal Stack Frame ******************
>     MPID_IsendContig,  FP=7fff1010ff20
>     MPID_IsendDatatype, FP=7fff1010ffa0
>     PMPI_Start,        FP=7fff1010fff0
> VecScatterBegin_1, FP=7fff10110080
> VecScatterBegin,   FP=7fff101100e0
> DMDALocalToLocalBegin, FP=7fff10110120
> dmdalocaltolocalbegin_, FP=7fff10110160
> 
> 
> Has anyone run into anything like this before?  I have no clue even how
> to proceed, and I doubt this is a PETSc problem, but I figured you guys
> might have enough experience in these types of issues to know where to
> look from here...
> 
> Thanks,
> 
> Ethan
> 
> 
> -- 
> ------------------------------------
> Ethan Coon
> Post-Doctoral Researcher
> Applied Mathematics - T-5
> Los Alamos National Laboratory
> 505-665-8289
> 
> http://www.ldeo.columbia.edu/~ecoon/
> ------------------------------------
> 



