[petsc-users] DMPlexDistribute segmentation fault

Josh L ysjosh.lo at gmail.com
Fri Sep 28 20:57:28 CDT 2018


Hi,

I am using DMPlex to handle my mesh in parallel, but I am getting a
segmentation fault during DMPlexDistribute.
I am testing with 2 processes on a toy mesh that has only 4 quad elements.

The code does the following:
      CALL DMPlexCreateFromCellList(...)
      (mesh topology checks: face and skeleton symmetry, these pass)
      CALL DMPlexDistribute(dmserial, 0, PETSC_NULL_SF, dmmpi, ierr)
      (segmentation fault on rank #1 here)
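
A simplified sketch of what createmesh.F90 does (the mesh data, variable
names, and array kinds below are illustrative, not my actual arrays; it
assumes a standard 32-bit-index PETSc 3.9 build and that only rank 0
provides the cell list):

      subroutine createmesh(rank, nsize, dmmpi, ierr)
#include <petsc/finclude/petscdmplex.h>
      use petscdmplex
      implicit none
      PetscMPIInt    :: rank, nsize
      DM             :: dmserial, dmmpi
      PetscErrorCode :: ierr
      PetscInt       :: dim, numCells, numVertices, numCorners, overlap
      PetscInt       :: cells(16)
      PetscReal      :: coords(18)

      dim        = 2
      numCorners = 4
      overlap    = 0
      if (rank == 0) then
         ! 2x2 block of quads, 0-based cell-to-vertex connectivity
         numCells    = 4
         numVertices = 9
         cells  = (/ 0,1,4,3,  1,2,5,4,  3,4,7,6,  4,5,8,7 /)
         ! interleaved (x,y) vertex coordinates on a unit-spaced grid
         coords = (/ 0.0,0.0, 1.0,0.0, 2.0,0.0, 0.0,1.0, 1.0,1.0, &
                     2.0,1.0, 0.0,2.0, 1.0,2.0, 2.0,2.0 /)
      else
         numCells    = 0
         numVertices = 0
      end if

      call DMPlexCreateFromCellList(PETSC_COMM_WORLD, dim, numCells,   &
           numVertices, numCorners, PETSC_TRUE, cells, dim, coords,    &
           dmserial, ierr)
      ! mesh topology checks (faces, skeleton symmetry) pass at this point
      call DMPlexDistribute(dmserial, overlap, PETSC_NULL_SF, dmmpi, ierr)
      end subroutine createmesh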

I traced it back to the external library Chaco, which PETSc calls to
partition the mesh.

The stack traces for both ranks follow.

For rank #1

#10 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
(at 0x000000000040569d)
#9 createmesh (rank=1, nsize=2, dmmpi=...) at
/work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
0x000000000040523b)
#8 dmplexdistribute_ (dm=0x320, overlap=0x1, sf=0x7fffffff4480,
dmParallel=0x2aaab78a585d, ierr=0x1) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
(at 0x00002aaaac08334e)
#7 DMPlexDistribute (dm=0x320, overlap=1, sf=0x7fffffff4480,
dmParallel=0x2aaab78a585d) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
(at 0x00002aaaabd12d08)
#6 PetscPartitionerPartition (part=0x320, dm=0x1,
partSection=0x7fffffff4480, partition=0x2aaab78a585d) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
(at 0x00002aaaabcfe92c)
#5 PetscPartitionerPartition_Chaco (part=0x320, dm=0x1, nparts=-48000,
numVertices=-1215670179, start=0x1, adjacency=0x22, partSection=0x0,
partition=0x772120) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1279
(at 0x00002aaaabd03dfd)
#4 interface.Z (nvtxs=800, start=0x1, adjacency=0x7fffffff4480,
vwgts=0x2aaab78a585d, ewgts=0x1, x=0x22, y=0x0, z=0x0, outassignname=0x0,
outfilename=0x0, assignment=0x76ff90, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, goal=0x0, global_method=1, local_method=1,
rqi_flag=0, vmax=200, ndims=1, eigtol=2.1137067449068142e-314,
seed=123636512) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/main/interface.c:206
(at 0x00002aaaad7aaf70)
#3 submain.Z (graph=0x320, nvtxs=1, nedges=-48000, using_vwgts=-1215670179,
using_ewgts=1, igeom=34, coords=0x0, outassignname=0x0, outfilename=0x0,
assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0,
vmax=200, ndims=1, eigtol=2.1137067449068142e-314, seed=123636512) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/submain/submain.c:151
(at 0x00002aaaad7ae52e)
#2 check_input.Z (graph=0x320, nvtxs=1, nedges=-48000, igeom=-1215670179,
coords=0x1, graphname=0x22 <error: Cannot access memory at address 0x22>,
assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0,
vmax=0x7fffffff47a8, ndims=1, eigtol=2.1137067449068142e-314) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/input/check_input.c:56
(at 0x00002aaaad7c6ed1)
#1 check_graph (graph=0x320, nvtxs=1, nedges=-48000) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:90
(at 0x00002aaaad8204d3)
#0 is_an_edge (vertex=0x320, v2=1, weight2=0x7fffffff4480) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:134
(at 0x00002aaaad8206d3)

For rank #0

#18 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
(at 0x000000000040569d)
#17 createmesh (rank=0, nsize=2, dmmpi=...) at
/work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
0x000000000040523b)
#16 dmplexdistribute_ (dm=0x65f300, overlap=0x0, sf=0x2aaab4c8698c,
dmParallel=0xffffffffffffffff, ierr=0x0) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
(at 0x00002aaaac08334e)
#15 DMPlexDistribute (dm=0x65f300, overlap=0, sf=0x2aaab4c8698c,
dmParallel=0xffffffffffffffff) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
(at 0x00002aaaabd12d08)
#14 PetscPartitionerPartition (part=0x65f300, dm=0x0,
partSection=0x2aaab4c8698c, partition=0xffffffffffffffff) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
(at 0x00002aaaabcfe92c)
#13 PetscPartitionerPartition_Chaco (part=0x65f300, dm=0x0,
nparts=-1261934196, numVertices=-1, start=0x0, adjacency=0x0,
partSection=0x7726c0, partition=0x7fffffff71a8) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1314
(at 0x00002aaaabd04029)
#12 ISCreateGeneral (comm=6681344, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292), is=0x0)
at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:671
(at 0x00002aaaab0ea94e)
#11 ISGeneralSetIndices (is=0x65f300, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:698
(at 0x00002aaaab0eaa77)
#10 ISGeneralSetIndices_General.Z (is=0x65f300, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:712
(at 0x00002aaaab0ef567)
#9 PetscLayoutSetUp (map=0x65f300) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/utils/pmap.c:137 (at
0x00002aaaab0c6d52)
#8 PetscSplitOwnership (comm=6681344, n=0x0, N=0x2aaab4c8698c) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/sys/utils/psplit.c:80 (at
0x00002aaaaaeb7b00)
#7 PMPI_Allreduce (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196,
datatype=-1, op=0, comm=0) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1395
(at 0x00002aaab40966e6)
#6 MPIR_Allreduce_intra (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196,
datatype=-1, op=0, comm_ptr=0x0, errflag=0x7fffffff4798) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:339
(at 0x00002aaab409307b)
#5 MPIR_Allreduce_shm_generic (sendbuf=<optimized out>, recvbuf=<optimized
out>, count=<optimized out>, datatype=<optimized out>, op=<optimized out>,
comm_ptr=<optimized out>, errflag=<optimized out>, kind=1476395011) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1137
(at 0x00002aaab409307b)
#4 I_MPI_COLL_SHM_KNARY_REDUCE (node_comm_ptr=<optimized out>,
root=<optimized out>, localbuf=<optimized out>, sendbuf=<optimized out>,
recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>,
op=<optimized out>, errflag=<optimized out>, knomial_factor=<optimized
out>) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:1090
(at 0x00002aaab409307b)
#3 I_MPI_COLL_SHM_GENERIC_GATHER_REDUCE..1 (node_comm_ptr=0x65f300, root=0,
is_reduce=-1261934196, localbuf=0xffffffffffffffff, sendbuf=0x0,
recvbuf=0x0, count=1, datatype=1275069445, op=1476395011,
errflag=0x7fffffff4798, knomial_factor=4, algo_type=2) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:558
(at 0x00002aaab408f3a9)
#2 I_MPI_memcpy (destination=<optimized out>, source=0x0, size=<optimized
out>) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:749
(at 0x00002aaab408f3a9)
#1 PMPIDI_CH3I_Progress (progress_state=0x65f300, is_blocking=0) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:981
(at 0x00002aaab40e85a6)
#0 sched_yield () from /lib64/libc.so.6 (at 0x00002aaab7898e47)


Any idea why this is happening? The overlap argument is set to 0, but the
trace shows it arriving as 1 inside DMPlexDistribute on rank #1.


Also, is there any way to know how many cells and vertices end up on each
processor after distribution?
My old code partitions the mesh with Metis, and I always output cell data
containing the rank number of each cell, so I can visualize how the mesh is
partitioned.
It is not essential, but is there any way to get this with DMPlex (a rough
sketch of what I have in mind is below)?
DMPlexGetCellNumbering might work, but it fails at link time. In fact, many
of the functions listed under the developer category fail to link.
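
For the per-rank counts, would something like the following be reasonable?
A rough sketch against the distributed DM, using the rank that is already
passed into createmesh (I am assuming the height-0 points are the cells and
the depth-0 points are the vertices, so shared vertices get counted on every
rank that holds them):

      PetscInt       :: cStart, cEnd, vStart, vEnd
      PetscErrorCode :: ierr

      ! local cell range (height 0) and vertex range (depth 0) of dmmpi
      call DMPlexGetHeightStratum(dmmpi, 0, cStart, cEnd, ierr)
      call DMPlexGetDepthStratum(dmmpi, 0, vStart, vEnd, ierr)
      write(*,*) 'rank', rank, ':', cEnd - cStart, 'cells,', &
                 vEnd - vStart, 'vertices'

This would only give the counts per rank, though, not the rank-per-cell
field I used to write out with Metis for visualization.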


Thanks,
Yu-Sheng