[petsc-users] DMPlexDistribute segmentation fault
Josh L
ysjosh.lo at gmail.com
Fri Sep 28 20:57:28 CDT 2018
Hi,
I am implementing DMPlex to handle my mesh in parallel, but I am getting a
segmentation fault during DMPlexDistribute.

I am testing with 2 processors on a toy mesh that has only 4 quad elements.
The code is as follows:

CALL DMPlexCreateFromCellList
(check mesh topology: faces, skeleton, symmetry)
CALL DMPlexDistribute(dmserial, 0, PETSC_NULL_SF, dmmpi, ierr)
(segmentation fault on rank #1 here)
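For reference, here is a stripped-down sketch of what my createmesh routine
boils down to (the arrays below are just an illustrative 2x2 quad grid, not my
actual data, and the topology checks and error checking are left out):

  program distquads
#include <petsc/finclude/petscdmplex.h>
    use petscdmplex
    implicit none

    DM             :: dmserial, dmmpi
    PetscInt       :: dim, ncells, nverts, ncorners, overlap
    PetscInt       :: cells(16)
    PetscReal      :: coords(18)
    PetscMPIInt    :: rank
    PetscErrorCode :: ierr

    call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
    call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

    dim = 2; ncorners = 4; overlap = 0
    if (rank == 0) then
      ncells = 4; nverts = 9
      ! 0-based connectivity of a 2x2 grid of quads (9 vertices)
      cells  = (/ 0,1,4,3,  1,2,5,4,  3,4,7,6,  4,5,8,7 /)
      ! x,y coordinates of the 9 vertices
      coords = (/ 0.,0., 1.,0., 2.,0., 0.,1., 1.,1., 2.,1., &
                  0.,2., 1.,2., 2.,2. /)
    else
      ncells = 0; nverts = 0   ! cell list is given only on rank 0
    end if

    call DMPlexCreateFromCellList(PETSC_COMM_WORLD, dim, ncells, nverts, &
         ncorners, PETSC_TRUE, cells, dim, coords, dmserial, ierr)
    ! (topology checks on faces/skeleton/symmetry go here in my real code)
    call DMPlexDistribute(dmserial, overlap, PETSC_NULL_SF, dmmpi, ierr)
    if (dmmpi .ne. PETSC_NULL_DM) then
      call DMDestroy(dmserial, ierr)
      dmserial = dmmpi
    end if

    call DMDestroy(dmserial, ierr)
    call PetscFinalize(ierr)
  end program distquads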
I traced it back to the external library "Chaco", which PETSc calls to
partition the mesh.
The stack traces are below.
For rank #1
#10 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
(at 0x000000000040569d)
#9 createmesh (rank=1, nsize=2, dmmpi=...) at
/work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
0x000000000040523b)
#8 dmplexdistribute_ (dm=0x320, overlap=0x1, sf=0x7fffffff4480,
dmParallel=0x2aaab78a585d, ierr=0x1) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
(at 0x00002aaaac08334e)
#7 DMPlexDistribute (dm=0x320, overlap=1, sf=0x7fffffff4480,
dmParallel=0x2aaab78a585d) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
(at 0x00002aaaabd12d08)
#6 PetscPartitionerPartition (part=0x320, dm=0x1,
partSection=0x7fffffff4480, partition=0x2aaab78a585d) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
(at 0x00002aaaabcfe92c)
#5 PetscPartitionerPartition_Chaco (part=0x320, dm=0x1, nparts=-48000,
numVertices=-1215670179, start=0x1, adjacency=0x22, partSection=0x0,
partition=0x772120) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1279
(at 0x00002aaaabd03dfd)
#4 interface.Z (nvtxs=800, start=0x1, adjacency=0x7fffffff4480,
vwgts=0x2aaab78a585d, ewgts=0x1, x=0x22, y=0x0, z=0x0, outassignname=0x0,
outfilename=0x0, assignment=0x76ff90, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, goal=0x0, global_method=1, local_method=1,
rqi_flag=0, vmax=200, ndims=1, eigtol=2.1137067449068142e-314,
seed=123636512) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/main/interface.c:206
(at 0x00002aaaad7aaf70)
#3 submain.Z (graph=0x320, nvtxs=1, nedges=-48000, using_vwgts=-1215670179,
using_ewgts=1, igeom=34, coords=0x0, outassignname=0x0, outfilename=0x0,
assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0,
vmax=200, ndims=1, eigtol=2.1137067449068142e-314, seed=123636512) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/submain/submain.c:151
(at 0x00002aaaad7ae52e)
#2 check_input.Z (graph=0x320, nvtxs=1, nedges=-48000, igeom=-1215670179,
coords=0x1, graphname=0x22 <error: Cannot access memory at address 0x22>,
assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0,
mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0,
vmax=0x7fffffff47a8, ndims=1, eigtol=2.1137067449068142e-314) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/input/check_input.c:56
(at 0x00002aaaad7c6ed1)
#1 check_graph (graph=0x320, nvtxs=1, nedges=-48000) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:90
(at 0x00002aaaad8204d3)
#0 is_an_edge (vertex=0x320, v2=1, weight2=0x7fffffff4480) at
/tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:134
(at 0x00002aaaad8206d3)
For rank #0
#18 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
(at 0x000000000040569d)
#17 createmesh (rank=0, nsize=2, dmmpi=...) at
/work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
0x000000000040523b)
#16 dmplexdistribute_ (dm=0x65f300, overlap=0x0, sf=0x2aaab4c8698c,
dmParallel=0xffffffffffffffff, ierr=0x0) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
(at 0x00002aaaac08334e)
#15 DMPlexDistribute (dm=0x65f300, overlap=0, sf=0x2aaab4c8698c,
dmParallel=0xffffffffffffffff) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
(at 0x00002aaaabd12d08)
#14 PetscPartitionerPartition (part=0x65f300, dm=0x0,
partSection=0x2aaab4c8698c, partition=0xffffffffffffffff) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
(at 0x00002aaaabcfe92c)
#13 PetscPartitionerPartition_Chaco (part=0x65f300, dm=0x0,
nparts=-1261934196, numVertices=-1, start=0x0, adjacency=0x0,
partSection=0x7726c0, partition=0x7fffffff71a8) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1314
(at 0x00002aaaabd04029)
#12 ISCreateGeneral (comm=6681344, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292), is=0x0)
at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:671
(at 0x00002aaaab0ea94e)
#11 ISGeneralSetIndices (is=0x65f300, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:698
(at 0x00002aaaab0eaa77)
#10 ISGeneralSetIndices_General.Z (is=0x65f300, n=0, idx=0x2aaab4c8698c,
mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:712
(at 0x00002aaaab0ef567)
#9 PetscLayoutSetUp (map=0x65f300) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/utils/pmap.c:137 (at
0x00002aaaab0c6d52)
#8 PetscSplitOwnership (comm=6681344, n=0x0, N=0x2aaab4c8698c) at
/home1/apps/intel17/impi17_0/petsc/3.9/src/sys/utils/psplit.c:80 (at
0x00002aaaaaeb7b00)
#7 PMPI_Allreduce (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196,
datatype=-1, op=0, comm=0) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1395
(at 0x00002aaab40966e6)
#6 MPIR_Allreduce_intra (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196,
datatype=-1, op=0, comm_ptr=0x0, errflag=0x7fffffff4798) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:339
(at 0x00002aaab409307b)
#5 MPIR_Allreduce_shm_generic (sendbuf=<optimized out>, recvbuf=<optimized
out>, count=<optimized out>, datatype=<optimized out>, op=<optimized out>,
comm_ptr=<optimized out>, errflag=<optimized out>, kind=1476395011) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1137
(at 0x00002aaab409307b)
#4 I_MPI_COLL_SHM_KNARY_REDUCE (node_comm_ptr=<optimized out>,
root=<optimized out>, localbuf=<optimized out>, sendbuf=<optimized out>,
recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>,
op=<optimized out>, errflag=<optimized out>, knomial_factor=<optimized
out>) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:1090
(at 0x00002aaab409307b)
#3 I_MPI_COLL_SHM_GENERIC_GATHER_REDUCE..1 (node_comm_ptr=0x65f300, root=0,
is_reduce=-1261934196, localbuf=0xffffffffffffffff, sendbuf=0x0,
recvbuf=0x0, count=1, datatype=1275069445, op=1476395011,
errflag=0x7fffffff4798, knomial_factor=4, algo_type=2) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:558
(at 0x00002aaab408f3a9)
#2 I_MPI_memcpy (destination=<optimized out>, source=0x0, size=<optimized
out>) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:749
(at 0x00002aaab408f3a9)
#1 PMPIDI_CH3I_Progress (progress_state=0x65f300, is_blocking=0) at
/tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:981
(at 0x00002aaab40e85a6)
#0 sched_yield () from /lib64/libc.so.6 (at 0x00002aaab7898e47)
Any idea why this is happening? I pass overlap = 0, but the trace shows
overlap=1 on rank #1.
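(As a workaround, I can also try switching the partitioner so that Chaco is not
used at all. If the Fortran interface exposes it (I have not checked), my
understanding is that adding something like

  PetscPartitioner :: part
  call DMPlexGetPartitioner(dmserial, part, ierr)
  call PetscPartitionerSetFromOptions(part, ierr)

before the call to DMPlexDistribute and then running with, e.g.,

  mpiexec -n 2 ./myprog -petscpartitioner_type simple

(./myprog standing in for my executable; parmetis instead of simple should also
work if it is in the build) would bypass Chaco, but I would still like to
understand the crash.)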
Is there any way to know the number of cells and vertices distributed to each
processor?
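(The best I can come up with so far is to read the strata bounds on each rank
after the distribution, since in DMPlex the cells are the height-0 points and
the vertices are the depth-0 points. Using the distributed DM from the sketch
above:

  PetscInt :: zero, cStart, cEnd, vStart, vEnd
  zero = 0
  call DMPlexGetHeightStratum(dmmpi, zero, cStart, cEnd, ierr)   ! cells
  call DMPlexGetDepthStratum(dmmpi, zero, vStart, vEnd, ierr)    ! vertices
  write(*,*) 'rank', rank, ':', cEnd - cStart, 'cells,', &
             vEnd - vStart, 'vertices'

but maybe there is a more direct way.)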
My old code partitions the mesh with Metis, and I always output cell data that
shows the rank number on each cell, so I can visualize how the mesh is
partitioned. It is not strictly necessary, but is there any way to get the same
thing with DMPlex?
DMPlexGetCellNumbering might work, but it fails at link time. In fact, many of
the functions in the developer category fail to link.
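What I had in mind, unless there is something built in, is to attach a
one-dof-per-cell (P0) section to the distributed DM, fill a global vector with
the rank, and write it to a .vtu file to look at in ParaView or similar. An
untested sketch of what I mean (I am not sure these are the right calls;
'partition.vtu' and the variable names are just placeholders):

  PetscSection   :: sec
  Vec            :: ranks
  PetscViewer    :: vtk
  PetscScalar    :: rval
  PetscInt       :: c, cStart, cEnd, zero, one

  zero = 0; one = 1
  call DMPlexGetHeightStratum(dmmpi, zero, cStart, cEnd, ierr)
  ! one dof per cell -> piecewise-constant field holding the owning rank
  call PetscSectionCreate(PETSC_COMM_WORLD, sec, ierr)
  call PetscSectionSetChart(sec, cStart, cEnd, ierr)
  do c = cStart, cEnd - 1
    call PetscSectionSetDof(sec, c, one, ierr)
  end do
  call PetscSectionSetUp(sec, ierr)
  call DMSetDefaultSection(dmmpi, sec, ierr)

  call DMCreateGlobalVector(dmmpi, ranks, ierr)
  rval = rank
  call VecSet(ranks, rval, ierr)

  call PetscViewerCreate(PETSC_COMM_WORLD, vtk, ierr)
  call PetscViewerSetType(vtk, PETSCVIEWERVTK, ierr)
  call PetscViewerFileSetMode(vtk, FILE_MODE_WRITE, ierr)
  call PetscViewerFileSetName(vtk, 'partition.vtu', ierr)
  call VecView(ranks, vtk, ierr)
  call PetscViewerDestroy(vtk, ierr)

  call VecDestroy(ranks, ierr)
  call PetscSectionDestroy(sec, ierr)

Is that roughly the intended approach, or is there a simpler way?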
Thanks,
Yu-Sheng