[petsc-users] DMPlexDistribute segmentation fault

Josh L ysjosh.lo at gmail.com
Sat Sep 29 19:47:01 CDT 2018


It was actually my mistake.
Although every rank must call DMPlexCreateFromCellList, only one process
needs to read in the mesh; the other ranks just create an empty mesh.
It works fine now.
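
For reference, here is a minimal, untested sketch of that pattern. It assumes
the Fortran binding of DMPlexCreateFromCellList mirrors the C signature; the
2x2 quad mesh data and the names dmserial/dmmpi (taken from this thread) are
only illustrative. It also declares the overlap as a PetscInt variable, per
Matt's suggestion below:

      program toymesh
#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscdm.h>
#include <petsc/finclude/petscdmplex.h>
      use petscsys
      use petscdm
      use petscdmplex
      implicit none

      DM             :: dmserial, dmmpi
      PetscInt       :: dim, numCells, numVertices, numCorners, overlap
      PetscInt       :: cells(16)
      PetscReal      :: coords(18)
      PetscMPIInt    :: rank
      PetscErrorCode :: ierr

      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
      call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

      dim        = 2
      numCorners = 4
      cells      = 0
      coords     = 0.0
      if (rank == 0) then
         ! Only rank 0 supplies the mesh: a 2x2 grid of quads
         ! (4 cells, 9 vertices); connectivity and coordinates are made up.
         numCells    = 4
         numVertices = 9
         cells  = [0,1,4,3, 1,2,5,4, 3,4,7,6, 4,5,8,7]
         coords = [0.,0., 1.,0., 2.,0., 0.,1., 1.,1., 2.,1., &
                   0.,2., 1.,2., 2.,2.]
      else
         ! Every other rank still makes the (collective) call,
         ! but with an empty mesh.
         numCells    = 0
         numVertices = 0
      end if

      ! Note: the C API takes plain int connectivity and double coordinates,
      ! so the array kinds above may need adjusting for a 64-bit-index build.
      call DMPlexCreateFromCellList(PETSC_COMM_WORLD, dim, numCells,        &
           numVertices, numCorners, PETSC_TRUE, cells, dim, coords,         &
           dmserial, ierr)

      ! Pass the overlap as a PetscInt variable rather than a literal 0, so
      ! the Fortran integer kind matches what PETSc expects.
      overlap = 0
      call DMPlexDistribute(dmserial, overlap, PETSC_NULL_SF, dmmpi, ierr)

      call DMDestroy(dmserial, ierr)
      call DMDestroy(dmmpi, ierr)
      call PetscFinalize(ierr)
      end program toymesh

Since the cell and vertex counts are zero everywhere except rank 0, the serial
DM lives entirely on rank 0 and DMPlexDistribute then spreads it across the
communicator (run this with 2 or more ranks).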


Thanks,
Josh


Matthew Knepley <knepley at gmail.com> wrote on Saturday, September 29, 2018 at 11:00 AM:

> On Fri, Sep 28, 2018 at 9:57 PM Josh L <ysjosh.lo at gmail.com> wrote:
>
>> Hi,
>>
>> I am implementing DMPlex to handle my mesh in parallel, but I am getting a
>> segmentation fault during DMPlexDistribute.
>> I am testing with 2 processors on a toy mesh that has only 4 quad
>> elements.
>>
>> The code is as follows:
>>       CALL DMPlexCreateFromCellList
>>       ! check mesh topology: face and skeleton symmetry
>>       CALL DMPlexDistribute(dmserial, 0, PETSC_NULL_SF, dmmpi, ierr)
>> (segmentation fault on rank #1 here)
>>
>
> Crud, I was missing a check that the overlap is different on different
> procs. This is now fixed.
>
> Second, it might be that your Fortran ints are not the same as PetscInt.
> Try declaring it
>
>   PetscInt overlap = 0
>
> and then passing 'overlap' instead.
>
>   Thanks,
>
>      Matt
>
>
>> I trace it back to the external library "Chaco" called by PETSc to
>> partition the mesh.
>>
>> The following is the stack
>>
>> For rank #1
>>
>> #10 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
>> (at 0x000000000040569d)
>> #9 createmesh (rank=1, nsize=2, dmmpi=...) at
>> /work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
>> 0x000000000040523b)
>> #8 dmplexdistribute_ (dm=0x320, overlap=0x1, sf=0x7fffffff4480,
>> dmParallel=0x2aaab78a585d, ierr=0x1) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
>> (at 0x00002aaaac08334e)
>> #7 DMPlexDistribute (dm=0x320, overlap=1, sf=0x7fffffff4480,
>> dmParallel=0x2aaab78a585d) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
>> (at 0x00002aaaabd12d08)
>> #6 PetscPartitionerPartition (part=0x320, dm=0x1,
>> partSection=0x7fffffff4480, partition=0x2aaab78a585d) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
>> (at 0x00002aaaabcfe92c)
>> #5 PetscPartitionerPartition_Chaco (part=0x320, dm=0x1, nparts=-48000,
>> numVertices=-1215670179, start=0x1, adjacency=0x22, partSection=0x0,
>> partition=0x772120) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1279
>> (at 0x00002aaaabd03dfd)
>> #4 interface.Z (nvtxs=800, start=0x1, adjacency=0x7fffffff4480,
>> vwgts=0x2aaab78a585d, ewgts=0x1, x=0x22, y=0x0, z=0x0, outassignname=0x0,
>> outfilename=0x0, assignment=0x76ff90, architecture=1, ndims_tot=0,
>> mesh_dims=0x7fffffff7038, goal=0x0, global_method=1, local_method=1,
>> rqi_flag=0, vmax=200, ndims=1, eigtol=2.1137067449068142e-314,
>> seed=123636512) at
>> /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/main/interface.c:206
>> (at 0x00002aaaad7aaf70)
>> #3 submain.Z (graph=0x320, nvtxs=1, nedges=-48000,
>> using_vwgts=-1215670179, using_ewgts=1, igeom=34, coords=0x0,
>> outassignname=0x0, outfilename=0x0, assignment=0x76ff8e, goal=0x7739a0,
>> architecture=1, ndims_tot=0, mesh_dims=0x7fffffff7038, global_method=1,
>> local_method=1, rqi_flag=0, vmax=200, ndims=1,
>> eigtol=2.1137067449068142e-314, seed=123636512) at
>> /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/submain/submain.c:151
>> (at 0x00002aaaad7ae52e)
>> #2 check_input.Z (graph=0x320, nvtxs=1, nedges=-48000, igeom=-1215670179,
>> coords=0x1, graphname=0x22 <error: Cannot access memory at address 0x22>,
>> assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0,
>> mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0,
>> vmax=0x7fffffff47a8, ndims=1, eigtol=2.1137067449068142e-314) at
>> /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/input/check_input.c:56
>> (at 0x00002aaaad7c6ed1)
>> #1 check_graph (graph=0x320, nvtxs=1, nedges=-48000) at
>> /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:90
>> (at 0x00002aaaad8204d3)
>> #0 is_an_edge (vertex=0x320, v2=1, weight2=0x7fffffff4480) at
>> /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:134
>> (at 0x00002aaaad8206d3)
>>
>> For rank #0
>>
>> #18 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18
>> (at 0x000000000040569d)
>> #17 createmesh (rank=0, nsize=2, dmmpi=...) at
>> /work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at
>> 0x000000000040523b)
>> #16 dmplexdistribute_ (dm=0x65f300, overlap=0x0, sf=0x2aaab4c8698c,
>> dmParallel=0xffffffffffffffff, ierr=0x0) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15
>> (at 0x00002aaaac08334e)
>> #15 DMPlexDistribute (dm=0x65f300, overlap=0, sf=0x2aaab4c8698c,
>> dmParallel=0xffffffffffffffff) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664
>> (at 0x00002aaaabd12d08)
>> #14 PetscPartitionerPartition (part=0x65f300, dm=0x0,
>> partSection=0x2aaab4c8698c, partition=0xffffffffffffffff) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675
>> (at 0x00002aaaabcfe92c)
>> #13 PetscPartitionerPartition_Chaco (part=0x65f300, dm=0x0,
>> nparts=-1261934196, numVertices=-1, start=0x0, adjacency=0x0,
>> partSection=0x7726c0, partition=0x7fffffff71a8) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1314
>> (at 0x00002aaaabd04029)
>> #12 ISCreateGeneral (comm=6681344, n=0, idx=0x2aaab4c8698c,
>> mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292), is=0x0)
>> at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:671
>> (at 0x00002aaaab0ea94e)
>> #11 ISGeneralSetIndices (is=0x65f300, n=0, idx=0x2aaab4c8698c,
>> mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:698
>> (at 0x00002aaaab0eaa77)
>> #10 ISGeneralSetIndices_General.Z (is=0x65f300, n=0, idx=0x2aaab4c8698c,
>> mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:712
>> (at 0x00002aaaab0ef567)
>> #9 PetscLayoutSetUp (map=0x65f300) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/utils/pmap.c:137 (at
>> 0x00002aaaab0c6d52)
>> #8 PetscSplitOwnership (comm=6681344, n=0x0, N=0x2aaab4c8698c) at
>> /home1/apps/intel17/impi17_0/petsc/3.9/src/sys/utils/psplit.c:80 (at
>> 0x00002aaaaaeb7b00)
>> #7 PMPI_Allreduce (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196,
>> datatype=-1, op=0, comm=0) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1395
>> (at 0x00002aaab40966e6)
>> #6 MPIR_Allreduce_intra (sendbuf=0x65f300, recvbuf=0x0,
>> count=-1261934196, datatype=-1, op=0, comm_ptr=0x0, errflag=0x7fffffff4798)
>> at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:339
>> (at 0x00002aaab409307b)
>> #5 MPIR_Allreduce_shm_generic (sendbuf=<optimized out>,
>> recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>,
>> op=<optimized out>, comm_ptr=<optimized out>, errflag=<optimized out>,
>> kind=1476395011) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1137
>> (at 0x00002aaab409307b)
>> #4 I_MPI_COLL_SHM_KNARY_REDUCE (node_comm_ptr=<optimized out>,
>> root=<optimized out>, localbuf=<optimized out>, sendbuf=<optimized out>,
>> recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>,
>> op=<optimized out>, errflag=<optimized out>, knomial_factor=<optimized
>> out>) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:1090
>> (at 0x00002aaab409307b)
>> #3 I_MPI_COLL_SHM_GENERIC_GATHER_REDUCE..1 (node_comm_ptr=0x65f300,
>> root=0, is_reduce=-1261934196, localbuf=0xffffffffffffffff, sendbuf=0x0,
>> recvbuf=0x0, count=1, datatype=1275069445, op=1476395011,
>> errflag=0x7fffffff4798, knomial_factor=4, algo_type=2) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:558
>> (at 0x00002aaab408f3a9)
>> #2 I_MPI_memcpy (destination=<optimized out>, source=0x0, size=<optimized
>> out>) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:749
>> (at 0x00002aaab408f3a9)
>> #1 PMPIDI_CH3I_Progress (progress_state=0x65f300, is_blocking=0) at
>> /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:981
>> (at 0x00002aaab40e85a6)
>> #0 sched_yield () from /lib64/libc.so.6 (at 0x00002aaab7898e47)
>>
>>
>> Any idea why this is happening? The overlap is set to 0, but it shows up as
>> 1 on rank #1.
>>
>>
>> Is there any way to know the number of cells and vertices distributed to
>> each processor?
>> My old code partitions the mesh with Metis, and I always output cell data
>> that shows the rank number on each cell, so I can visualize how the mesh is
>> partitioned.
>> It is not necessary, but is there any way to get this in DMPlex?
>> DMPlexGetCellNumbering might work, but it fails at link time. In fact,
>> many functions in the developer category fail to link.
>>
>>
>> Thanks,
>> Yu-Sheng
>>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>