[petsc-dev] Parmetis bug

Fande Kong fdkong.jd at gmail.com
Mon Nov 11 01:05:45 CST 2019


Please see PR https://gitlab.com/petsc/petsc/merge_requests/2294/diffs

This may be a PETSc bug.

Fande,

On Sun, Nov 10, 2019 at 7:28 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Nice. Any way to add this exact reproducer to a PETSc example that
> runs daily?
>
>   The truism: all codes are buggy, even those that haven't been touched in
> 15 years, is definitely represented here.
>
>    Barry
>
>
> > On Nov 10, 2019, at 7:31 PM, Fande Kong via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
> >
> > Valgrind info:
> >
> > ==32155== Invalid read of size 4
> > ==32155==    at 0x885F62F: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:879)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885CA49: libmetis__Match_RM (coarsen.c:262)
> > ==32155==    by 0x885BFBD: libmetis__CoarsenGraph (coarsen.c:55)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==  Address 0xee2502c is 0 bytes after a block of size 284 alloc'd
> > ==32155==    at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==32155==    by 0x8820179: gk_malloc (memory.c:147)
> > ==32155==    by 0x883D92B: libmetis__imalloc (gklib.c:24)
> > ==32155==    by 0x885BF6A: libmetis__CoarsenGraph (coarsen.c:46)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==
> > ==32155== Conditional jump or move depends on uninitialised value(s)
> > ==32155==    at 0x885F651: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:880)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885CA49: libmetis__Match_RM (coarsen.c:262)
> > ==32155==    by 0x885BFBD: libmetis__CoarsenGraph (coarsen.c:55)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==
> > ==32155== Use of uninitialised value of size 8
> > ==32155==    at 0x885F6F2: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:886)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885CA49: libmetis__Match_RM (coarsen.c:262)
> > ==32155==    by 0x885BFBD: libmetis__CoarsenGraph (coarsen.c:55)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==
> > ==32155== Use of uninitialised value of size 8
> > ==32155==    at 0x885F710: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:886)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885CA49: libmetis__Match_RM (coarsen.c:262)
> > ==32155==    by 0x885BFBD: libmetis__CoarsenGraph (coarsen.c:55)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==
> > ==32155== Invalid read of size 4
> > ==32155==    at 0x885F392: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:856)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885CA49: libmetis__Match_RM (coarsen.c:262)
> > ==32155==    by 0x885BFBD: libmetis__CoarsenGraph (coarsen.c:55)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==  Address 0xee2502c is 0 bytes after a block of size 284 alloc'd
> > ==32155==    at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==32155==    by 0x8820179: gk_malloc (memory.c:147)
> > ==32155==    by 0x883D92B: libmetis__imalloc (gklib.c:24)
> > ==32155==    by 0x885BF6A: libmetis__CoarsenGraph (coarsen.c:46)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==
> > ==32155== Invalid read of size 4
> > ==32155==    at 0x885F62F: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:879)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885D3CC: libmetis__Match_SHEM (coarsen.c:403)
> > ==32155==    by 0x885BFD2: libmetis__CoarsenGraph (coarsen.c:57)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==  Address 0xee27524 is 8 bytes after a block of size 76 alloc'd
> > ==32155==    at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==32155==    by 0x8820179: gk_malloc (memory.c:147)
> > ==32155==    by 0x883D92B: libmetis__imalloc (gklib.c:24)
> > ==32155==    by 0x88607CF: libmetis__SetupCoarseGraph (coarsen.c:1107)
> > ==32155==    by 0x885F16A: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:826)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885D3CC: libmetis__Match_SHEM (coarsen.c:403)
> > ==32155==    by 0x885BFD2: libmetis__CoarsenGraph (coarsen.c:57)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==
> > ==32155== Invalid read of size 4
> > ==32155==    at 0x885F392: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:856)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885D3CC: libmetis__Match_SHEM (coarsen.c:403)
> > ==32155==    by 0x885BFD2: libmetis__CoarsenGraph (coarsen.c:57)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==    by 0x6387033: PCGAMGCreateLevel_GAMG (gamg.c:226)
> > ==32155==    by 0x638BCA5: PCSetUp_GAMG (gamg.c:593)
> > ==32155==    by 0x64746B1: PCSetUp (precon.c:894)
> > ==32155==    by 0x65D388A: KSPSetUp (itfunc.c:377)
> > ==32155==  Address 0xee274cc is 4 bytes before a block of size 76 alloc'd
> > ==32155==    at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==32155==    by 0x8820179: gk_malloc (memory.c:147)
> > ==32155==    by 0x883D92B: libmetis__imalloc (gklib.c:24)
> > ==32155==    by 0x88607CF: libmetis__SetupCoarseGraph (coarsen.c:1107)
> > ==32155==    by 0x885F16A: libmetis__CreateCoarseGraphNoMask
> (coarsen.c:826)
> > ==32155==    by 0x885E2B9: libmetis__CreateCoarseGraph (coarsen.c:636)
> > ==32155==    by 0x885D3CC: libmetis__Match_SHEM (coarsen.c:403)
> > ==32155==    by 0x885BFD2: libmetis__CoarsenGraph (coarsen.c:57)
> > ==32155==    by 0x8869A4C: libmetis__MultilevelBisect (pmetis.c:240)
> > ==32155==    by 0x88696F6: libmetis__MlevelRecursiveBisection
> (pmetis.c:183)
> > ==32155==    by 0x88698D7: libmetis__MlevelRecursiveBisection
> (pmetis.c:207)
> > ==32155==    by 0x886951E: METIS_PartGraphRecursive (pmetis.c:133)
> > ==32155==    by 0x885650B: libmetis__InitKWayPartitioning (kmetis.c:194)
> > ==32155==    by 0x8856147: libmetis__MlevelKWayPartitioning
> (kmetis.c:121)
> > ==32155==    by 0x8855FD3: METIS_PartGraphKway (kmetis.c:71)
> > ==32155==    by 0x85E7611: libparmetis__PartitionSmallGraph (weird.c:478)
> > ==32155==    by 0x85D00F3: ParMETIS_V3_PartKway (kmetis.c:91)
> > ==32155==    by 0x5515C19: MatPartitioningApply_Parmetis_Private
> (pmetis.c:147)
> > ==32155==    by 0x5516F7D: MatPartitioningApply_Parmetis (pmetis.c:221)
> > ==32155==    by 0x550DD7F: MatPartitioningApply (partition.c:332)
> > ==32155==
> >
> >
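The offsets in the Valgrind reports above can be turned into array indices. A minimal sketch, assuming 4-byte idx_t entries (as the "Invalid read of size 4" lines suggest); the helper names are hypothetical and are not part of METIS or Valgrind:

```c
#include <assert.h>

/* Hypothetical helper: element index touched by an access that Valgrind
   reports as `bytes_after` bytes past the end of a block of `block_bytes`
   bytes, where each element is `elem_bytes` bytes wide. */
long index_after_block(long block_bytes, long bytes_after, long elem_bytes)
{
  assert(elem_bytes > 0 && block_bytes % elem_bytes == 0);
  return (block_bytes + bytes_after) / elem_bytes;
}

/* Hypothetical helper: element index for an access reported as
   `bytes_before` bytes before the start of the block (negative index). */
long index_before_block(long bytes_before, long elem_bytes)
{
  assert(elem_bytes > 0 && bytes_before % elem_bytes == 0);
  return -(bytes_before / elem_bytes);
}
```

Under that assumption, the 284-byte block holds 71 entries and the read "0 bytes after" it touches entry 71, one past the end. The reads 8 bytes after and 4 bytes before the 76-byte block (19 entries) correspond to entries 21 and -1.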
> > On Sun, Nov 10, 2019 at 2:24 PM Mark Adams <mfadams at lbl.gov> wrote:
> > Fande, It looks to me like this branch in ParMetis must be taken to
> trigger this error. First Match_SHEM and then CreateCoarseGraphNoMask.
> >
> >    /* determine which matching scheme you will use */
> >     switch (ctrl->ctype) {
> >       case METIS_CTYPE_RM:
> >         Match_RM(ctrl, graph);
> >         break;
> >       case METIS_CTYPE_SHEM:
> >         if (eqewgts || graph->nedges == 0)
> >           Match_RM(ctrl, graph);
> >         else
> >           Match_SHEM(ctrl, graph);
> >         break;
> >       default:
> >         gk_errexit(SIGERR, "Unknown ctype: %d\n", ctrl->ctype);
> >     }
> >
> > -----------------------
> >
> >   /* Check if the mask-version of the code is a good choice */
> >   mask = HTLENGTH;
> >   if (cnvtxs < 2*mask || graph->nedges/graph->nvtxs > mask/20) {
> >     CreateCoarseGraphNoMask(ctrl, graph, cnvtxs, match);
> >     return;
> >   }
> >
> > ------------
> >
> > The actual error is in CreateCoarseGraphNoMask, graph->cmap is too small
> and this gets garbage. parmetis coarsen.c:856:
> >
> >     istart = xadj[v];
> >     iend   = xadj[v+1];
> >     for (j=istart; j<iend; j++) {
> >       k = cmap[adjncy[j]];
> >       if ((m = htable[k]) == -1) {
> >         cadjncy[nedges] = k;
> >
> >  Anyway, this is what I have so far,
> > Mark
> >
> >
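The out-of-range lookup Mark describes could be caught before the contraction loop runs. A minimal sketch, assuming a CSR graph laid out like the METIS arrays quoted above; first_bad_coarse_index is a hypothetical helper, not a METIS function, and idx_t is typedef'd to int here only for illustration:

```c
#include <assert.h>

typedef int idx_t; /* assumption: real METIS builds may use a 64-bit idx_t */

/* Returns the first fine vertex v that has a neighbor whose id is outside
   [0, nvtxs) or whose coarse id cmap[...] is outside [0, cnvtxs).
   Returns -1 if every lookup htable[cmap[adjncy[j]]] would be in range. */
idx_t first_bad_coarse_index(idx_t nvtxs, const idx_t *xadj,
                             const idx_t *adjncy, const idx_t *cmap,
                             idx_t cnvtxs)
{
  idx_t v, j, k;

  assert(nvtxs >= 0 && cnvtxs >= 0);
  for (v = 0; v < nvtxs; v++) {
    for (j = xadj[v]; j < xadj[v+1]; j++) {
      if (adjncy[j] < 0 || adjncy[j] >= nvtxs)
        return v;   /* neighbor id would read outside cmap */
      k = cmap[adjncy[j]];
      if (k < 0 || k >= cnvtxs)
        return v;   /* coarse id would read outside htable */
    }
  }
  return -1;
}
```

Running a check of this kind in a debug build just before the contraction step would report the offending vertex rather than a segv further on.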
> > On Sun, Nov 10, 2019 at 10:34 AM Mark Adams <mfadams at lbl.gov> wrote:
> > Fande, the problem is that k below seems to index beyond the end of htable,
> resulting in a crazy m and a segv on the last line below.
> >
> > I don't have a clean valgrind machine now, that is what is needed if no
> one has seen anything like this. I could add a test in a MR and get the
> pipeline to do it.
> >
> > void CreateCoarseGraphNoMask(ctrl_t *ctrl, graph_t *graph, idx_t cnvtxs,
> >          idx_t *match)
> > {
> >   idx_t j, k, m, istart, iend, nvtxs, nedges, ncon, cnedges, v, u,
> dovsize;
> >   idx_t *xadj, *vwgt, *vsize, *adjncy, *adjwgt;
> >   idx_t *cmap, *htable;
> >   idx_t *cxadj, *cvwgt, *cvsize, *cadjncy, *cadjwgt;
> >   graph_t *cgraph;
> >
> >   WCOREPUSH;
> >
> >   dovsize = (ctrl->objtype == METIS_OBJTYPE_VOL ? 1 : 0);
> >
> >   IFSET(ctrl->dbglvl, METIS_DBG_TIME,
> gk_startcputimer(ctrl->ContractTmr));
> >
> >   nvtxs   = graph->nvtxs;
> >   ncon    = graph->ncon;
> >   xadj    = graph->xadj;
> >   vwgt    = graph->vwgt;
> >   vsize   = graph->vsize;
> >   adjncy  = graph->adjncy;
> >   adjwgt  = graph->adjwgt;
> >   cmap    = graph->cmap;
> >
> >
> >   /* Initialize the coarser graph */
> >   cgraph = SetupCoarseGraph(graph, cnvtxs, dovsize);
> >   cxadj    = cgraph->xadj;
> >   cvwgt    = cgraph->vwgt;
> >   cvsize   = cgraph->vsize;
> >   cadjncy  = cgraph->adjncy;
> >   cadjwgt  = cgraph->adjwgt;
> >
> >   htable = iset(cnvtxs, -1, iwspacemalloc(ctrl, cnvtxs));
> >
> >   cxadj[0] = cnvtxs = cnedges = 0;
> >   for (v=0; v<nvtxs; v++) {
> >     if ((u = match[v]) < v)
> >       continue;
> >
> >     ASSERT(cmap[v] == cnvtxs);
> >     ASSERT(cmap[match[v]] == cnvtxs);
> >
> >     if (ncon == 1)
> >       cvwgt[cnvtxs] = vwgt[v];
> >     else
> >       icopy(ncon, vwgt+v*ncon, cvwgt+cnvtxs*ncon);
> >
> >     if (dovsize)
> >       cvsize[cnvtxs] = vsize[v];
> >
> >     nedges = 0;
> >
> >     istart = xadj[v];
> >     iend   = xadj[v+1];
> >     for (j=istart; j<iend; j++) {
> >       k = cmap[adjncy[j]];
> >       if ((m = htable[k]) == -1) {
> >         cadjncy[nedges] = k;
> >         cadjwgt[nedges] = adjwgt[j];
> >         htable[k] = nedges++;
> >       }
> >       else {
> >         cadjwgt[m] += adjwgt[j];
> >
> > On Sun, Nov 10, 2019 at 1:35 AM Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> > On Sat, Nov 9, 2019 at 10:51 PM Fande Kong <fdkong.jd at gmail.com> wrote:
> > Hi Mark,
> >
> > Thanks for reporting this bug. I was surprised because we have
> sufficiently heavy tests in MOOSE using partition weights and have not had any
> issues so far.
> >
> >
> > I have been pounding on this code with elasticity and have not seen this
> issue. I am now looking at Laplacians and I only see it with pretty large
> problems. The example below is pretty minimal (eg, it works with 16 cores
> and it works with -dm_refine 4). I have reproduced this on Cori, SUMMIT and
> my laptop.
> >
> > I will take a shot at this.
> >
> > Thanks, I'll try to take a look at it also. I have seen it in DDT, but
> did not dig further. It looked like a typical segv in ParMetis.
> >
> >
> > Fande,
> >
> > On Sat, Nov 9, 2019 at 3:08 PM Mark Adams <mfadams at lbl.gov> wrote:
> > snes/ex13 is getting a ParMetis segv with GAMG and coarse grid
> repartitioning. Below shows the branch and how to run it.
> >
> > I've tried valgrind on Cori but it gives a lot of false positives. I've
> seen this error in DDT but I have not had a chance to dig and try to fix
> it. At least I know it has something to do with weights.
> >
> > If anyone wants to take a shot at it feel free. This bug rarely happens.
> >
> > The changes use weights and are just a few lines of code (from 1.5 years
> ago):
> >
> > 12:08 (0455fb9fec...)|BISECTING ~/Codes/petsc$ git bisect bad
> > 0455fb9fecf69cf5cf35948c84d3837e5a427e2e is the first bad commit
> > commit 0455fb9fecf69cf5cf35948c84d3837e5a427e2e
> > Author: Fande Kong <fdkong.jd at gmail.com>
> > Date:   Thu Jun 21 18:21:19 2018 -0600
> >
> >     Let parmetis and ptsotch take edge weights and vertex weights
> >
> >  src/mat/partition/impls/pmetis/pmetis.c | 7 +++++++
> >  src/mat/partition/impls/scotch/scotch.c | 6 +++---
> >  2 files changed, 10 insertions(+), 3 deletions(-)
> >
> > > mpiexec -n 32 ./ex13 -cells 2,4,4, -dm_refine 5 -simplex 0 -dim 3
> -potential_petscspace_degree 1 -potential_petscspace_order 1 -pc_type gamg
> -petscpartitioner_type simple -pc_gamg_repartition true
> -check_pointer_intensity 0
>
>