[petsc-users] MUMPS solver crash with petsc-3.2

Barry Smith bsmith at mcs.anl.gov
Fri Dec 2 09:53:45 CST 2011


   This is a problem in Metis/Parmetis. PETSc 3.2 MUST be used with parmetis 3.2.0.

    Are you perhaps using metis-5.0.2 and parmetis-4.0.2? Did you use --download-parmetis or install it yourself? You should use --download-parmetis to get the right one.

   Barry
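
A minimal sketch of a configure line that follows the advice above, letting PETSc's configure fetch the ParMETIS version it expects instead of linking a self-installed one. Only --download-parmetis comes from this message; the other --download flags are assumptions about the rest of the build (MUMPS and the ScaLAPACK/BLACS libraries it depends on), so adjust them to your own setup.

```shell
# Run from the top of the petsc-3.2 source tree.
# --download-parmetis is the option named in the reply above.
# The remaining flags are assumed here for a MUMPS-enabled build.
./configure --download-parmetis \
            --download-mumps \
            --download-scalapack \
            --download-blacs
```

If an earlier build linked a separately installed Metis/Parmetis, reconfiguring into a fresh PETSC_ARCH avoids mixing the two sets of libraries.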


On Dec 2, 2011, at 3:47 AM, Gong Ding wrote:

> petsc-3.1-p8 & mumps 4.9 work well.
> However, petsc-3.2 p3-p5 & mumps 4.10 seem to have a memory problem.
> 
> The code occasionally crashes on Linux/X64 and always crashes on AIX/PPC.
> This problem may be caused by mumps 4.10 or parmetis, but I haven't tested that.
> 
> The valgrind report:
> 
> ==5139== Invalid read of size 8
> ==5139==    at 0x57BE749: __intel_new_memcpy (in /opt/intel/Compiler/11.1/038/lib/intel64/libirc.so)
> ==5139==    by 0x57A0AF5: _intel_fast_memcpy.J (in /opt/intel/Compiler/11.1/038/lib/intel64/libirc.so)
> ==5139==    by 0x2269296: __GrowBisection (initpart.c:200)
> ==5139==    by 0x2269C5F: __Init2WayPartition (initpart.c:36)
> ==5139==    by 0x223DDFA: __MlevelNodeBisection (ometis.c:485)
> ==5139==    by 0x223DB9B: __MlevelNodeBisectionMultiple (ometis.c:405)
> ==5139==    by 0x223D957: __MlevelNestedDissection (ometis.c:289)
> ==5139==    by 0x223DA09: __MlevelNestedDissection (ometis.c:309)
> ==5139==    by 0x223E837: METIS_NodeND (ometis.c:157)
> ==5139==    by 0x2245E31: metis_nodend_ (frename.c:122)
> ==5139==    by 0x2190666: dmumps_195_ (dmumps_part2.F:1435)
> ==5139==    by 0x20C79D2: dmumps_26_ (dmumps_part5.F:313)
> ==5139==  Address 0x8718408 is 440 bytes inside a block of size 444 alloc'd
> ==5139==    at 0x4A0776F: malloc (vg_replace_malloc.c:263)
> ==5139==    by 0x226E6A2: __GKmalloc (util.c:111)
> ==5139==    by 0x226E6E4: __idxmalloc (util.c:60)
> ==5139==    by 0x2269014: __GrowBisection (initpart.c:101)
> ==5139==    by 0x2269C5F: __Init2WayPartition (initpart.c:36)
> ==5139==    by 0x223DDFA: __MlevelNodeBisection (ometis.c:485)
> ==5139==    by 0x223DB9B: __MlevelNodeBisectionMultiple (ometis.c:405)
> ==5139==    by 0x223D957: __MlevelNestedDissection (ometis.c:289)
> ==5139==    by 0x223DA09: __MlevelNestedDissection (ometis.c:309)
> ==5139==    by 0x223E837: METIS_NodeND (ometis.c:157)
> ==5139==    by 0x2245E31: metis_nodend_ (frename.c:122)
> ==5139==    by 0x2190666: dmumps_195_ (dmumps_part2.F:1435)
> ==5139==
> ==5139== Invalid read of size 8
> ==5139==    at 0x57BE609: __intel_new_memcpy (in /opt/intel/Compiler/11.1/038/lib/intel64/libirc.so)
> ==5139==    by 0x73F8CCCCC: ???
> ==5139==    by 0x300000001F: ???
> ==5139==    by 0x7FEFF8B4F: ???
> ==5139==    by 0x7FEFF8C37: ???
> ==5139==    by 0xE326EEF: ???
> ==5139==    by 0x216F: ???
> ==5139==    by 0x223DA09: __MlevelNestedDissection (ometis.c:309)
> ==5139==    by 0x223E837: METIS_NodeND (ometis.c:157)
> ==5139==    by 0x2245E31: metis_nodend_ (frename.c:122)
> ==5139==    by 0x2190666: dmumps_195_ (dmumps_part2.F:1435)
> ==5139==    by 0x20C79D2: dmumps_26_ (dmumps_part5.F:313)
> ==5139==  Address 0x9288b38 is 0 bytes after a block of size 15,224 alloc'd
> ==5139==    at 0x4A0776F: malloc (vg_replace_malloc.c:263)
> ==5139==    by 0x226E6A2: __GKmalloc (util.c:111)
> ==5139==    by 0x226E6E4: __idxmalloc (util.c:60)
> ==5139==    by 0x223DB60: __MlevelNodeBisectionMultiple (ometis.c:402)
> ==5139==    by 0x223D957: __MlevelNestedDissection (ometis.c:289)
> ==5139==    by 0x223DA09: __MlevelNestedDissection (ometis.c:309)
> ==5139==    by 0x223E837: METIS_NodeND (ometis.c:157)
> ==5139==    by 0x2245E31: metis_nodend_ (frename.c:122)
> ==5139==    by 0x2190666: dmumps_195_ (dmumps_part2.F:1435)
> ==5139==    by 0x20C79D2: dmumps_26_ (dmumps_part5.F:313)
> ==5139==    by 0x216A8DC: dmumps_ (dmumps_part1.F:409)
> ==5139==    by 0x209407F: dmumps_f77_ (dmumps_part3.F:6651)
> ==5139==
> ==5139== Invalid read of size 8
> ==5139==    at 0x57BE609: __intel_new_memcpy (in /opt/intel/Compiler/11.1/038/lib/intel64/libirc.so)
> ==5139==    by 0x57A0AF5: _intel_fast_memcpy.J (in /opt/intel/Compiler/11.1/038/lib/intel64/libirc.so)
> ==5139==    by 0x21913DD: dmumps_557_ (dmumps_part2.F:2147)
> ==5139==    by 0x218E243: dmumps_195_ (dmumps_part2.F:1535)
> ==5139==    by 0x20C79D2: dmumps_26_ (dmumps_part5.F:313)
> ==5139==    by 0x216A8DC: dmumps_ (dmumps_part1.F:409)
> ==5139==    by 0x209407F: dmumps_f77_ (dmumps_part3.F:6651)
> ==5139==    by 0x2073047: dmumps_c (mumps_c.c:422)
> ==5139==    by 0x19755BF: MatLUFactorSymbolic_AIJMUMPS (mumps.c:893)
> ==5139==    by 0x1831D89: MatLUFactorSymbolic (matrix.c:2823)
> ==5139==    by 0x1C98F64: PCSetUp_LU (lu.c:135)
> ==5139==    by 0x1F5B26B: PCSetUp (precon.c:819)
> ==5139==  Address 0xf24f6e8 is 3,163,816 bytes inside a block of size 3,163,820 alloc'd
> ==5139==    at 0x4A0776F: malloc (vg_replace_malloc.c:263)
> ==5139==    by 0x5CB3C43: for_allocate (in /opt/intel/Compiler/11.1/038/lib/intel64/libifcore.so.5)
> ==5139==    by 0x5CB3B50: for_alloc_allocatable (in /opt/intel/Compiler/11.1/038/lib/intel64/libifcore.so.5)
> ==5139==    by 0x218CD64: dmumps_195_ (dmumps_part2.F:1072)
> ==5139==    by 0x20C79D2: dmumps_26_ (dmumps_part5.F:313)
> ==5139==    by 0x216A8DC: dmumps_ (dmumps_part1.F:409)
> ==5139==    by 0x209407F: dmumps_f77_ (dmumps_part3.F:6651)
> ==5139==    by 0x2073047: dmumps_c (mumps_c.c:422)
> ==5139==    by 0x19755BF: MatLUFactorSymbolic_AIJMUMPS (mumps.c:893)
> ==5139==    by 0x1831D89: MatLUFactorSymbolic (matrix.c:2823)
> ==5139==    by 0x1C98F64: PCSetUp_LU (lu.c:135)
> ==5139==    by 0x1F5B26B: PCSetUp (precon.c:819)
> 


