[petsc-users] SuperLU MPI-problem

Mahir.Ulker-Kaustell at tyrens.se Mahir.Ulker-Kaustell at tyrens.se
Wed Jul 22 11:11:59 CDT 2015


Thank you for your reply.

As you have probably figured out already, I am not a computational scientist. I am a researcher in civil engineering (railways for high-speed traffic), trying to produce some, from my perspective, fairly large parametric studies based on finite element discretizations. 

I am working in a Windows environment and have installed PETSc through Cygwin.
Apparently, Valgrind is not supported on this OS.

If I have understood you correctly, the memory issues are related to SuperLU and, given my background, there is not much I can do. Is this correct?


Best regards,
Mahir

______________________________________________
Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr, Tyréns AB
010 452 30 82, Mahir.Ulker-Kaustell at tyrens.se
______________________________________________

-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov] 
Sent: den 22 juli 2015 02:57
To: Ülker-Kaustell, Mahir
Cc: Xiaoye S. Li; petsc-users
Subject: Re: [petsc-users] SuperLU MPI-problem


   Run the program under valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind . When I use the option -mat_superlu_dist_parsymbfact I get many scary memory problems, some involving, for example, ddist_psymbtonum (pdsymbfact_distdata.c:1332).

   Note that I consider it unacceptable for running programs to EVER use uninitialized values; until these are all cleaned up I won't trust any runs like this. 

  Barry




==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10274C436: MPI_Allgatherv (allgatherv.c:1053)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102851C61: MPIR_Allgatherv_intra (allgatherv.c:651)
==42050==    by 0x102853EC7: MPIR_Allgatherv (allgatherv.c:903)
==42050==    by 0x102853F84: MPIR_Allgatherv_impl (allgatherv.c:944)
==42050==    by 0x10274CA41: MPI_Allgatherv (allgatherv.c:1107)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050== 
==42049== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42049==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42049==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42049==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42049==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42049==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==  Address 0x105edff70 is 1,424 bytes inside a block of size 752,720 alloc'd
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42049==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42049==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42048==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42048==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42048==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42048==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42049==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==  Address 0x10597a860 is 1,408 bytes inside a block of size 752,720 alloc'd
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049== 
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42048==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42048==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048== 
==42048== Syscall param write(buf) points to uninitialised byte(s)
==42048==    at 0x102DA1C22: write (in /usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10295F5BD: MPIDU_Sock_write (sock_immed.i:525)
==42048==    by 0x102944839: MPIDI_CH3_iStartMsg (ch3_istartmsg.c:86)
==42048==    by 0x102933B80: MPIDI_CH3_EagerContigShortSend (ch3u_eager.c:257)
==42048==    by 0x10293ADBA: MPID_Send (mpid_send.c:130)
==42048==    by 0x10277A1FA: MPI_Send (send.c:127)
==42048==    by 0x10155802F: get_perm_c_parmetis (get_perm_c_parmetis.c:299)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Address 0x104810704 is on thread 1's stack
==42048==  in frame #3, created by MPIDI_CH3_EagerContigShortSend (ch3u_eager.c:218)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x101557AB9: get_perm_c_parmetis (get_perm_c_parmetis.c:185)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744CB8: MPI_Alltoallv (alltoallv.c:480)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744E43: MPI_Alltoallv (alltoallv.c:490)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744EBF: MPI_Alltoallv (alltoallv.c:497)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x1027450B1: MPI_Alltoallv (alltoallv.c:512)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10283FB06: MPIR_Alltoallv_intra (alltoallv.c:92)
==42050==    by 0x1028407B6: MPIR_Alltoallv (alltoallv.c:343)
==42050==    by 0x102840884: MPIR_Alltoallv_impl (alltoallv.c:380)
==42050==    by 0x10274541B: MPI_Alltoallv (alltoallv.c:531)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050== 
==42050== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42050==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42050==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42050==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42050==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42050==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42050==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42050==    by 0x10277656E: MPI_Isend (isend.c:125)
==42050==    by 0x101524C41: pdgstrf2_trsm (pdgstrf2.c:201)
==42050==    by 0x10151ECBF: pdgstrf (pdgstrf.c:1082)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Address 0x1060144d0 is 1,168 bytes inside a block of size 131,072 alloc'd
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050== 
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048== 
==42049== Conditional jump or move depends on uninitialised value(s)
==42049==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049== 
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049== Conditional jump or move depends on uninitialised value(s)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42049==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048== 
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049== 
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10151FDE6: pdgstrf (pdgstrf.c:1382)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x10150B241: ddist_psymbtonum (pdsymbfact_distdata.c:1389)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050== 
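
The traces above come from three MPI processes (42048, 42049 and 42050) writing to the same terminal, so their lines are interleaved in places. A minimal sketch, assuming only that every line carries the `==PID==` prefix and that each report ends with an empty `==PID==` line as in the output above, of how the log could be regrouped per process before reading it. The function names and the short `sample` list are illustrative, not part of PETSc or Valgrind.

```python
import re
from collections import defaultdict

# Every Valgrind line starts with ==PID==; capture the PID and the message.
LINE_RE = re.compile(r"^==(\d+)==\s?(.*)$")

def group_by_pid(lines):
    """Return {pid: [messages]} keeping the order each process wrote them in."""
    per_pid = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            per_pid[int(m.group(1))].append(m.group(2))
    return dict(per_pid)

def split_reports(messages):
    """Split one process's messages into reports at the empty separator lines."""
    reports, current = [], []
    for msg in messages:
        if msg.strip():
            current.append(msg.strip())
        elif current:
            reports.append(current)
            current = []
    if current:
        reports.append(current)
    return reports

# A few interleaved lines in the same shape as the log above.
sample = [
    "==42049==    by 0x100FF9036: PCSetUp (precon.c:982)",
    "==42048== Syscall param writev(vector[...]) points to uninitialised byte(s)",
    "==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)",
    "==42049== ",
    "==42048==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)",
    "==42048== ",
]

grouped = group_by_pid(sample)
for pid, msgs in sorted(grouped.items()):
    for report in split_reports(msgs):
        # Print the first line of each report as a one-line summary.
        print(pid, report[0])
```

Applied to the full log, the same two functions would give one contiguous list of reports per process, which makes it easier to see which allocation each "Uninitialised value was created by" line belongs to.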


> On Jul 20, 2015, at 12:03 PM, Mahir.Ulker-Kaustell at tyrens.se wrote:
> 
> Ok. So I have been creating the full factorization on each process. That gives me some hope!
>  
> I followed your suggestion and tried to use the runtime option ‘-mat_superlu_dist_parsymbfact’.
> However, now the program crashes with:
>  
> Invalid ISPEC at line 484 in file get_perm_c.c
>  
> And so on…
>  
> According to the SuperLU manual, I should set the option to either YES or NO; however, -mat_superlu_dist_parsymbfact YES makes the program crash in the same way as above.
> Also, I can’t find any reference to -mat_superlu_dist_parsymbfact in the PETSc documentation.
>  
> Mahir
>  
> Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr, Tyréns AB
> 010 452 30 82, Mahir.Ulker-Kaustell at tyrens.se
>  
> From: Xiaoye S. Li [mailto:xsli at lbl.gov] 
> Sent: den 20 juli 2015 18:12
> To: Ülker-Kaustell, Mahir
> Cc: Hong; petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>  
> The default SuperLU_DIST setting is to use serial symbolic factorization. Therefore, what matters is: how much memory do you have per MPI task?
> 
> The code failed to malloc memory during redistribution of matrix A to the {L\U} data structure (using the result of the serial symbolic factorization).
>  
> You can use parallel symbolic factorization with the runtime option '-mat_superlu_dist_parsymbfact'.
> 
> Sherry Li
> 
>  
> On Mon, Jul 20, 2015 at 8:59 AM, Mahir.Ulker-Kaustell at tyrens.se <Mahir.Ulker-Kaustell at tyrens.se> wrote:
> Hong:
>  
> Previous experience with this equation has shown that it is very difficult to solve iteratively. Hence the use of a direct solver.
>  
> The large test problem I am trying to solve has slightly fewer than 10^6 degrees of freedom. The matrices are derived from finite elements, so they are sparse.
> The machine I am working on has 128 GB of RAM. I have estimated the memory needed at less than 20 GB, so if the solver needs twice or even three times as much, it should still work well. Or have I completely misunderstood something here?
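
A back-of-the-envelope sketch of the estimate discussed above, to show how strongly the result depends on the fill of the LU factors. Only the problem size (about 10^6 degrees of freedom) and the complex arithmetic come from the thread. The number of nonzeros per row, the 32-bit index size and the fill ratios are assumptions chosen for illustration, and the real fill depends on the mesh and on the ordering used.

```python
def sparse_matrix_bytes(n_rows, nnz_per_row, bytes_per_value=16, bytes_per_index=4):
    """Approximate CSR storage: values + column indices + row pointers.

    bytes_per_value=16 corresponds to a double-precision complex number.
    """
    nnz = n_rows * nnz_per_row
    return nnz * (bytes_per_value + bytes_per_index) + (n_rows + 1) * bytes_per_index

def factor_bytes(matrix_bytes, fill_ratio):
    """Storage for L and U if they hold fill_ratio times as many entries as A."""
    return matrix_bytes * fill_ratio

GIB = 1024 ** 3
n = 10**6          # degrees of freedom, from the message above
nnz_per_row = 80   # ASSUMPTION: illustrative value, not taken from the thread
a_bytes = sparse_matrix_bytes(n, nnz_per_row)

print(f"matrix A: {a_bytes / GIB:.2f} GiB")
for fill_ratio in (3, 20, 50):  # ASSUMPTION: illustrative fill ratios only
    print(f"fill x{fill_ratio}: {factor_bytes(a_bytes, fill_ratio) / GIB:.1f} GiB")
```

The point of the sketch is only that "two or three times the matrix" is one possible fill ratio among many. Whether the factors fit also depends on how much memory each MPI task has, since the default symbolic factorization is serial.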
>  
> Mahir
>  
>  
>  
> From: Hong [mailto:hzhang at mcs.anl.gov] 
> Sent: den 20 juli 2015 17:39
> To: Ülker-Kaustell, Mahir
> Cc: petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>  
> Mahir:
> Direct solvers consume a large amount of memory. I suggest trying the following:
>  
> 1. A sparse iterative solver, if [-omega^2M + K] is not too ill-conditioned. You can test this using the small matrix.
>  
> 2. Incrementally increase your matrix sizes. Try different matrix orderings.
> Do you get a memory crash in the first symbolic factorization?
> In your case, the matrix data structure stays the same when omega changes, so you only need to do one symbolic factorization and reuse it.
>  
> 3. Use a machine with more memory.
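
Point 2 above rests on the observation that the nonzero structure of [-omega^2M + K] does not depend on omega. A minimal sketch of that observation, using plain dictionaries keyed by (row, column) as a stand-in for sparse matrices. The tiny M and K below are hypothetical, and the helper names are not PETSc or SuperLU functions.

```python
def pattern(m, k):
    """Structural nonzero positions of -omega^2*M + K: the union of both patterns."""
    return set(m) | set(k)

def assemble(m, k, omega):
    """Values of -omega^2*M + K stored on the fixed pattern."""
    return {ij: -omega**2 * m.get(ij, 0.0) + k.get(ij, 0.0)
            for ij in sorted(pattern(m, k))}

# Hypothetical 2x2 example: diagonal mass matrix, full stiffness matrix.
M = {(0, 0): 2.0, (1, 1): 1.0}
K = {(0, 0): 3.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}

for omega in (0.5, 1.0, 7.0):
    A = assemble(M, K, omega)
    # The set of stored positions is identical for every frequency;
    # only the numerical values change.
    print(omega, sorted(A) == sorted(pattern(M, K)))
```

Because only the values change from one frequency to the next, the work that depends on the structure alone (ordering and symbolic factorization) is a candidate for being done once, with the numerical factorization repeated per frequency.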
>  
> Hong
>  
> Dear Petsc-Users,
>  
> I am trying to use PETSc to solve a set of linear equations arising from Navier's equations (elastodynamics) in the frequency domain.
> The frequency dependency of the problem requires that the system
>  
>                              [-omega^2M + K]u = F
>  
> where M and K are constant, square, positive definite matrices (mass and stiffness respectively) is solved for each frequency omega of interest.
> K is a complex matrix, including material damping.
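
To make the structure of the computation concrete, here is a small sketch of the frequency sweep for a hypothetical two-degree-of-freedom system. The matrices, the loss factor `eta` and the list of frequencies are invented for illustration. The real problem would use a sparse direct solver instead of the 2x2 Cramer's rule shown here.

```python
def dynamic_stiffness(m, k, omega):
    """Assemble -omega^2 * M + K entry by entry for 2x2 matrices."""
    return [[-omega**2 * m[i][j] + k[i][j] for j in range(2)] for i in range(2)]

def solve_2x2(a, b):
    """Solve the 2x2 complex system a u = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    u0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    u1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [u0, u1]

# Hypothetical data; none of these values come from the thread.
M = [[2.0, 0.0], [0.0, 1.0]]
eta = 0.05  # ASSUMED loss factor, makes K complex (material damping)
K = [[3.0 * (1 + 1j * eta), -1.0 * (1 + 1j * eta)],
     [-1.0 * (1 + 1j * eta), 1.0 * (1 + 1j * eta)]]
F = [1.0, 0.0]

for omega in (0.5, 1.0, 1.5):
    A = dynamic_stiffness(M, K, omega)
    u = solve_2x2(A, F)
    # Residual A u - F, which should vanish up to round-off.
    r = [sum(A[i][j] * u[j] for j in range(2)) - F[i] for i in range(2)]
    print(omega, [abs(x) for x in u], max(abs(x) for x in r))
```

Each pass through the loop is one linear solve with a new matrix but the same right-hand side layout, which is the pattern the full-scale program repeats for every frequency of interest.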
>  
> I have written a PETSc program which solves this problem for a small (1000 degrees of freedom) test problem on one or several processors, but it keeps crashing when I try it on my full-scale (on the order of 10^6 degrees of freedom) problem.
>  
> The program crashes at KSPSetUp() and from what I can see in the error messages, it appears as if it consumes too much memory.
>  
> I would guess that similar problems have come up on this mailing list, so I am hoping that someone can push me in the right direction…
>  
> Mahir



