[petsc-users] Fwd: SuperLU MPI-problem

Hong hzhang at mcs.anl.gov
Wed Jul 29 19:58:16 CDT 2015


Mahir,

Sherry fixed several bugs in superlu_dist-v4.1.
The current petsc-release interfaces with superlu_dist-v4.0.
We do not know whether the reported issue (attached below) has been
resolved or not. If not, can you test it with the latest superlu_dist-v4.1?

Here is how to do it:
1. Download superlu_dist v4.1.
2. Remove the existing PETSC_ARCH directory, then configure PETSc with
'--download-superlu_dist=superlu_dist_4.1.tar.gz'
3. Build PETSc.

Let us know if the issue remains.

Hong


---------- Forwarded message ----------
From: Xiaoye S. Li <xsli at lbl.gov>
Date: Wed, Jul 29, 2015 at 2:24 PM
Subject: Fwd: [petsc-users] SuperLU MPI-problem
To: Hong Zhang <hzhang at mcs.anl.gov>


Hong,
I am cleaning the mailbox, and saw this unresolved issue.  I am not sure
whether the new fix to parallel symbolic factorization solves the problem.
What bothers me is that he is getting the following error:

Invalid ISPEC at line 484 in file get_perm_c.c

This has nothing to do with my bug fix.
Shall we ask him to try the new version, or try to get his matrix?

Sherry

---------- Forwarded message ----------
From: Mahir.Ulker-Kaustell at tyrens.se <Mahir.Ulker-Kaustell at tyrens.se>
Date: Wed, Jul 22, 2015 at 1:32 PM
Subject: RE: [petsc-users] SuperLU MPI-problem
To: Hong <hzhang at mcs.anl.gov>, "Xiaoye S. Li" <xsli at lbl.gov>
Cc: petsc-users <petsc-users at mcs.anl.gov>


 The 1000 was just a conservative guess. The number of non-zeros per row is
in the tens in general but certain constraints lead to non-diagonal streaks
in the sparsity-pattern.

Is it the reordering of the matrix that is killing me here? How can I set
options.ColPerm?



If I use -mat_superlu_dist_parsymbfact the program crashes with



Invalid ISPEC at line 484 in file get_perm_c.c

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code.. Per user-direction, the job has been aborted.

-------------------------------------------------------

[0]PETSC ERROR:
------------------------------------------------------------------------

[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end

[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger

[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors

[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
run

[0]PETSC ERROR: to get more information on the crash.

[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------

[0]PETSC ERROR: Signal received

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015

[0]PETSC ERROR: ./solve on a cygwin-complex-nodebug named CZC5202SM2 by muk
Wed Jul 22 21:59:23 2015

[0]PETSC ERROR: Configure options PETSC_DIR=/packages/petsc-3.6.0
PETSC_ARCH=cygwin-complex-nodebug --with-cc=gcc --with-cxx=g++
--with-fc=gfortran --with-debugging=0 --with-fortran-kernels=1
--with-scalar-type=complex --download-fblaspack --download-mpich
--download-scalapack --download-mumps --download-metis --download-parmetis
--download-superlu --download-superlu_dist --download-fftw

[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

[unset]: aborting job:

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

[0]PETSC ERROR:
------------------------------------------------------------------------



If I use -mat_superlu_dist_parsymbfact=1 the program crashes (somewhat
later) with



Malloc fails for Lnzval_bc_ptr[*][] at line 626 in file zdistribute.c

col block 3006 -------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code.. Per user-direction, the job has been aborted.

-------------------------------------------------------

col block 1924 [0]PETSC ERROR:
------------------------------------------------------------------------

[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end

[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger

[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors

[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
run

[0]PETSC ERROR: to get more information on the crash.

[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------

[0]PETSC ERROR: Signal received

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015

[0]PETSC ERROR: ./solve on a cygwin-complex-nodebug named CZC5202SM2 by muk
Wed Jul 22 21:59:58 2015

[0]PETSC ERROR: Configure options PETSC_DIR=/packages/petsc-3.6.0
PETSC_ARCH=cygwin-complex-nodebug --with-cc=gcc --with-cxx=g++
--with-fc=gfortran --with-debugging=0 --with-fortran-kernels=1
--with-scalar-type=complex --download-fblaspack --download-mpich
--download-scalapack --download-mumps --download-metis --download-parmetis
--download-superlu --download-superlu_dist --download-fftw

[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

[unset]: aborting job:

application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

[0]PETSC ERROR:
------------------------------------------------------------------------





/Mahir





*From:* Hong [mailto:hzhang at mcs.anl.gov]
*Sent:* 22 July 2015 21:34
*To:* Xiaoye S. Li
*Cc:* Ülker-Kaustell, Mahir; petsc-users


*Subject:* Re: [petsc-users] SuperLU MPI-problem



In the PETSc/superlu_dist interface, we set the default



options.ParSymbFact = NO;



When the user raises the flag "-mat_superlu_dist_parsymbfact",

we set



    options.ParSymbFact = YES;

    options.ColPerm     = PARMETIS;   /* in v2.2, PARMETIS is forced for
                                         ParSymbFact regardless of user ordering setting */



We do not change anything else.
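
The rule described above can be written out as a toy model. This is only an illustrative Python sketch, not the PETSc interface code: the class `SuperLUDistOptions`, the function `apply_parsymbfact_flag`, and the placeholder value `"USER_CHOICE"` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class SuperLUDistOptions:
    """Toy stand-in for the two option fields mentioned in this message."""
    par_symb_fact: str = "NO"      # default: sequential symbolic factorization
    col_perm: str = "USER_CHOICE"  # hypothetical placeholder for the user's ordering


def apply_parsymbfact_flag(opts: SuperLUDistOptions, flag_raised: bool) -> SuperLUDistOptions:
    """Apply the rule: raising the flag enables ParSymbFact and forces PARMETIS."""
    if flag_raised:
        opts.par_symb_fact = "YES"
        opts.col_perm = "PARMETIS"  # overrides whatever ordering the user asked for
    return opts


# A user who asked for some other ordering still ends up with PARMETIS
# once the parallel symbolic factorization flag is raised.
print(apply_parsymbfact_flag(SuperLUDistOptions(col_perm="NATURAL"), flag_raised=True))
print(apply_parsymbfact_flag(SuperLUDistOptions(col_perm="NATURAL"), flag_raised=False))
```

The point of the model is that the column ordering is not independent of the flag: with parallel symbolic factorization on, any ordering selected elsewhere is replaced.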



Hong



On Wed, Jul 22, 2015 at 2:19 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:

I am trying to understand your problem. You said you are solving Navier's
equation (elastodynamics) in the frequency domain, using finite element
discretization.  I wonder why you have about 1000 nonzeros per row.
Usually in many PDE discretized matrices, the number of nonzeros per row is
in the tens (even for 3D problems), not in the thousands.   So, your matrix
is quite a bit denser than many sparse matrices we deal with.



The number of nonzeros in the L and U factors is much larger than in the
original matrix A -- typically we see a 10-20x fill ratio for 2D, and it can
be as bad as a 50-100x fill ratio for 3D.  But since your matrix starts much
denser (i.e., the underlying graph has many connections), it may not lend
itself to any good ordering strategy that preserves the sparsity of L and U;
that is, the L and U fill ratio may be large.
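
To make the fill-ratio argument concrete, here is a rough back-of-the-envelope estimate in Python. It is only a sketch under stated assumptions: the matrix size `n_rows` is a made-up value (the actual size is not given in this thread), 16 bytes per entry assumes double-precision complex scalars, and index arrays and workspace are ignored. The function name `factor_memory_gib` is hypothetical.

```python
def factor_memory_gib(n_rows, nnz_per_row, fill_ratio, bytes_per_entry=16):
    """Rough memory (GiB) for the numerical values of the L and U factors.

    nnz(A) is taken as n_rows * nnz_per_row, and nnz(L+U) as fill_ratio * nnz(A).
    bytes_per_entry=16 assumes double-precision complex entries.
    """
    nnz_a = n_rows * nnz_per_row
    nnz_lu = nnz_a * fill_ratio
    return nnz_lu * bytes_per_entry / 2**30


n_rows = 1_000_000  # hypothetical problem size, for illustration only
for nnz_per_row in (30, 1000):   # "tens" versus "about 1000" nonzeros per row
    for fill_ratio in (50, 100): # the 3D range quoted above
        gib = factor_memory_gib(n_rows, nnz_per_row, fill_ratio)
        print(f"nnz/row={nnz_per_row:5d}  fill={fill_ratio:3d}x  ~{gib:,.1f} GiB")
```

For a fixed fill ratio, the estimate scales linearly with the number of nonzeros per row, which is why a row density in the thousands rather than the tens matters so much for a direct solver.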



I don't understand why you get the following error when you use

‘-mat_superlu_dist_parsymbfact’.



Invalid ISPEC at line 484 in file get_perm_c.c



Perhaps Hong Zhang knows; she built the SuperLU_DIST interface for PETSc.



Hong -- in order to use parallel symbolic factorization, is it sufficient
to specify only

‘-mat_superlu_dist_parsymbfact’

?  (The default is to use sequential symbolic factorization.)





Sherry



On Wed, Jul 22, 2015 at 9:11 AM, Mahir.Ulker-Kaustell at tyrens.se <
Mahir.Ulker-Kaustell at tyrens.se> wrote:

Thank you for your reply.

As you have probably figured out already, I am not a computational
scientist. I am a researcher in civil engineering (railways for high-speed
traffic), trying to produce some, from my perspective, fairly large
parametric studies based on finite element discretizations.

I am working in a Windows environment and have installed PETSc through
Cygwin.
Apparently, there is no support for Valgrind in this OS.

If I have understood you correctly, the memory issues are related to SuperLU
and, given my background, there is not much I can do. Is this correct?


Best regards,
Mahir

______________________________________________
Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr, Tyréns
AB
010 452 30 82, Mahir.Ulker-Kaustell at tyrens.se
______________________________________________


-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov]
Sent: 22 July 2015 02:57
To: Ülker-Kaustell, Mahir
Cc: Xiaoye S. Li; petsc-users
Subject: Re: [petsc-users] SuperLU MPI-problem


   Run the program under valgrind
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind . When I use
the option -mat_superlu_dist_parsymbfact I get many scary memory problems,
some involving, for example, ddist_psymbtonum (pdsymbfact_distdata.c:1332).

   Note that I consider it unacceptable for running programs to EVER use
uninitialized values; until these are all cleaned up I won't trust any runs
like this.
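
Since output from several MPI processes gets interleaved in a valgrind log, a small script can help tally the reports per process. The following is a minimal Python sketch, not part of PETSc or valgrind; the function name `summarize_valgrind` is hypothetical, and it assumes the usual layout in which an error headline has exactly one space after the `==PID==` prefix while stack frames and notes are indented further.

```python
import re
from collections import Counter

# An error headline looks like "==42050== Conditional jump or move ...":
# one space after the PID marker, then text. Frames ("   at", "   by") and
# notes ("  Address", "  Uninitialised value was created ...") are indented more.
HEADLINE = re.compile(r"^==(\d+)== (\S.*)$")


def summarize_valgrind(log_text):
    """Count error headlines per (pid, message) in saved valgrind output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADLINE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = """\
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10274C436: MPI_Allgatherv (allgatherv.c:1053)
==42050==  Uninitialised value was created by a stack allocation
==42050==
==42048== Syscall param write(buf) points to uninitialised byte(s)
==42050== Conditional jump or move depends on uninitialised value(s)
"""

for (pid, message), n in sorted(summarize_valgrind(sample).items()):
    print(f"pid {pid}: {n} x {message}")
```

The sample string reuses a few lines of the output shown in this message; in practice the function would be fed the full saved log.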

  Barry




==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10274C436: MPI_Allgatherv (allgatherv.c:1053)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102851C61: MPIR_Allgatherv_intra (allgatherv.c:651)
==42050==    by 0x102853EC7: MPIR_Allgatherv (allgatherv.c:903)
==42050==    by 0x102853F84: MPIR_Allgatherv_impl (allgatherv.c:944)
==42050==    by 0x10274CA41: MPI_Allgatherv (allgatherv.c:1107)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050==
==42049== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    at 0x102DA1C3A: writev (in
/usr/lib/system/libsystem_kernel.dylib)
==42049==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42049==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42049==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42049==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42049==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==  Address 0x105edff70 is 1,424 bytes inside a block of size
752,720 alloc'd
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42049==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    at 0x102DA1C3A: writev (in
/usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42049==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42048==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42048==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42048==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42048==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42049==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==  Address 0x10597a860 is 1,408 bytes inside a block of size
752,720 alloc'd
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42048==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42048==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42048== Syscall param write(buf) points to uninitialised byte(s)
==42048==    at 0x102DA1C22: write (in
/usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10295F5BD: MPIDU_Sock_write (sock_immed.i:525)
==42048==    by 0x102944839: MPIDI_CH3_iStartMsg (ch3_istartmsg.c:86)
==42048==    by 0x102933B80: MPIDI_CH3_EagerContigShortSend
(ch3u_eager.c:257)
==42048==    by 0x10293ADBA: MPID_Send (mpid_send.c:130)
==42048==    by 0x10277A1FA: MPI_Send (send.c:127)
==42048==    by 0x10155802F: get_perm_c_parmetis (get_perm_c_parmetis.c:299)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Address 0x104810704 is on thread 1's stack
==42048==  in frame #3, created by MPIDI_CH3_EagerContigShortSend
(ch3u_eager.c:218)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x101557AB9: get_perm_c_parmetis (get_perm_c_parmetis.c:185)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744CB8: MPI_Alltoallv (alltoallv.c:480)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744E43: MPI_Alltoallv (alltoallv.c:490)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744EBF: MPI_Alltoallv (alltoallv.c:497)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x1027450B1: MPI_Alltoallv (alltoallv.c:512)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10283FB06: MPIR_Alltoallv_intra (alltoallv.c:92)
==42050==    by 0x1028407B6: MPIR_Alltoallv (alltoallv.c:343)
==42050==    by 0x102840884: MPIR_Alltoallv_impl (alltoallv.c:380)
==42050==    by 0x10274541B: MPI_Alltoallv (alltoallv.c:531)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42050==    at 0x102DA1C3A: writev (in
/usr/lib/system/libsystem_kernel.dylib)
==42050==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42050==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42050==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42050==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42050==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42050==    by 0x10277656E: MPI_Isend (isend.c:125)
==42050==    by 0x101524C41: pdgstrf2_trsm (pdgstrf2.c:201)
==42050==    by 0x10151ECBF: pdgstrf (pdgstrf.c:1082)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Address 0x1060144d0 is 1,168 bytes inside a block of size
131,072 alloc'd
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42049== Conditional jump or move depends on uninitialised value(s)
==42049==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST
(superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42049== Conditional jump or move depends on uninitialised value(s)
==42049==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10151FDE6: pdgstrf (pdgstrf.c:1382)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x10150B241: ddist_psymbtonum (pdsymbfact_distdata.c:1389)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==


> On Jul 20, 2015, at 12:03 PM, Mahir.Ulker-Kaustell at tyrens.se wrote:
>
> Ok. So I have been creating the full factorization on each process.
> That gives me some hope!
>
> I followed your suggestion and tried to use the runtime option
> '-mat_superlu_dist_parsymbfact'.
> However, now the program crashes with:
>
> Invalid ISPEC at line 484 in file get_perm_c.c
>
> And so on…
>
> According to the SuperLU manual, I should give the option either YES or
> NO; however, -mat_superlu_dist_parsymbfact YES makes the program crash
> in the same way as above.
> Also, I can't find any reference to -mat_superlu_dist_parsymbfact in the
> PETSc documentation.
>
> Mahir
>
> Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr,
> Tyréns AB
> 010 452 30 82, Mahir.Ulker-Kaustell at tyrens.se
>
> From: Xiaoye S. Li [mailto:xsli at lbl.gov]
> Sent: 20 July 2015 18:12
> To: Ülker-Kaustell, Mahir
> Cc: Hong; petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>
> The default SuperLU_DIST setting is to use serial symbolic
> factorization. Therefore, what matters is how much memory you have per
> MPI task.
>
> The code failed to malloc memory during redistribution of matrix A to
> the {L\U} data structure (using the result of the serial symbolic
> factorization).
>
> You can use parallel symbolic factorization with the runtime option:
> '-mat_superlu_dist_parsymbfact'
>
> Sherry Li
>
>
> On Mon, Jul 20, 2015 at 8:59 AM, Mahir.Ulker-Kaustell at tyrens.se
> <Mahir.Ulker-Kaustell at tyrens.se> wrote:
> Hong:
>
> Previous experience with this equation has shown that it is very
> difficult to solve iteratively. Hence the use of a direct solver.
>
> The large test problem I am trying to solve has slightly less than 10^6
> degrees of freedom. The matrices are derived from finite elements, so
> they are sparse.
> The machine I am working on has 128 GB of RAM. I have estimated the
> memory needed at less than 20 GB, so if the solver needs twice or even
> three times as much, it should still work well. Or have I completely
> misunderstood something here?
>
> Mahir
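
The estimate above can be put into a small back-of-the-envelope sketch. It is an illustration only: the 50 nonzeros per row and the fill ratio of 40 are assumed values (the thread gives neither), `csr_bytes` is a hypothetical helper, and the count uses plain compressed-row storage rather than SuperLU_DIST's actual supernodal layout. It assumes 16 bytes per complex double value and 4 bytes per index.

```python
def csr_bytes(n_rows, nnz, value_bytes=16, index_bytes=4):
    """Bytes for a sparse matrix in compressed-row form:
    one value and one column index per nonzero, plus n_rows + 1 row pointers."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


GB = 10**9

n = 10**6          # degrees of freedom, as stated in the message
nnz_per_row = 50   # ASSUMPTION: "in the tens" per row, exact value unknown
fill_ratio = 40    # ASSUMPTION: nnz(L+U) / nnz(A), depends on the ordering
ram_gb = 128       # machine memory, as stated in the message

matrix_gb = csr_bytes(n, n * nnz_per_row) / GB
factor_gb = csr_bytes(n, n * nnz_per_row * fill_ratio) / GB

print(f"matrix A : {matrix_gb:8.2f} GB")
print(f"factors  : {factor_gb:8.2f} GB (assumed fill ratio {fill_ratio})")

# Memory available to each MPI task if the tasks share one machine evenly.
for ntasks in (1, 4, 16):
    print(f"{ntasks:2d} tasks -> {ram_gb / ntasks:6.1f} GB per task")
```

Under these assumptions the matrix itself takes about 1 GB while the factors take about 40 GB, so the outcome depends almost entirely on the fill ratio, which is only known once the ordering and symbolic factorization have been computed.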
>
>
>
> From: Hong [mailto:hzhang at mcs.anl.gov]
> Sent: 20 July 2015 17:39
> To: Ülker-Kaustell, Mahir
> Cc: petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>
> Mahir:
> Direct solvers consume a large amount of memory. I suggest trying the
> following:
>
> 1. A sparse iterative solver, if [-omega^2M + K] is not too
> ill-conditioned. You may test it using the small matrix.
>
> 2. Incrementally increase your matrix sizes. Try different matrix
> orderings.
> Do you get the memory crash in the first symbolic factorization?
> In your case, the matrix data structure stays the same when omega
> changes, so you only need to do one symbolic factorization and reuse it.
>
> 3. Use a machine with more memory.
>
> Hong
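
Point 2 relies on the fact that the sparsity pattern of -omega^2 M + K is the union of the patterns of M and K and does not depend on omega. A minimal sketch of this, using small made-up matrices stored as dictionaries (the entries, the `shifted_matrix` helper, and the frequencies are hypothetical and not taken from the thread):

```python
def shifted_matrix(M, K, omega):
    """Assemble A(omega) = -omega^2 M + K for sparse matrices stored as
    {(row, col): value} dictionaries. Every position present in M or K is
    kept, even if the combined value happens to be zero."""
    A = {}
    for ij in set(M) | set(K):
        A[ij] = -omega**2 * M.get(ij, 0.0) + K.get(ij, 0.0)
    return A


# Hypothetical 3x3 mass matrix (real) and stiffness matrix (complex, with
# a small imaginary part standing in for material damping).
M = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
K = {
    (0, 0): 4 + 0.1j, (0, 1): -1 + 0j,
    (1, 0): -1 + 0j, (1, 1): 4 + 0.1j, (1, 2): -1 + 0j,
    (2, 1): -1 + 0j, (2, 2): 3 + 0.1j,
}

# The set of nonzero positions for each frequency of interest.
patterns = [frozenset(shifted_matrix(M, K, w)) for w in (1.0, 2.0, 5.0)]

# All frequencies share one pattern: the union of the M and K patterns.
print(len(set(patterns)) == 1)
print(sorted(patterns[0]))
```

Because the pattern is the same at every frequency, a direct solver can keep the ordering and symbolic factorization from the first solve and repeat only the numeric factorization.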
>
> Dear Petsc-Users,
>
> I am trying to use PETSc to solve a set of linear equations arising from
> Navier's equation (elastodynamics) in the frequency domain.
> The frequency dependency of the problem requires that the system
>
>                              [-omega^2M + K]u = F
>
> where M and K are constant, square, positive definite matrices (mass and
> stiffness respectively) is solved for each frequency omega of interest.
> K is a complex matrix, including material damping.
>
> I have written a PETSc program which solves this problem for a small
> (1000 degrees of freedom) test problem on one or several processors, but
> it keeps crashing when I try it on my full-scale problem (on the order
> of 10^6 degrees of freedom).
>
> The program crashes at KSPSetUp() and, from what I can see in the error
> messages, it appears to consume too much memory.
>
> I would guess that similar problems have come up on this mailing list,
> so I am hoping that someone can push me in the right direction…
>
> Mahir