[petsc-users] Can't expand MemType 1: jcol 16104
Satish Balay
balay at mcs.anl.gov
Tue Jul 28 11:11:58 CDT 2015
Sherry,
One minor issue with the tarball: I see the following new files in the v4.1 tarball
[when comparing it with v4.0]. Some of these are perhaps junk files that could
be removed from the tarball?
EXAMPLE/dscatter.c.bak
EXAMPLE/g10.cua
EXAMPLE/g4.cua
EXAMPLE/g4.postorder.eps
EXAMPLE/g4.rua
EXAMPLE/g4_postorder.jpg
EXAMPLE/hostname
EXAMPLE/pdgssvx.c
EXAMPLE/pdgstrf2.c
EXAMPLE/pwd
EXAMPLE/pzgstrf2.c
EXAMPLE/pzgstrf_v3.3.c
EXAMPLE/pzutil.c
EXAMPLE/test.bat
EXAMPLE/test.cpu.bat
EXAMPLE/test.err
EXAMPLE/test.err.1
EXAMPLE/zlook_ahead_update.c
FORTRAN/make.out
FORTRAN/zcreate_dist_matrix.c
MAKE_INC/make.xc30
SRC/int_t
SRC/lnbrow
SRC/make.out
SRC/rnbrow
SRC/temp
SRC/temp1
Thanks,
Satish
On Tue, 28 Jul 2015, Xiaoye S. Li wrote:
> I am checking v4.1 now. I'll let you know when I've fixed the problem.
>
> Sherry
>
> On Tue, Jul 28, 2015 at 8:27 AM, Hong <hzhang at mcs.anl.gov> wrote:
>
> > Sherry,
> > I tested with superlu_dist v4.1. The extra printings are gone, but the
> > hang remains.
> > It hangs at
> >
> > #5 0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0,
> > status=0x7fff9cd83d60)
> > at src/mpi/pt2pt/wait.c:168
> > #6 0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
> > anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
> > stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308
> >
> > if (recv_req[0] != MPI_REQUEST_NULL) {
> > --> MPI_Wait (&recv_req[0], &status);
> >
> > We will update the PETSc interface to superlu_dist v4.1.
> >
> > Hong
> >
> >
> > On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
> >
> >> Hong,
> >> Thanks for trying out.
> >> The extra printings are not properly guarded by the print level. I will
> >> fix that. I will look into the hang problem soon.
> >>
> >> Sherry
> >>
> >>
> >> On Mon, Jul 27, 2015 at 7:50 PM, Hong <hzhang at mcs.anl.gov> wrote:
> >>
> >>> Sherry,
> >>>
> >>> I can repeat hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:
> >>> mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type
> >>> lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
> >>> ...
> >>> .. Starting with 1 OpenMP threads
> >>> [0] .. BIG U size 1342464
> >>> [0] .. BIG V size 131072
> >>> Max row size is 1311
> >>> Using buffer_size of 5000000
> >>> Threads per process 1
> >>> ...
> >>>
> >>> Using a debugger (with the PETSc option '-start_in_debugger'), I find that
> >>> the hang occurs at
> >>> #0 0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,
> >>> timeout=<optimized out>, timeout at entry=-1)
> >>> at ../sysdeps/unix/sysv/linux/poll.c:83
> >>> #1 0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,
> >>> millisecond_timeout=millisecond_timeout at entry=-1,
> >>> eventp=eventp at entry=0x7fff654930b0)
> >>> at src/mpid/common/sock/poll/sock_wait.i:123
> >>> #2 0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (
> >>> progress_state=0x7fff65493120)
> >>> at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
> >>> #3 MPIDI_CH3I_Progress (blocking=blocking at entry=1,
> >>> state=state at entry=0x7fff65493120)
> >>> at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
> >>> #4 0x00007f117de1a559 in MPIR_Wait_impl (request=request at entry
> >>> =0x262df90,
> >>> status=status at entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67
> >>> #5 0x00007f117de1a818 in PMPI_Wait (request=0x262df90,
> >>> status=0x7fff65493390)
> >>> at src/mpi/pt2pt/wait.c:168
> >>> #6 0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,
> >>> anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,
> >>> stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
> >>>
> >>> #7 0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,
> >>> ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,
> >>> LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,
> >>> stat=0x7fff65493ea0,
> >>> info=0x7fff65493edc) at pzgssvx.c:1063
> >>>
> >>> #8 0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,
> >>> A=0x21bb7e0, info=0x2355068)
> >>> at
> >>> /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
> >>> #9 0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110,
> >>> mat=0x21bb7e0,
> >>> info=0x2355068) at
> >>> /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
> >>> #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
> >>> at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
> >>> #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
> >>> at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
> >>> #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
> >>> at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
> >>> #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
> >>> at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312
> >>>
> >>> You may take a look at it. Sequential symbolic factorization works fine.
> >>>
> >>> Why does superlu_dist (v4.0) in complex precision display the following?
> >>>
> >>> .. Starting with 1 OpenMP threads
> >>> [0] .. BIG U size 1342464
> >>> [0] .. BIG V size 131072
> >>> Max row size is 1311
> >>> Using buffer_size of 5000000
> >>> Threads per process 1
> >>> ...
> >>>
> >>> I realize that I am using superlu_dist v4.0. Would v4.1 work? I'll give
> >>> it a try tomorrow.
> >>>
> >>> Hong
> >>>
> >>> On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <
> >>> aph at email.arizona.edu> wrote:
> >>>
> >>>> Hi Hong,
> >>>>
> >>>> No, that is not the correct matrix. Note that I forgot to mention that
> >>>> it is a complex matrix. I tried loading the matrix I sent you this morning
> >>>> with:
> >>>>
> >>>> !...Load a Matrix in Binary Format
> >>>> call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
> >>>> call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
> >>>> call MatSetType(DLOAD,MATAIJ,ierr)
> >>>> call MatLoad(DLOAD,viewer,ierr)
> >>>> call PetscViewerDestroy(viewer,ierr)
> >>>>
> >>>> call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)
> >>>>
> >>>> The first 37 rows should look like this:
> >>>>
> >>>> Mat Object: 2 MPI processes
> >>>> type: mpiaij
> >>>> row 0: (0, 1)
> >>>> row 1: (1, 1)
> >>>> row 2: (2, 1)
> >>>> row 3: (3, 1)
> >>>> row 4: (4, 1)
> >>>> row 5: (5, 1)
> >>>> row 6: (6, 1)
> >>>> row 7: (7, 1)
> >>>> row 8: (8, 1)
> >>>> row 9: (9, 1)
> >>>> row 10: (10, 1)
> >>>> row 11: (11, 1)
> >>>> row 12: (12, 1)
> >>>> row 13: (13, 1)
> >>>> row 14: (14, 1)
> >>>> row 15: (15, 1)
> >>>> row 16: (16, 1)
> >>>> row 17: (17, 1)
> >>>> row 18: (18, 1)
> >>>> row 19: (19, 1)
> >>>> row 20: (20, 1)
> >>>> row 21: (21, 1)
> >>>> row 22: (22, 1)
> >>>> row 23: (23, 1)
> >>>> row 24: (24, 1)
> >>>> row 25: (25, 1)
> >>>> row 26: (26, 1)
> >>>> row 27: (27, 1)
> >>>> row 28: (28, 1)
> >>>> row 29: (29, 1)
> >>>> row 30: (30, 1)
> >>>> row 31: (31, 1)
> >>>> row 32: (32, 1)
> >>>> row 33: (33, 1)
> >>>> row 34: (34, 1)
> >>>> row 35: (35, 1)
> >>>> row 36: (1, -41.2444) (35, -41.2444) (36, 118.049 - 0.999271 i) (37,
> >>>> -21.447) (38, 5.18873) (39, -2.34856) (40, 1.3607) (41, -0.898206)
> >>>> (42, 0.642715) (43, -0.48593) (44, 0.382471) (45, -0.310476) (46,
> >>>> 0.258302) (47, -0.219268) (48, 0.189304) (49, -0.165815) (50,
> >>>> 0.147076) (51, -0.131907) (52, 0.119478) (53, -0.109189) (54, 0.1006)
> >>>> (55, -0.0933795) (56, 0.0872779) (57, -0.0821019) (58, 0.0777011) (59,
> >>>> -0.0739575) (60, 0.0707775) (61, -0.0680868) (62, 0.0658258) (63,
> >>>> -0.0639473) (64, 0.0624137) (65, -0.0611954) (66, 0.0602698) (67,
> >>>> -0.0596202) (68, 0.0592349) (69, -0.0295536) (71, -21.447) (106,
> >>>> 5.18873) (141, -2.34856) (176, 1.3607) (211, -0.898206) (246,
> >>>> 0.642715) (281, -0.48593) (316, 0.382471) (351, -0.310476) (386,
> >>>> 0.258302) (421, -0.219268) (456, 0.189304) (491, -0.165815) (526,
> >>>> 0.147076) (561, -0.131907) (596, 0.119478) (631, -0.109189) (666,
> >>>> 0.1006) (701, -0.0933795) (736, 0.0872779) (771, -0.0821019) (806,
> >>>> 0.0777011) (841, -0.0739575) (876, 0.0707775) (911, -0.0680868) (946,
> >>>> 0.0658258) (981, -0.0639473) (1016, 0.0624137) (1051, -0.0611954)
> >>>> (1086, 0.0602698) (1121, -0.0596202) (1156, 0.0592349) (1191,
> >>>> -0.0295536) (1261, 0) (3676, 117.211) (3711, -58.4801) (3746,
> >>>> -78.3633) (3781, 29.4911) (3816, -15.8073) (3851, 9.94324) (3886,
> >>>> -6.87205) (3921, 5.05774) (3956, -3.89521) (3991, 3.10522) (4026,
> >>>> -2.54388) (4061, 2.13082) (4096, -1.8182) (4131, 1.57606) (4166,
> >>>> -1.38491) (4201, 1.23155) (4236, -1.10685) (4271, 1.00428) (4306,
> >>>> -0.919116) (4341, 0.847829) (4376, -0.787776) (4411, 0.736933) (4446,
> >>>> -0.693735) (4481, 0.656958) (4516, -0.625638) (4551, 0.599007) (4586,
> >>>> -0.576454) (4621, 0.557491) (4656, -0.541726) (4691, 0.528849) (4726,
> >>>> -0.518617) (4761, 0.51084) (4796, -0.50538) (4831, 0.502142) (4866,
> >>>> -0.250534)
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Anthony
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jul 24, 2015 at 7:56 PM, Hong <hzhang at mcs.anl.gov> wrote:
> >>>>
> >>>>> Anthony:
> >>>>> I test your Amat_binary.m
> >>>>> using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
> >>>>> Your matrix has many zero rows:
> >>>>> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
> >>>>> Mat Object: 1 MPI processes
> >>>>> type: seqaij
> >>>>> row 0: (0, 1)
> >>>>> row 1: (1, 0)
> >>>>> row 2: (2, 1)
> >>>>> row 3: (3, 0)
> >>>>> row 4: (4, 1)
> >>>>> row 5: (5, 0)
> >>>>> row 6: (6, 1)
> >>>>> row 7: (7, 0)
> >>>>> row 8: (8, 1)
> >>>>> row 9: (9, 0)
> >>>>> ...
> >>>>> row 36: (1, 1) (35, 0) (36, 1) (37, 0) (38, 1) (39, 0) (40, 1)
> >>>>> (41, 0) (42, 1) (43, 0) (44, 1) (45,
> >>>>> 0) (46, 1) (47, 0) (48, 1) (49, 0) (50, 1) (51, 0) (52, 1)
> >>>>> (53, 0) (54, 1) (55, 0) (56, 1) (57, 0)
> >>>>> (58, 1) (59, 0) (60, 1) ...
> >>>>>
> >>>>> Did you send us the correct matrix?
> >>>>>
> >>>>>>
> >>>>>> I ran my code through valgrind and gdb as suggested by Barry. I am
> >>>>>> now coming back to some problem I have had while running with parallel
> >>>>>> symbolic factorization. I am attaching a test matrix (petsc binary format)
> >>>>>> that I LU decompose and then use to solve a linear system (see code below).
> >>>>>> I can run on 2 processors with parsymbfact, or on 4 processors without
> >>>>>> parsymbfact. However, if I run on 4 procs with parsymbfact, the code just
> >>>>>> hangs. Below is the simplified test case that I used. The matrices A and B
> >>>>>> are built elsewhere in my program. The matrix I am attaching is A-sigma*B
> >>>>>> (see below).
> >>>>>>
> >>>>>> One thing I don't know is, for sparse matrices, what the optimal number
> >>>>>> of processors to use for an LU decomposition is. Does it depend on the
> >>>>>> total number of nonzeros? Do you have an easy way to compute it?
> >>>>>>
> >>>>>
> >>>>> You have to experiment with your matrix on a target machine to find out.
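> >>>>>
> >>>>> For instance (a minimal sketch, not a superlu_dist recipe): bracket the
> >>>>> factorization with MPI_Wtime and compare wall times across process counts.
> >>>>> This assumes ksp and rank are set up as in your subroutine below, and uses
> >>>>> the fact that for PCLU the numeric factorization happens in KSPSetUp:
> >>>>>
> >>>>> real(dp) :: t0, t1        ! dp as in your module; needs use mpi or mpif.h
> >>>>> t0 = MPI_Wtime()
> >>>>> call KSPSetUp(ksp,ierr)   ! for PCLU this performs the LU factorization
> >>>>> t1 = MPI_Wtime()
> >>>>> if (rank==0) print '("LU factorization wall time: ",f10.3," s")', t1-t0
> >>>>>
> >>>>> Rerunning with different process counts (-n 2, 4, 8, ...) then gives a
> >>>>> timing curve for your matrix on your machine.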
> >>>>>
> >>>>> Hong
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Subroutine HowBigLUCanBe(rank)
> >>>>>>
> >>>>>> IMPLICIT NONE
> >>>>>>
> >>>>>> integer(i4b),intent(in) :: rank
> >>>>>> integer(i4b) :: i,ct
> >>>>>> real(dp) :: begin,endd
> >>>>>> complex(dpc) :: sigma
> >>>>>>
> >>>>>> PetscErrorCode ierr
> >>>>>>
> >>>>>>
> >>>>>> if (rank==0) call cpu_time(begin)
> >>>>>>
> >>>>>> if (rank==0) then
> >>>>>> write(*,*)
> >>>>>> write(*,*)'Testing How Big LU Can Be...'
> >>>>>> write(*,*)'============================'
> >>>>>> write(*,*)
> >>>>>> endif
> >>>>>>
> >>>>>> sigma = (1.0d0,0.0d0)
> >>>>>> call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
> >>>>>>
> >>>>>> !.....Write Matrix to ASCII and Binary Format
> >>>>>> !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
> >>>>>> !call MatView(DXX,viewer,ierr)
> >>>>>> !call PetscViewerDestroy(viewer,ierr)
> >>>>>>
> >>>>>> call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
> >>>>>> call MatView(A,viewer,ierr)
> >>>>>> call PetscViewerDestroy(viewer,ierr)
> >>>>>>
> >>>>>> !.....Create Linear Solver Context
> >>>>>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
> >>>>>>
> >>>>>> !.....Set operators. Here the matrix that defines the linear system
> >>>>>> !     also serves as the preconditioning matrix.
> >>>>>> !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
> >>>>>> !aha commented and replaced by next line
> >>>>>> call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
> >>>>>>
> >>>>>> !.....Set Relative and Absolute Tolerances and Use Default for Divergence Tol
> >>>>>> tol = 1.e-10
> >>>>>> call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
> >>>>>>
> >>>>>> !.....Set the Direct (LU) Solver
> >>>>>> call KSPSetType(ksp,KSPPREONLY,ierr)
> >>>>>> call KSPGetPC(ksp,pc,ierr)
> >>>>>> call PCSetType(pc,PCLU,ierr)
> >>>>>> call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr)
> >>>>>> ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
> >>>>>>
> >>>>>> !.....Create Right-Hand-Side Vector
> >>>>>> call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
> >>>>>> call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
> >>>>>>
> >>>>>> allocate(xwork1(IendA-IstartA))
> >>>>>> allocate(loc(IendA-IstartA))
> >>>>>>
> >>>>>> ct=0
> >>>>>> do i=IstartA,IendA-1
> >>>>>> ct=ct+1
> >>>>>> loc(ct)=i
> >>>>>> xwork1(ct)=(1.0d0,0.0d0)
> >>>>>> enddo
> >>>>>>
> >>>>>> call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
> >>>>>> call VecZeroEntries(sol,ierr)
> >>>>>>
> >>>>>> deallocate(xwork1,loc)
> >>>>>>
> >>>>>> !.....Assemble Vectors
> >>>>>> call VecAssemblyBegin(frhs,ierr)
> >>>>>> call VecAssemblyEnd(frhs,ierr)
> >>>>>>
> >>>>>> !.....Solve the Linear System
> >>>>>> call KSPSolve(ksp,frhs,sol,ierr)
> >>>>>>
> >>>>>> !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
> >>>>>>
> >>>>>> if (rank==0) then
> >>>>>> call cpu_time(endd)
> >>>>>> write(*,*)
> >>>>>> print '("Total time for HowBigLUCanBe = ",f21.3,"
> >>>>>> seconds.")',endd-begin
> >>>>>> endif
> >>>>>>
> >>>>>> call SlepcFinalize(ierr)
> >>>>>>
> >>>>>> STOP
> >>>>>>
> >>>>>>
> >>>>>> end Subroutine HowBigLUCanBe
> >>>>>>
> >>>>>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
> >>>>>>
> >>>>>> Indeed, the parallel symbolic factorization routine needs a power-of-2
> >>>>>> number of processes; however, you can use however many processes you need.
> >>>>>> Internally, we redistribute the matrix to the nearest power of 2 processes,
> >>>>>> do the symbolic factorization, then redistribute back to all the processes
> >>>>>> to do the factorization, triangular solves, etc. So there is no restriction
> >>>>>> from the user's viewpoint.
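> >>>>>>
> >>>>>> (Illustration only, not the superlu_dist source: "nearest power of 2"
> >>>>>> here presumably means the largest power of two that does not exceed the
> >>>>>> number of MPI processes, i.e. something like
> >>>>>>
> >>>>>> integer :: nprocs, npow2
> >>>>>> npow2 = 1
> >>>>>> do while (2*npow2 <= nprocs)
> >>>>>>    npow2 = 2*npow2
> >>>>>> end do
> >>>>>>
> >>>>>> so with 6 processes the symbolic phase would run on 4.)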
> >>>>>>
> >>>>>> It's difficult to tell what the problem is. Do you think you can
> >>>>>> print your matrix? Then I can do some debugging by running superlu_dist
> >>>>>> standalone.
> >>>>>>
> >>>>>> Sherry
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <
> >>>>>> aph at email.arizona.edu> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have used the switch -mat_superlu_dist_parsymbfact in my PBS
> >>>>>>> script. However, although my program worked fine with sequential symbolic
> >>>>>>> factorization, I get one of the following 2 behaviors when I run with
> >>>>>>> parallel symbolic factorization (depending on the number of processors that
> >>>>>>> I use):
> >>>>>>>
> >>>>>>> 1) the program just hangs (it seems stuck in some subroutine ==>
> >>>>>>> see test.out-hangs)
> >>>>>>> 2) I get a floating point exception ==> see
> >>>>>>> test.out-floating-point-exception
> >>>>>>>
> >>>>>>> Note that, as suggested in the SuperLU manual, I use a power-of-2
> >>>>>>> number of procs. Are there any tunable parameters for the parallel symbolic
> >>>>>>> factorization? Note that when I build my sparse matrix, most elements I add
> >>>>>>> are nonzero of course, but to simplify the programming I also add a few
> >>>>>>> zero elements in the sparse matrix. I was thinking that maybe, if the
> >>>>>>> parallel symbolic factorization proceeds by blocks, there could be some
> >>>>>>> blocks where the pivot would be zero, hence creating the FPE?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Anthony
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
> >>>>>>>
> >>>>>>>> Did you find out how to change the option to use parallel symbolic
> >>>>>>>> factorization? Perhaps the PETSc team can help.
> >>>>>>>>
> >>>>>>>> Sherry
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
> >>>>>>>>
> >>>>>>>>> Is there an inquiry function that tells you all the available
> >>>>>>>>> options?
> >>>>>>>>>
> >>>>>>>>> Sherry
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
> >>>>>>>>> aph at email.arizona.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Sherry,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for your message. I have used superlu_dist default
> >>>>>>>>>> options. I did not realize that I was doing serial symbolic factorization.
> >>>>>>>>>> That is probably the cause of my problem.
> >>>>>>>>>> Each node on Garnet has 60 GB of usable memory, and I can run with
> >>>>>>>>>> 1, 2, 4, 8, 16, or 32 cores per node.
> >>>>>>>>>>
> >>>>>>>>>> So I should use:
> >>>>>>>>>>
> >>>>>>>>>> -mat_superlu_dist_r 20
> >>>>>>>>>> -mat_superlu_dist_c 32
> >>>>>>>>>>
> >>>>>>>>>> How do you specify the parallel symbolic factorization option?
> >>>>>>>>>> Is it -mat_superlu_dist_matinput 1?
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Anthony
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Regarding the superlu_dist failure: this occurs during symbolic
> >>>>>>>>>>> factorization. Since you are using serial symbolic factorization, it
> >>>>>>>>>>> requires the entire graph of A to be available in the memory of one MPI
> >>>>>>>>>>> task. How much memory do you have for each MPI task?
> >>>>>>>>>>>
> >>>>>>>>>>> It won't help even if you use more processes. You should try
> >>>>>>>>>>> the parallel symbolic factorization option.
> >>>>>>>>>>>
> >>>>>>>>>>> Another point: you set up the process grid as
> >>>>>>>>>>> Process grid nprow 32 x npcol 20
> >>>>>>>>>>> For better performance, you should swap the grid dimensions. That
> >>>>>>>>>>> is, it's better to use 20 x 32; never make nprow larger than npcol.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Sherry
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would suggest running a sequence of problems, 101 by 101,
> >>>>>>>>>>>> 111 by 111, etc., and getting the memory usage in each case (when you run
> >>>>>>>>>>>> out of memory you get NO useful information about memory needs). You can
> >>>>>>>>>>>> then plot memory usage as a function of problem size to get a handle on how
> >>>>>>>>>>>> much memory it is using. You can also run on more and more processes
> >>>>>>>>>>>> (which have more total memory in aggregate) to see how large a problem you
> >>>>>>>>>>>> may be able to reach.
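> >>>>>>>>>>>>
> >>>>>>>>>>>> One way to get the per-run numbers is with PETSc's memory-query
> >>>>>>>>>>>> routines (a sketch, assuming these Fortran bindings are available
> >>>>>>>>>>>> in your build; rank and ierr as elsewhere in this thread):
> >>>>>>>>>>>>
> >>>>>>>>>>>> PetscLogDouble mem
> >>>>>>>>>>>> ! call once, early, so that the maximum is tracked
> >>>>>>>>>>>> call PetscMemorySetGetMaximumUsage(ierr)
> >>>>>>>>>>>> ! ... run the factorization / EPSSolve ...
> >>>>>>>>>>>> call PetscMemoryGetMaximumUsage(mem,ierr)
> >>>>>>>>>>>> if (rank==0) print '("Max memory on rank 0: ",e12.4," bytes")', mem
> >>>>>>>>>>>>
> >>>>>>>>>>>> Printed for each problem size (101 by 101, 111 by 111, ...), this
> >>>>>>>>>>>> gives the data points to plot.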
> >>>>>>>>>>>>
> >>>>>>>>>>>> MUMPS also has an "out of core" version (which we have never
> >>>>>>>>>>>> used) that could in theory let you get to large problems if you have lots
> >>>>>>>>>>>> of disk space, but you are on your own figuring out how to use it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Barry
> >>>>>>>>>>>>
> >>>>>>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <
> >>>>>>>>>>>> aph at email.arizona.edu> wrote:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Hi Jose,
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > In my code, I first use PETSc to solve a linear system to get
> >>>>>>>>>>>> the baseflow (without using SLEPc), and then I use SLEPc to do the stability
> >>>>>>>>>>>> analysis of that baseflow. This is why there are some SLEPc options that
> >>>>>>>>>>>> are not used in test.out-superlu_dist-151x151 (when I am solving for the
> >>>>>>>>>>>> baseflow with PETSc only). I have attached a 101x101 case for which I get
> >>>>>>>>>>>> the eigenvalues. That case works fine. However, if I increase to 151x151, I
> >>>>>>>>>>>> get the error that you can see in test.out-superlu_dist-151x151 (a similar
> >>>>>>>>>>>> error with MUMPS: see test.out-mumps-151x151, line 2918). If you look at the
> >>>>>>>>>>>> very end of the files test.out-superlu_dist-151x151 and
> >>>>>>>>>>>> test.out-mumps-151x151, you will see that the last info message printed is:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > On Processor (after EPSSetFromOptions) 0 memory:
> >>>>>>>>>>>> 0.65073152000E+08 =====> (see line 807 of module_petsc.F90)
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > This means that the memory error probably occurs in the call
> >>>>>>>>>>>> to EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how
> >>>>>>>>>>>> much memory is required by the most memory-intensive operation within
> >>>>>>>>>>>> EPSSolve. Since I am solving a generalized EVP, I would imagine that it
> >>>>>>>>>>>> would be the LU decomposition. But is there an accurate way of measuring it?
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Before starting with iterative solvers, I would like to
> >>>>>>>>>>>> exploit direct solvers as much as I can. I tried GMRES with the default
> >>>>>>>>>>>> preconditioner at some point but I had convergence problems. What
> >>>>>>>>>>>> solver/preconditioner would you recommend for a generalized non-Hermitian
> >>>>>>>>>>>> (EPS_GNHEP) EVP?
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Thanks,
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Anthony
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <
> >>>>>>>>>>>> jroman at dsic.upv.es> wrote:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > On 07/07/2015, at 02:33, Anthony Haas wrote:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > > Hi,
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and
> >>>>>>>>>>>> superlu_dist for the LU decomposition (my problem is a generalized
> >>>>>>>>>>>> eigenvalue problem). The code runs fine for a 101x101 grid, but when I
> >>>>>>>>>>>> increase to 151x151, I get the following error:
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > Can't expand MemType 1: jcol 16104 (and then [NID 00037]
> >>>>>>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > It seems to be a memory problem. I monitor the memory usage
> >>>>>>>>>>>> as far as I can, and it seems that memory usage is pretty low. The most
> >>>>>>>>>>>> memory-intensive part of the program is probably the LU decomposition in
> >>>>>>>>>>>> the context of the generalized EVP. Is there a way to evaluate how much
> >>>>>>>>>>>> memory will be required for that step? I am currently running the debug
> >>>>>>>>>>>> version of the code, which I would assume uses more memory.
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > I have attached the output of the job. Note that the
> >>>>>>>>>>>> program uses PETSc twice: 1) to solve a linear system, for which no problem
> >>>>>>>>>>>> occurs, and 2) to solve the generalized EVP with SLEPc, where I get the
> >>>>>>>>>>>> error.
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > Thanks
> >>>>>>>>>>>> > >
> >>>>>>>>>>>> > > Anthony
> >>>>>>>>>>>> > > <test.out-superlu_dist-151x151>
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > In the output you attached there are no SLEPc objects in
> >>>>>>>>>>>> the report, and SLEPc options are not used. It seems that the SLEPc calls
> >>>>>>>>>>>> are skipped?
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve
> >>>>>>>>>>>> linear systems with a preconditioned iterative solver?
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Jose
> >>>>>>>>>>>> >
> >>>>>>>>>>>> >
> >>>>>>>>>>>> >
> >>>>>>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>