[petsc-users] Can't expand MemType 1: jcol 16104

Wed Jul 29 11:23:00 CDT 2015

Thanks for the quick update.  In the new tarball, I have already removed the
junk files, as pointed out by Satish.

Sherry

On Wed, Jul 29, 2015 at 8:36 AM, Hong <hzhang at mcs.anl.gov> wrote:

> Sherry,
> With your bugfix, superlu_dist-4.1 works now:
>
> petsc/src/ksp/ksp/examples/tutorials (master)
> $ mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu
> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
> Number of iterations =   1
> Residual norm 2.11605e-11
>
> Once you address Satish's request, we'll update the petsc interface to this
> version of superlu_dist.
>
> Anthony:
> Please download the latest superlu_dist-v4.1,
> then configure petsc with
> '--download-superlu_dist=superlu_dist_4.1.tar.gz'
>
> Hong
>
> On Tue, Jul 28, 2015 at 11:11 AM, Satish Balay <balay at mcs.anl.gov> wrote:
>
>> Sherry,
>>
>> One minor issue with the tarball. I see the following new files in the
>> v4.1 tarball [when comparing it with v4.0]. Some of these files are
>> perhaps junk files and can be removed from the tarball?
>>
>>    EXAMPLE/dscatter.c.bak
>>    EXAMPLE/g10.cua
>>    EXAMPLE/g4.cua
>>    EXAMPLE/g4.postorder.eps
>>    EXAMPLE/g4.rua
>>    EXAMPLE/g4_postorder.jpg
>>    EXAMPLE/hostname
>>    EXAMPLE/pdgssvx.c
>>    EXAMPLE/pdgstrf2.c
>>    EXAMPLE/pwd
>>    EXAMPLE/pzgstrf2.c
>>    EXAMPLE/pzgstrf_v3.3.c
>>    EXAMPLE/pzutil.c
>>    EXAMPLE/test.bat
>>    EXAMPLE/test.cpu.bat
>>    EXAMPLE/test.err
>>    EXAMPLE/test.err.1
>>    EXAMPLE/zlook_ahead_update.c
>>    FORTRAN/make.out
>>    FORTRAN/zcreate_dist_matrix.c
>>    MAKE_INC/make.xc30
>>    SRC/int_t
>>    SRC/lnbrow
>>    SRC/make.out
>>    SRC/rnbrow
>>    SRC/temp
>>    SRC/temp1
>>
>>
>> Thanks,
>> Satish
>>
>>
>> On Tue, 28 Jul 2015, Xiaoye S. Li wrote:
>>
>> > I am checking v4.1 now. I'll let you know when I have fixed the problem.
>> >
>> > Sherry
>> >
>> > On Tue, Jul 28, 2015 at 8:27 AM, Hong <hzhang at mcs.anl.gov> wrote:
>> >
>> > > Sherry,
>> > > I tested with superlu_dist v4.1. The extra printings are gone, but the
>> > > hang remains.
>> > > It hangs at
>> > >
>> > > #5  0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0,
>> > > status=0x7fff9cd83d60)
>> > >     at src/mpi/pt2pt/wait.c:168
>> > > #6  0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
>> > >     anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
>> > >     stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308
>> > >
>> > >                 if (recv_req[0] != MPI_REQUEST_NULL) {
>> > >  -->                   MPI_Wait (&recv_req[0], &status);
>> > >
>> > > We will update the petsc interface to superlu_dist v4.1.
>> > >
>> > > Hong
>> > >
>> > >
>> > > On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>> > >
>> > >> Hong,
>> > >> Thanks for trying it out.
>> > >> The extra printings are not properly guarded by the print level.  I will
>> > >> fix that.  I will look into the hang problem soon.
>> > >>
>> > >> Sherry
>> > >> 
>> > >>
>> > >> On Mon, Jul 27, 2015 at 7:50 PM, Hong <hzhang at mcs.anl.gov> wrote:
>> > >>
>> > >>> Sherry,
>> > >>>
>> > >>> I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:
>> > >>> mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 \
>> > >>>   -pc_type lu -pc_factor_mat_solver_package superlu_dist \
>> > >>>   -mat_superlu_dist_parsymbfact
>> > >>> ...
>> > >>> .. Starting with 1 OpenMP threads
>> > >>> [0] .. BIG U size 1342464
>> > >>> [0] .. BIG V size 131072
>> > >>>   Max row size is 1311
>> > >>>   Using buffer_size of 5000000
>> > >>>   Threads per process 1
>> > >>> ...
>> > >>>
>> > >>> Using a debugger (with the petsc option '-start_in_debugger'), I find that
>> > >>> the hang occurs at
>> > >>> #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,
>> > >>>     timeout=<optimized out>, timeout@entry=-1)
>> > >>>     at ../sysdeps/unix/sysv/linux/poll.c:83
>> > >>> #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,
>> > >>>     millisecond_timeout=millisecond_timeout@entry=-1,
>> > >>>     eventp=eventp@entry=0x7fff654930b0)
>> > >>>     at src/mpid/common/sock/poll/sock_wait.i:123
>> > >>> #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (
>> > >>>     progress_state=0x7fff65493120)
>> > >>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
>> > >>> #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1,
>> > >>>     state=state@entry=0x7fff65493120)
>> > >>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
>> > >>> #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90,
>> > >>>     status=status@entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67
>> > >>> #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90,
>> > >>>     status=0x7fff65493390)
>> > >>>     at src/mpi/pt2pt/wait.c:168
>> > >>> #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,
>> > >>>     anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,
>> > >>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
>> > >>>
>> > >>> #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,
>> > >>>     ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,
>> > >>>     LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,
>> > >>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgssvx.c:1063
>> > >>>
>> > >>> #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,
>> > >>>     A=0x21bb7e0, info=0x2355068)
>> > >>>     at /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
>> > >>> #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110,
>> > >>>     mat=0x21bb7e0, info=0x2355068)
>> > >>>     at /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
>> > >>> #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
>> > >>>     at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
>> > >>> #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
>> > >>>     at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
>> > >>> #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
>> > >>>     at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
>> > >>> #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
>> > >>>     at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312
>> > >>>
>> > >>> You may take a look at it. Sequential symbolic factorization works fine.
>> > >>>
>> > >>> Why does superlu_dist (v4.0) in complex precision display
>> > >>>
>> > >>> .. Starting with 1 OpenMP threads
>> > >>> [0] .. BIG U size 1342464
>> > >>> [0] .. BIG V size 131072
>> > >>>   Max row size is 1311
>> > >>>   Using buffer_size of 5000000
>> > >>>   Threads per process 1
>> > >>> ...
>> > >>>
>> > >>> I realize that I am using superlu_dist v4.0. Would v4.1 work? I'll give
>> > >>> it a try tomorrow.
>> > >>>
>> > >>> Hong
>> > >>>
>> > >>> On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <
>> > >>> aph at email.arizona.edu> wrote:
>> > >>>
>> > >>>> Hi Hong,
>> > >>>>
>> > >>>> No, that is not the correct matrix. Note that I forgot to mention that
>> > >>>> it is a complex matrix. I tried loading the matrix I sent you this
>> > >>>> morning with:
>> > >>>>
>> > >>>> !...Load a Matrix in Binary Format
>> > >>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m", &
>> > >>>>            FILE_MODE_READ,viewer,ierr)
>> > >>>>       call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
>> > >>>>       call MatSetType(DLOAD,MATAIJ,ierr)
>> > >>>>       call MatLoad(DLOAD,viewer,ierr)
>> > >>>>       call PetscViewerDestroy(viewer,ierr)
>> > >>>>
>> > >>>>       call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> > >>>>
>> > >>>> The first 37 rows should look like this:
>> > >>>>
>> > >>>> Mat Object: 2 MPI processes
>> > >>>>   type: mpiaij
>> > >>>> row 0: (0, 1)
>> > >>>> row 1: (1, 1)
>> > >>>> row 2: (2, 1)
>> > >>>> row 3: (3, 1)
>> > >>>> row 4: (4, 1)
>> > >>>> row 5: (5, 1)
>> > >>>> row 6: (6, 1)
>> > >>>> row 7: (7, 1)
>> > >>>> row 8: (8, 1)
>> > >>>> row 9: (9, 1)
>> > >>>> row 10: (10, 1)
>> > >>>> row 11: (11, 1)
>> > >>>> row 12: (12, 1)
>> > >>>> row 13: (13, 1)
>> > >>>> row 14: (14, 1)
>> > >>>> row 15: (15, 1)
>> > >>>> row 16: (16, 1)
>> > >>>> row 17: (17, 1)
>> > >>>> row 18: (18, 1)
>> > >>>> row 19: (19, 1)
>> > >>>> row 20: (20, 1)
>> > >>>> row 21: (21, 1)
>> > >>>> row 22: (22, 1)
>> > >>>> row 23: (23, 1)
>> > >>>> row 24: (24, 1)
>> > >>>> row 25: (25, 1)
>> > >>>> row 26: (26, 1)
>> > >>>> row 27: (27, 1)
>> > >>>> row 28: (28, 1)
>> > >>>> row 29: (29, 1)
>> > >>>> row 30: (30, 1)
>> > >>>> row 31: (31, 1)
>> > >>>> row 32: (32, 1)
>> > >>>> row 33: (33, 1)
>> > >>>> row 34: (34, 1)
>> > >>>> row 35: (35, 1)
>> > >>>> row 36: (1, -41.2444)  (35, -41.2444)  (36, 118.049 - 0.999271 i)
>> > >>>> (37, -21.447)  (38, 5.18873)  (39, -2.34856)
>> > >>>> (40, 1.3607)  (41, -0.898206)  (42, 0.642715)
>> > >>>> (43, -0.48593)  (44, 0.382471)  (45, -0.310476)
>> > >>>> (46, 0.258302)  (47, -0.219268)  (48, 0.189304)
>> > >>>> (49, -0.165815)  (50, 0.147076)  (51, -0.131907)
>> > >>>> (52, 0.119478)  (53, -0.109189)  (54, 0.1006)
>> > >>>> (55, -0.0933795)  (56, 0.0872779)  (57, -0.0821019)
>> > >>>> (58, 0.0777011)  (59, -0.0739575)  (60, 0.0707775)
>> > >>>> (61, -0.0680868)  (62, 0.0658258)  (63, -0.0639473)
>> > >>>> (64, 0.0624137)  (65, -0.0611954)  (66, 0.0602698)
>> > >>>> (67, -0.0596202)  (68, 0.0592349)  (69, -0.0295536)
>> > >>>> (71, -21.447)  (106, 5.18873)  (141, -2.34856)
>> > >>>> (176, 1.3607)  (211, -0.898206)  (246, 0.642715)
>> > >>>> (281, -0.48593)  (316, 0.382471)  (351, -0.310476)
>> > >>>> (386, 0.258302)  (421, -0.219268)  (456, 0.189304)
>> > >>>> (491, -0.165815)  (526, 0.147076)  (561, -0.131907)
>> > >>>> (596, 0.119478)  (631, -0.109189)  (666, 0.1006)
>> > >>>> (701, -0.0933795)  (736, 0.0872779)  (771, -0.0821019)
>> > >>>> (806, 0.0777011)  (841, -0.0739575)  (876, 0.0707775)
>> > >>>> (911, -0.0680868)  (946, 0.0658258)  (981, -0.0639473)
>> > >>>> (1016, 0.0624137)  (1051, -0.0611954)  (1086, 0.0602698)
>> > >>>> (1121, -0.0596202)  (1156, 0.0592349)  (1191, -0.0295536)
>> > >>>> (1261, 0)  (3676, 117.211)  (3711, -58.4801)
>> > >>>> (3746, -78.3633)  (3781, 29.4911)  (3816, -15.8073)
>> > >>>> (3851, 9.94324)  (3886, -6.87205)  (3921, 5.05774)
>> > >>>> (3956, -3.89521)  (3991, 3.10522)  (4026, -2.54388)
>> > >>>> (4061, 2.13082)  (4096, -1.8182)  (4131, 1.57606)
>> > >>>> (4166, -1.38491)  (4201, 1.23155)  (4236, -1.10685)
>> > >>>> (4271, 1.00428)  (4306, -0.919116)  (4341, 0.847829)
>> > >>>> (4376, -0.787776)  (4411, 0.736933)  (4446, -0.693735)
>> > >>>> (4481, 0.656958)  (4516, -0.625638)  (4551, 0.599007)
>> > >>>> (4586, -0.576454)  (4621, 0.557491)  (4656, -0.541726)
>> > >>>> (4691, 0.528849)  (4726, -0.518617)  (4761, 0.51084)
>> > >>>> (4796, -0.50538)  (4831, 0.502142)  (4866, -0.250534)
>> > >>>>
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> Anthony
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On Fri, Jul 24, 2015 at 7:56 PM, Hong <hzhang at mcs.anl.gov> wrote:
>> > >>>>
>> > >>>>> Anthony:
>> > >>>>> I tested your Amat_binary.m
>> > >>>>> using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
>> > >>>>> Your matrix has many zero rows:
>> > >>>>> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
>> > >>>>> Mat Object: 1 MPI processes
>> > >>>>>   type: seqaij
>> > >>>>> row 0: (0, 1)
>> > >>>>> row 1: (1, 0)
>> > >>>>> row 2: (2, 1)
>> > >>>>> row 3: (3, 0)
>> > >>>>> row 4: (4, 1)
>> > >>>>> row 5: (5, 0)
>> > >>>>> row 6: (6, 1)
>> > >>>>> row 7: (7, 0)
>> > >>>>> row 8: (8, 1)
>> > >>>>> row 9: (9, 0)
>> > >>>>> ...
>> > >>>>> row 36: (1, 1)  (35, 0)  (36, 1)  (37, 0)  (38, 1)  (39, 0)  (40, 1)
>> > >>>>>  (41, 0)  (42, 1)  (43, 0)  (44, 1)  (45, 0)  (46, 1)  (47, 0)
>> > >>>>>  (48, 1)  (49, 0)  (50, 1)  (51, 0)  (52, 1)  (53, 0)  (54, 1)
>> > >>>>>  (55, 0)  (56, 1)  (57, 0)  (58, 1)  (59, 0)  (60, 1)  ...
>> > >>>>>
>> > >>>>> Did you send us the correct matrix?
>> > >>>>>
>> > >>>>>>
>> > >>>>>> I ran my code through valgrind and gdb as suggested by Barry. I am now
>> > >>>>>> coming back to a problem I have had while running with parallel symbolic
>> > >>>>>> factorization. I am attaching a test matrix (petsc binary format) that I
>> > >>>>>> LU decompose and then use to solve a linear system (see code below). I can
>> > >>>>>> run on 2 processors with parsymbfact or on 4 processors without
>> > >>>>>> parsymbfact. However, if I run on 4 procs with parsymbfact, the code just
>> > >>>>>> hangs. Below is the simplified test case that I have used. The matrices A
>> > >>>>>> and B are built somewhere else in my program. The matrix I am attaching is
>> > >>>>>> A-sigma*B (see below).
>> > >>>>>>
>> > >>>>>> One thing I don't know is the optimum number of processors to use for
>> > >>>>>> an LU decomposition of a sparse matrix. Does it depend on the total
>> > >>>>>> number of nonzeros? Do you have an easy way to compute it?
>> > >>>>>>
>> > >>>>>
>> > >>>>> You have to experiment with your matrix on the target machine to find
>> > >>>>> out.
>> > >>>>>
>> > >>>>> Hong
>> > >>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>      Subroutine HowBigLUCanBe(rank)
>> > >>>>>>
>> > >>>>>>       IMPLICIT NONE
>> > >>>>>>
>> > >>>>>>       integer(i4b),intent(in) :: rank
>> > >>>>>>       integer(i4b)            :: i,ct
>> > >>>>>>       real(dp)                :: begin,endd
>> > >>>>>>       complex(dpc)            :: sigma
>> > >>>>>>
>> > >>>>>>       PetscErrorCode ierr
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>       if (rank==0) call cpu_time(begin)
>> > >>>>>>
>> > >>>>>>       if (rank==0) then
>> > >>>>>>          write(*,*)
>> > >>>>>>          write(*,*)'Testing How Big LU Can Be...'
>> > >>>>>>          write(*,*)'============================'
>> > >>>>>>          write(*,*)
>> > >>>>>>       endif
>> > >>>>>>
>> > >>>>>>       sigma = (1.0d0,0.0d0)
>> > >>>>>>       ! on exit A = A-sigma*B
>> > >>>>>>       call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr)
>> > >>>>>>
>> > >>>>>> !.....Write Matrix to ASCII and Binary Format
>> > >>>>>>       !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>> > >>>>>>       !call MatView(DXX,viewer,ierr)
>> > >>>>>>       !call PetscViewerDestroy(viewer,ierr)
>> > >>>>>>
>> > >>>>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m", &
>> > >>>>>>            FILE_MODE_WRITE,viewer,ierr)
>> > >>>>>>       call MatView(A,viewer,ierr)
>> > >>>>>>       call PetscViewerDestroy(viewer,ierr)
>> > >>>>>>
>> > >>>>>> !.....Create Linear Solver Context
>> > >>>>>>       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>> > >>>>>>
>> > >>>>>> !.....Set operators. Here the matrix that defines the linear system
>> > >>>>>> !     also serves as the preconditioning matrix.
>> > >>>>>>       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
>> > >>>>>>       !aha commented and replaced by next line
>> > >>>>>>       call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>> > >>>>>>
>> > >>>>>> !.....Set Relative and Absolute Tolerances and Uses Default for
>> > >>>>>> !     Divergence Tol
>> > >>>>>>       tol = 1.e-10
>> > >>>>>>       call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL, &
>> > >>>>>>            PETSC_DEFAULT_INTEGER,ierr)
>> > >>>>>>
>> > >>>>>> !.....Set the Direct (LU) Solver
>> > >>>>>>       call KSPSetType(ksp,KSPPREONLY,ierr)
>> > >>>>>>       call KSPGetPC(ksp,pc,ierr)
>> > >>>>>>       call PCSetType(pc,PCLU,ierr)
>> > >>>>>>       ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>> > >>>>>>       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr)
>> > >>>>>>
>> > >>>>>> !.....Create Right-Hand-Side Vector
>> > >>>>>>       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>> > >>>>>>       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>> > >>>>>>
>> > >>>>>>       allocate(xwork1(IendA-IstartA))
>> > >>>>>>       allocate(loc(IendA-IstartA))
>> > >>>>>>
>> > >>>>>>       ct=0
>> > >>>>>>       do i=IstartA,IendA-1
>> > >>>>>>          ct=ct+1
>> > >>>>>>          loc(ct)=i
>> > >>>>>>          xwork1(ct)=(1.0d0,0.0d0)
>> > >>>>>>       enddo
>> > >>>>>>
>> > >>>>>>       call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>> > >>>>>>       call VecZeroEntries(sol,ierr)
>> > >>>>>>
>> > >>>>>>       deallocate(xwork1,loc)
>> > >>>>>>
>> > >>>>>> !.....Assemble Vectors
>> > >>>>>>       call VecAssemblyBegin(frhs,ierr)
>> > >>>>>>       call VecAssemblyEnd(frhs,ierr)
>> > >>>>>>
>> > >>>>>> !.....Solve the Linear System
>> > >>>>>>       call KSPSolve(ksp,frhs,sol,ierr)
>> > >>>>>>
>> > >>>>>>       !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> > >>>>>>
>> > >>>>>>       if (rank==0) then
>> > >>>>>>          call cpu_time(endd)
>> > >>>>>>          write(*,*)
>> > >>>>>>          print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>> > >>>>>>       endif
>> > >>>>>>
>> > >>>>>>       call SlepcFinalize(ierr)
>> > >>>>>>
>> > >>>>>>       STOP
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>     end Subroutine HowBigLUCanBe
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
>> > >>>>>>
>> > >>>>>>  Indeed, the parallel symbolic factorization routine needs a power-of-2
>> > >>>>>> number of processes; however, you can use however many processes you need.
>> > >>>>>> Internally, we redistribute the matrix to the nearest power-of-2 number of
>> > >>>>>> processes, do the symbolic factorization, then redistribute back to all
>> > >>>>>> the processes to do factorization, triangular solve etc.  So, there is no
>> > >>>>>> restriction from the user's viewpoint.
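
As a rough illustration of the redistribution described above, here is a minimal Python sketch. It assumes that "nearest power of 2" means the largest power of two that does not exceed the number of processes; that reading and the helper name symbolic_process_count are assumptions made for illustration and are not taken from the superlu_dist source.

```python
def symbolic_process_count(nprocs: int) -> int:
    """Largest power of two <= nprocs (assumed rule, see the note above)."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    # bit_length() - 1 is the position of the highest set bit
    return 1 << (nprocs.bit_length() - 1)


if __name__ == "__main__":
    for p in (2, 4, 6, 640):
        print(p, "processes ->", symbolic_process_count(p), "for the symbolic step")
```

Under this assumption, a run on 4 processes would use all 4 for the symbolic step, which matches the case that hangs in this thread.
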
>> > >>>>>>
>> > >>>>>>  It's difficult to tell what the problem is.  Do you think you can
>> > >>>>>> print your matrix, so that I can do some debugging by running
>> > >>>>>> superlu_dist standalone?
>> > >>>>>>
>> > >>>>>>  Sherry
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <
>> > >>>>>> aph at email.arizona.edu> wrote:
>> > >>>>>>
>> > >>>>>>>   Hi,
>> > >>>>>>>
>> > >>>>>>>  I have used the switch -mat_superlu_dist_parsymbfact in my pbs
>> > >>>>>>> script. However, although my program worked fine with sequential
>> > >>>>>>> symbolic factorization, I get one of the following 2 behaviors when I
>> > >>>>>>> run with parallel symbolic factorization (depending on the number of
>> > >>>>>>> processors that I use):
>> > >>>>>>>
>> > >>>>>>>  1) the program just hangs (it seems stuck in some subroutine ==>
>> > >>>>>>> see test.out-hangs)
>> > >>>>>>>  2) I get a floating point exception ==> see
>> > >>>>>>> test.out-floating-point-exception
>> > >>>>>>>
>> > >>>>>>>  Note that, as suggested in the SuperLU manual, I use a power-of-2
>> > >>>>>>> number of procs. Are there any tunable parameters for the parallel
>> > >>>>>>> symbolic factorization? Note that when I build my sparse matrix, most
>> > >>>>>>> elements I add are nonzero of course, but to simplify the programming I
>> > >>>>>>> also add a few zero elements to the sparse matrix. I was thinking that
>> > >>>>>>> maybe, if the parallel symbolic factorization proceeds by block, there
>> > >>>>>>> could be some blocks where the pivot would be zero, hence creating the
>> > >>>>>>> FPE?
>> > >>>>>>>
>> > >>>>>>>  Thanks,
>> > >>>>>>>
>> > >>>>>>>  Anthony
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov>
>> > >>>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>>  Did you find out how to change the option to use parallel symbolic
>> > >>>>>>>> factorization?  Perhaps the PETSc team can help.
>> > >>>>>>>>
>> > >>>>>>>>  Sherry
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>>  Is there an inquiry function that tells you all the available
>> > >>>>>>>>> options?
>> > >>>>>>>>>
>> > >>>>>>>>>  Sherry
>> > >>>>>>>>>
>> > >>>>>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
>> > >>>>>>>>> aph at email.arizona.edu> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>>    Hi Sherry,
>> > >>>>>>>>>>
>> > >>>>>>>>>>  Thanks for your message. I have used the superlu_dist default
>> > >>>>>>>>>> options. I did not realize that I was doing serial symbolic
>> > >>>>>>>>>> factorization. That is probably the cause of my problem.
>> > >>>>>>>>>>  Each node on Garnet has 60GB of usable memory and I can run with
>> > >>>>>>>>>> 1, 2, 4, 8, 16 or 32 cores per node.
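
To put the per-task memory question in numbers, here is a small Python sketch. It assumes one MPI task per core and that the 60GB of node memory is shared evenly among the tasks on a node; both are simplifying assumptions, and memory_per_task_gb is a hypothetical helper name.

```python
def memory_per_task_gb(node_memory_gb: float, tasks_per_node: int) -> float:
    """Even share of node memory per MPI task (assumes one task per core)."""
    if tasks_per_node < 1:
        raise ValueError("need at least one task per node")
    return node_memory_gb / tasks_per_node


if __name__ == "__main__":
    # Core counts per node quoted in the message above.
    for tasks in (1, 2, 4, 8, 16, 32):
        share = memory_per_task_gb(60.0, tasks)
        print(f"{tasks:2d} tasks/node -> {share:.3f} GB per task")
```

With serial symbolic factorization the whole graph of A has to fit into one such share, so fewer tasks per node leaves more room for that step.
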
>> > >>>>>>>>>>
>> > >>>>>>>>>>  So I should use:
>> > >>>>>>>>>>
>> > >>>>>>>>>> -mat_superlu_dist_r 20
>> > >>>>>>>>>> -mat_superlu_dist_c 32
>> > >>>>>>>>>>
>> > >>>>>>>>>>  How do you specify the parallel symbolic factorization option?
>> > >>>>>>>>>> Is it -mat_superlu_dist_matinput 1?
>> > >>>>>>>>>>
>> > >>>>>>>>>>  Thanks,
>> > >>>>>>>>>>
>> > >>>>>>>>>>  Anthony
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov>
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>>  The superlu_dist failure occurs during symbolic factorization.
>> > >>>>>>>>>>> Since you are using serial symbolic factorization, it requires the
>> > >>>>>>>>>>> entire graph of A to be available in the memory of one MPI task.
>> > >>>>>>>>>>> How much memory do you have for each MPI task?
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>  It won't help even if you use more processes.  You should try
>> > >>>>>>>>>>> to use the parallel symbolic factorization option.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>  Another point.  You set up the process grid as:
>> > >>>>>>>>>>>        Process grid nprow 32 x npcol 20
>> > >>>>>>>>>>>  For better performance, you should swap the grid dimensions.
>> > >>>>>>>>>>> That is, it's better to use 20 x 32; never give nprow larger than
>> > >>>>>>>>>>> npcol.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>  Sherry
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov>
>> > >>>>>>>>>>> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    I would suggest running a sequence of problems, 101 by 101,
>> > >>>>>>>>>>>> 111 by 111, etc., and getting the memory usage in each case (when you
>> > >>>>>>>>>>>> run out of memory you can get NO useful information out about memory
>> > >>>>>>>>>>>> needs). You can then plot memory usage as a function of problem size
>> > >>>>>>>>>>>> to get a handle on how much memory it is using.  You can also run on
>> > >>>>>>>>>>>> more and more processes (which have a total of more memory) to see how
>> > >>>>>>>>>>>> large a problem you may be able to reach.
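
One way to turn such a sequence of runs into an estimate is to fit a power law to the measurements and extrapolate. The Python sketch below assumes memory grows roughly as c * N**p in the number of unknowns N, which is a modelling assumption and not something stated in this thread; fit_power_law and predict are hypothetical helper names, and the numbers in the usage part are synthetic placeholders, not measurements.

```python
import math


def fit_power_law(sizes, memory):
    """Least-squares fit of memory ~ c * size**p in log-log space; returns (c, p)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(m) for m in memory]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(ybar - p * xbar)
    return c, p


def predict(c, p, size):
    """Extrapolate the fitted model to a new problem size."""
    return c * size ** p


if __name__ == "__main__":
    # Synthetic placeholder data: replace with the memory you actually record
    # for the 101x101, 111x111, ... runs.
    unknowns = [n * n for n in (101, 111, 121, 131)]
    measured = [1.0e-5 * u ** 1.3 for u in unknowns]  # made-up values
    c, p = fit_power_law(unknowns, measured)
    print("fitted exponent:", p)
    print("extrapolated memory for 151x151:", predict(c, p, 151 * 151))
```

The extrapolation is only as good as the model, so it is worth checking the fit against one more run before trusting it for the 151x151 case.
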
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    MUMPS also has an "out of core" version (which we have never
>> > >>>>>>>>>>>> used) that could, in theory anyway, let you get to large problems if
>> > >>>>>>>>>>>> you have lots of disk space, but you are on your own figuring out how
>> > >>>>>>>>>>>> to use it.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>   Barry
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <
>> > >>>>>>>>>>>> aph at email.arizona.edu> wrote:
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Hi Jose,
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > In my code, I use PETSc once to solve a linear system to get
>> > >>>>>>>>>>>> > the baseflow (without using SLEPc) and then I use SLEPc to do the
>> > >>>>>>>>>>>> > stability analysis of that baseflow. This is why there are some SLEPc
>> > >>>>>>>>>>>> > options that are not used in test.out-superlu_dist-151x151 (when I am
>> > >>>>>>>>>>>> > solving for the baseflow with PETSc only). I have attached a 101x101
>> > >>>>>>>>>>>> > case for which I get the eigenvalues. That case works fine. However,
>> > >>>>>>>>>>>> > if I increase to 151x151, I get the error that you can see in
>> > >>>>>>>>>>>> > test.out-superlu_dist-151x151 (similar error with mumps: see
>> > >>>>>>>>>>>> > test.out-mumps-151x151 line 2918). If you look at the very end of the
>> > >>>>>>>>>>>> > files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you
>> > >>>>>>>>>>>> > will see that the last info message printed is:
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > On Processor (after EPSSetFromOptions)  0    memory:
>> > >>>>>>>>>>>> > 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > This means that the memory error probably occurs in the call
>> > >>>>>>>>>>>> > to EPSSolve (see module_petsc.F90 line 810). I would like to evaluate
>> > >>>>>>>>>>>> > how much memory is required by the most memory-intensive operation
>> > >>>>>>>>>>>> > within EPSSolve. Since I am solving a generalized EVP, I would imagine
>> > >>>>>>>>>>>> > that it would be the LU decomposition. But is there an accurate way of
>> > >>>>>>>>>>>> > estimating it?
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Before starting with iterative solvers, I would like to
>> > >>>>>>>>>>>> > exploit direct solvers as much as I can. I tried GMRES with the
>> > >>>>>>>>>>>> > default preconditioner at some point but I had convergence problems.
>> > >>>>>>>>>>>> > What solver/preconditioner would you recommend for a generalized
>> > >>>>>>>>>>>> > non-Hermitian (EPS_GNHEP) EVP?
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Thanks,
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Anthony
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <
>> > >>>>>>>>>>>> jroman at dsic.upv.es> wrote:
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > El 07/07/2015, a las 02:33, Anthony Haas escribió:
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > > Hi,
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and
>> > >>>>>>>>>>>> > > superlu_dist for the LU decomposition (my problem is a generalized
>> > >>>>>>>>>>>> > > eigenvalue problem). The code runs fine for a 101x101 grid, but when
>> > >>>>>>>>>>>> > > I increase to 151x151, I get the following error:
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037]
>> > >>>>>>>>>>>> > > 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > It seems to be a memory problem. I monitor the memory usage
>> > >>>>>>>>>>>> > > as far as I can and it seems that memory usage is pretty low. The
>> > >>>>>>>>>>>> > > most memory-intensive part of the program is probably the LU
>> > >>>>>>>>>>>> > > decomposition in the context of the generalized EVP. Is there a way
>> > >>>>>>>>>>>> > > to evaluate how much memory will be required for that step? I am
>> > >>>>>>>>>>>> > > currently running the debug version of the code, which I would assume
>> > >>>>>>>>>>>> > > uses more memory?
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > I have attached the output of the job. Note that the
>> > >>>>>>>>>>>> > > program uses PETSc twice: 1) to solve a linear system, for which no
>> > >>>>>>>>>>>> > > problem occurs, and 2) to solve the generalized EVP with SLEPc, where
>> > >>>>>>>>>>>> > > I get the error.
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > Thanks
>> > >>>>>>>>>>>> > >
>> > >>>>>>>>>>>> > > Anthony
>> > >>>>>>>>>>>> > > <test.out-superlu_dist-151x151>
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > In the output you are attaching there are no SLEPc objects in
>> > >>>>>>>>>>>> > the report and SLEPc options are not used. It seems that SLEPc calls
>> > >>>>>>>>>>>> > are skipped?
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve
>> > >>>>>>>>>>>> > linear systems with a preconditioned iterative solver?
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> > Jose
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>> >
>> > >>>>>>>>>>>>  >
>> > >>>>>>>>>>>>
>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>