<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Thanks for the quick update.  In the new tarball, I have already removed the junk files, as pointed out by Satish.<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Sherry<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 29, 2015 at 8:36 AM, Hong <span dir="ltr"><<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Sherry,<div>With your bugfix, superlu_dist-4.1 works now:</div><div><br></div><div><div>petsc/src/ksp/ksp/examples/tutorials (master)</div><div>$ mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact</div><div>Number of iterations =   1</div><div>Residual norm 2.11605e-11</div></div><div><br></div><div>Once you address Satish's request, we'll update the petsc interface to this version of superlu_dist.</div><div><br></div><div><span style="font-size:12.8000001907349px">Anthony:</span><br></div><div>Please download the latest superlu_dist-v4.1,</div><div>then configure petsc with </div><div>'--download-superlu_dist=superlu_dist_4.1.tar.gz'<br></div><div><br></div><div>Hong</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Tue, Jul 28, 2015 at 11:11 AM, Satish Balay <span dir="ltr"><<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">Sherry,<br>

<br>

One minor issue with the tarball. I see the following new files in the v4.1 tarball<br>

[when comparing it with v4.0]. Some of these files are perhaps junk files - and can<br>

be removed from the tarball?<br>

<br>

   EXAMPLE/dscatter.c.bak<br>

   EXAMPLE/g10.cua<br>

   EXAMPLE/g4.cua<br>

   EXAMPLE/g4.postorder.eps<br>

   EXAMPLE/g4.rua<br>

   EXAMPLE/g4_postorder.jpg<br>

   EXAMPLE/hostname<br>

   EXAMPLE/pdgssvx.c<br>

   EXAMPLE/pdgstrf2.c<br>

   EXAMPLE/pwd<br>

   EXAMPLE/pzgstrf2.c<br>

   EXAMPLE/pzgstrf_v3.3.c<br>

   EXAMPLE/pzutil.c<br>

   EXAMPLE/test.bat<br>

   EXAMPLE/test.cpu.bat<br>

   EXAMPLE/test.err<br>

   EXAMPLE/test.err.1<br>

   EXAMPLE/zlook_ahead_update.c<br>

   FORTRAN/make.out<br>

   FORTRAN/zcreate_dist_matrix.c<br>

   MAKE_INC/make.xc30<br>

   SRC/int_t<br>

   SRC/lnbrow<br>

   SRC/make.out<br>

   SRC/rnbrow<br>

   SRC/temp<br>

   SRC/temp1<br>

<br>

<br>

Thanks,<br>

Satish<br>

<span><br>

<br>

On Tue, 28 Jul 2015, Xiaoye S. Li wrote:<br>

<br>

> I am checking v4.1 now. I'll let you know when I have fixed the problem.<br>

><br>

> Sherry<br>

><br>

</span></div></div><span class=""><span>> On Tue, Jul 28, 2015 at 8:27 AM, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

><br>

> > Sherry,<br>

> > I tested with superlu_dist v4.1. The extra printings are gone, but the hang<br>

> > remains.<br>

> > It hangs at<br>

> ><br>

> > #5  0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0,<br>

> > status=0x7fff9cd83d60)<br>

> >     at src/mpi/pt2pt/wait.c:168<br>

> > #6  0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,<br>

> >     anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,<br>

> >     stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308<br>

> ><br>

> >                 if (recv_req[0] != MPI_REQUEST_NULL) {<br>

> >  -->                   MPI_Wait (&recv_req[0], &status);<br>

> ><br>

> > We will update petsc interface to superlu_dist v4.1.<br>

> ><br>

> > Hong<br>

> ><br>

> ><br>

</span></span><span class=""><span>> > On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>> wrote:<br>

> ><br>

> >> Hong,<br>

> >> Thanks for trying it out.<br>

> >> The extra printings are not properly guarded by the print level.  I will<br>

> >> fix that.   I will look into the hang problem soon.<br>

> >><br>

> >> Sherry<br>

> >> <br>

> >><br>

</span></span><div><div class="h5"><div><div>> >> On Mon, Jul 27, 2015 at 7:50 PM, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

> >><br>

> >>> Sherry,<br>

> >>><br>

> >>> I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:<br>

> >>> mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact<br>

> >>> ...<br>

> >>> .. Starting with 1 OpenMP threads<br>

> >>> [0] .. BIG U size 1342464<br>

> >>> [0] .. BIG V size 131072<br>

> >>>   Max row size is 1311<br>

> >>>   Using buffer_size of 5000000<br>

> >>>   Threads per process 1<br>

> >>> ...<br>

> >>><br>

> >>> Using a debugger (with petsc option '-start_in_debugger'), I find that<br>

> >>> the hang occurs at<br>

> >>> #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,<br>

> >>>     timeout=<optimized out>, timeout@entry=-1)<br>

> >>>     at ../sysdeps/unix/sysv/linux/poll.c:83<br>

> >>> #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,<br>

> >>>     millisecond_timeout=millisecond_timeout@entry=-1,<br>

> >>>     eventp=eventp@entry=0x7fff654930b0)<br>

> >>>     at src/mpid/common/sock/poll/sock_wait.i:123<br>

> >>> #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (<br>

> >>>     progress_state=0x7fff65493120)<br>

> >>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:218<br>

> >>> #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1,<br>

> >>>     state=state@entry=0x7fff65493120)<br>

> >>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:921<br>

> >>> #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90,<br>

> >>>     status=status@entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67<br>

> >>> #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90,<br>

> >>> status=0x7fff65493390)<br>

> >>>     at src/mpi/pt2pt/wait.c:168<br>

> >>> #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,<br>

> >>>     anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,<br>

> >>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308<br>

> >>><br>

> >>> #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,<br>

> >>>     ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,<br>

> >>>     LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,<br>

> >>> stat=0x7fff65493ea0,<br>

> >>>     info=0x7fff65493edc) at pzgssvx.c:1063<br>

> >>><br>

> >>> #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,<br>

> >>>     A=0x21bb7e0, info=0x2355068)<br>

> >>>     at<br>

> >>> /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411<br>

> >>> #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110,<br>

> >>> mat=0x21bb7e0,<br>

> >>>     info=0x2355068) at<br>

> >>> /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946<br>

> >>> #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)<br>

> >>>     at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152<br>

> >>> #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)<br>

> >>>     at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983<br>

> >>> #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)<br>

> >>>     at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332<br>

> >>> #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)<br>

> >>>     at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312<br>

> >>><br>

> >>> You may take a look at it. Sequential symbolic factorization works fine.<br>

> >>><br>

> >>> Why does superlu_dist (v4.0) in complex precision display<br>

> >>><br>

> >>> .. Starting with 1 OpenMP threads<br>

> >>> [0] .. BIG U size 1342464<br>

> >>> [0] .. BIG V size 131072<br>

> >>>   Max row size is 1311<br>

> >>>   Using buffer_size of 5000000<br>

> >>>   Threads per process 1<br>

> >>> ...<br>

> >>><br>

> >>> I realize that I use superlu_dist v4.0. Would v4.1 work? I'll give it a<br>

> >>> try tomorrow.<br>

> >>><br>

> >>> Hong<br>

> >>><br>

> >>> On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <<br>

</div></div></div></div><div><div class="h5"><div><div>> >>> <a href="mailto:aph@email.arizona.edu" target="_blank">aph@email.arizona.edu</a>> wrote:<br>

> >>><br>

> >>>> Hi Hong,<br>

> >>>><br>

> >>>> No, that is not the correct matrix. Note that I forgot to mention that<br>

> >>>> it is a complex matrix. I tried loading the matrix I sent you this morning<br>

> >>>> with:<br>

> >>>><br>

> >>>> !...Load a Matrix in Binary Format<br>

> >>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)<br>

> >>>>       call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)<br>

> >>>>       call MatSetType(DLOAD,MATAIJ,ierr)<br>

> >>>>       call MatLoad(DLOAD,viewer,ierr)<br>

> >>>>       call PetscViewerDestroy(viewer,ierr)<br>

> >>>><br>

> >>>>       call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)<br>

> >>>><br>

> >>>> The first 37 rows should look like this:<br>

> >>>><br>

> >>>> Mat Object: 2 MPI processes<br>

> >>>>   type: mpiaij<br>

> >>>> row 0: (0, 1)<br>

> >>>> row 1: (1, 1)<br>

> >>>> row 2: (2, 1)<br>

> >>>> row 3: (3, 1)<br>

> >>>> row 4: (4, 1)<br>

> >>>> row 5: (5, 1)<br>

> >>>> row 6: (6, 1)<br>

> >>>> row 7: (7, 1)<br>

> >>>> row 8: (8, 1)<br>

> >>>> row 9: (9, 1)<br>

> >>>> row 10: (10, 1)<br>

> >>>> row 11: (11, 1)<br>

> >>>> row 12: (12, 1)<br>

> >>>> row 13: (13, 1)<br>

> >>>> row 14: (14, 1)<br>

> >>>> row 15: (15, 1)<br>

> >>>> row 16: (16, 1)<br>

> >>>> row 17: (17, 1)<br>

> >>>> row 18: (18, 1)<br>

> >>>> row 19: (19, 1)<br>

> >>>> row 20: (20, 1)<br>

> >>>> row 21: (21, 1)<br>

> >>>> row 22: (22, 1)<br>

> >>>> row 23: (23, 1)<br>

> >>>> row 24: (24, 1)<br>

> >>>> row 25: (25, 1)<br>

> >>>> row 26: (26, 1)<br>

> >>>> row 27: (27, 1)<br>

> >>>> row 28: (28, 1)<br>

> >>>> row 29: (29, 1)<br>

> >>>> row 30: (30, 1)<br>

> >>>> row 31: (31, 1)<br>

> >>>> row 32: (32, 1)<br>

> >>>> row 33: (33, 1)<br>

> >>>> row 34: (34, 1)<br>

> >>>> row 35: (35, 1)<br>

> >>>> row 36: (1, -41.2444)  (35, -41.2444)  (36, 118.049 - 0.999271 i) (37,<br>

> >>>> -21.447)  (38, 5.18873)  (39, -2.34856)  (40, 1.3607)  (41, -0.898206)<br>

> >>>> (42, 0.642715)  (43, -0.48593)  (44, 0.382471)  (45, -0.310476)  (46,<br>

> >>>> 0.258302)  (47, -0.219268)  (48, 0.189304)  (49, -0.165815)  (50,<br>

> >>>> 0.147076)  (51, -0.131907)  (52, 0.119478)  (53, -0.109189)  (54, 0.1006)<br>

> >>>> (55, -0.0933795)  (56, 0.0872779)  (57, -0.0821019)  (58, 0.0777011)  (59,<br>

> >>>> -0.0739575)  (60, 0.0707775)  (61, -0.0680868)  (62, 0.0658258)  (63,<br>

> >>>> -0.0639473)  (64, 0.0624137)  (65, -0.0611954)  (66, 0.0602698)  (67,<br>

> >>>> -0.0596202)  (68, 0.0592349)  (69, -0.0295536)  (71, -21.447)  (106,<br>

> >>>> 5.18873)  (141, -2.34856)  (176, 1.3607)  (211, -0.898206)  (246,<br>

> >>>> 0.642715)  (281, -0.48593)  (316, 0.382471)  (351, -0.310476)  (386,<br>

> >>>> 0.258302)  (421, -0.219268)  (456, 0.189304)  (491, -0.165815)  (526,<br>

> >>>> 0.147076)  (561, -0.131907)  (596, 0.119478)  (631, -0.109189)  (666,<br>

> >>>> 0.1006)  (701, -0.0933795)  (736, 0.0872779)  (771, -0.0821019)  (806,<br>

> >>>> 0.0777011)  (841, -0.0739575)  (876, 0.0707775)  (911, -0.0680868)  (946,<br>

> >>>> 0.0658258)  (981, -0.0639473)  (1016, 0.0624137)  (1051, -0.0611954)<br>

> >>>> (1086, 0.0602698)  (1121, -0.0596202)  (1156, 0.0592349)  (1191,<br>

> >>>> -0.0295536)  (1261, 0)  (3676, 117.211)  (3711, -58.4801)  (3746,<br>

> >>>> -78.3633)  (3781, 29.4911)  (3816, -15.8073)  (3851, 9.94324)  (3886,<br>

> >>>> -6.87205)  (3921, 5.05774)  (3956, -3.89521)  (3991, 3.10522)  (4026,<br>

> >>>> -2.54388)  (4061, 2.13082)  (4096, -1.8182)  (4131, 1.57606)  (4166,<br>

> >>>> -1.38491)  (4201, 1.23155)  (4236, -1.10685)  (4271, 1.00428)  (4306,<br>

> >>>> -0.919116)  (4341, 0.847829)  (4376, -0.787776)  (4411, 0.736933)  (4446,<br>

> >>>> -0.693735)  (4481, 0.656958)  (4516, -0.625638)  (4551, 0.599007)  (4586,<br>

> >>>> -0.576454)  (4621, 0.557491)  (4656, -0.541726)  (4691, 0.528849)  (4726,<br>

> >>>> -0.518617)  (4761, 0.51084)  (4796, -0.50538)  (4831, 0.502142)  (4866,<br>

> >>>> -0.250534)<br>

> >>>><br>

> >>>><br>

> >>>> Thanks,<br>

> >>>><br>

> >>>> Anthony<br>

> >>>><br>

> >>>><br>

> >>>><br>

> >>>><br>

> >>>><br>

</div></div></div></div><div><div class="h5"><div><div>> >>>> On Fri, Jul 24, 2015 at 7:56 PM, Hong <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

> >>>><br>

> >>>>> Anthony:<br>

> >>>>> I tested your Amat_binary.m<br>

> >>>>> using petsc/src/ksp/ksp/examples/tutorials/ex10.c.<br>

> >>>>> Your matrix has many zero rows:<br>

> >>>>> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more<br>

> >>>>> Mat Object: 1 MPI processes<br>

> >>>>>   type: seqaij<br>

> >>>>> row 0: (0, 1)<br>

> >>>>> row 1: (1, 0)<br>

> >>>>> row 2: (2, 1)<br>

> >>>>> row 3: (3, 0)<br>

> >>>>> row 4: (4, 1)<br>

> >>>>> row 5: (5, 0)<br>

> >>>>> row 6: (6, 1)<br>

> >>>>> row 7: (7, 0)<br>

> >>>>> row 8: (8, 1)<br>

> >>>>> row 9: (9, 0)<br>

> >>>>> ...<br>

> >>>>> row 36: (1, 1)  (35, 0)  (36, 1)  (37, 0)  (38, 1)  (39, 0)  (40, 1)<br>

> >>>>>  (41, 0)  (42, 1)  (43, 0)  (44, 1)  (45,<br>

> >>>>> 0)  (46, 1)  (47, 0)  (48, 1)  (49, 0)  (50, 1)  (51, 0)  (52, 1)<br>

> >>>>>  (53, 0)  (54, 1)  (55, 0)  (56, 1)  (57, 0)<br>

> >>>>>  (58, 1)  (59, 0)  (60, 1)  ...<br>

> >>>>><br>

> >>>>> Did you send us the correct matrix?<br>

> >>>>><br>

> >>>>>><br>

> >>>>>> I ran my code through valgrind and gdb as suggested by Barry. I am<br>

> >>>>>> now coming back to some problem I have had while running with parallel<br>

> >>>>>> symbolic factorization. I am attaching a test matrix (petsc binary format)<br>

> >>>>>> that I LU decompose and then use to solve a linear system (see code below).<br>

> >>>>>> I can run on 2 processors with parsymbfact or with 4 processors without<br>

> >>>>>> parsymbfact. However, if I run on 4 procs with parsymbfact, the code is<br>

> >>>>>> just hanging. Below is the simplified test case that I have used to test.<br>

> >>>>>> The matrix A and B are built somewhere else in my program. The matrix I am<br>

> >>>>>> attaching is A-sigma*B (see below).<br>

> >>>>>><br>

> >>>>>> One thing is that I don't know for sparse matrices what is the<br>

> >>>>>> optimum number of processors to use for a LU decomposition? Does it depend<br>

> >>>>>> on the total number of nonzero? Do you have an easy way to compute it?<br>

> >>>>>><br>

> >>>>><br>

> >>>>> You have to experiment with your matrix on the target machine to find out.<br>

> >>>>><br>

> >>>>> Hong<br>

> >>>>><br>

> >>>>>><br>

> >>>>>><br>

> >>>>>><br>

> >>>>>>      Subroutine HowBigLUCanBe(rank)<br>

> >>>>>><br>

> >>>>>>       IMPLICIT NONE<br>

> >>>>>><br>

> >>>>>>       integer(i4b),intent(in) :: rank<br>

> >>>>>>       integer(i4b)            :: i,ct<br>

> >>>>>>       real(dp)                :: begin,endd<br>

> >>>>>>       complex(dpc)            :: sigma<br>

> >>>>>><br>

> >>>>>>       PetscErrorCode ierr<br>

> >>>>>><br>

> >>>>>><br>

> >>>>>>       if (rank==0) call cpu_time(begin)<br>

> >>>>>><br>

> >>>>>>       if (rank==0) then<br>

> >>>>>>          write(*,*)<br>

> >>>>>>          write(*,*)'Testing How Big LU Can Be...'<br>

> >>>>>>          write(*,*)'============================'<br>

> >>>>>>          write(*,*)<br>

> >>>>>>       endif<br>

> >>>>>><br>

> >>>>>>       sigma = (1.0d0,0.0d0)<br>

> >>>>>>       call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B<br>

> >>>>>><br>

> >>>>>> !.....Write Matrix to ASCII and Binary Format<br>

> >>>>>>       !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)<br>

> >>>>>>       !call MatView(DXX,viewer,ierr)<br>

> >>>>>>       !call PetscViewerDestroy(viewer,ierr)<br>

> >>>>>><br>

> >>>>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)<br>

> >>>>>>       call MatView(A,viewer,ierr)<br>

> >>>>>>       call PetscViewerDestroy(viewer,ierr)<br>

> >>>>>><br>

> >>>>>> !.....Create Linear Solver Context<br>

> >>>>>>       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)<br>

> >>>>>><br>

> >>>>>> !.....Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.<br>

> >>>>>>       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line<br>

> >>>>>>       call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B<br>

> >>>>>><br>

> >>>>>> !.....Set Relative and Absolute Tolerances and Uses Default for Divergence Tol<br>

> >>>>>>       tol = 1.e-10<br>

> >>>>>>       call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)<br>

> >>>>>><br>

> >>>>>> !.....Set the Direct (LU) Solver<br>

> >>>>>>       call KSPSetType(ksp,KSPPREONLY,ierr)<br>

> >>>>>>       call KSPGetPC(ksp,pc,ierr)<br>

> >>>>>>       call PCSetType(pc,PCLU,ierr)<br>

> >>>>>>       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS<br>

> >>>>>><br>

> >>>>>> !.....Create Right-Hand-Side Vector<br>

> >>>>>>       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)<br>

> >>>>>>       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)<br>

> >>>>>><br>

> >>>>>>       allocate(xwork1(IendA-IstartA))<br>

> >>>>>>       allocate(loc(IendA-IstartA))<br>

> >>>>>><br>

> >>>>>>       ct=0<br>

> >>>>>>       do i=IstartA,IendA-1<br>

> >>>>>>          ct=ct+1<br>

> >>>>>>          loc(ct)=i<br>

> >>>>>>          xwork1(ct)=(1.0d0,0.0d0)<br>

> >>>>>>       enddo<br>

> >>>>>><br>

> >>>>>>       call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)<br>

> >>>>>>       call VecZeroEntries(sol,ierr)<br>

> >>>>>><br>

> >>>>>>       deallocate(xwork1,loc)<br>

> >>>>>><br>

> >>>>>> !.....Assemble Vectors<br>

> >>>>>>       call VecAssemblyBegin(frhs,ierr)<br>

> >>>>>>       call VecAssemblyEnd(frhs,ierr)<br>

> >>>>>><br>

> >>>>>> !.....Solve the Linear System<br>

> >>>>>>       call KSPSolve(ksp,frhs,sol,ierr)<br>

> >>>>>><br>

> >>>>>>       !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)<br>

> >>>>>><br>

> >>>>>>       if (rank==0) then<br>

> >>>>>>          call cpu_time(endd)<br>

> >>>>>>          write(*,*)<br>

> >>>>>>          print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin<br>

> >>>>>>       endif<br>

> >>>>>><br>

> >>>>>>       call SlepcFinalize(ierr)<br>

> >>>>>><br>

> >>>>>>       STOP<br>

> >>>>>><br>

> >>>>>><br>

> >>>>>>     end Subroutine HowBigLUCanBe<br>

> >>>>>><br>

> >>>>>><br>

> >>>>>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:<br>

> >>>>>><br>

> >>>>>>  Indeed, the parallel symbolic factorization routine needs a power of<br>

> >>>>>> 2 processes. However, you can use as many processes as you need;<br>

> >>>>>> internally, we redistribute the matrix to the nearest power of 2 processes, do<br>

> >>>>>> the symbolic factorization, then redistribute back to all the processes to do<br>

> >>>>>> the factorization, triangular solve, etc.  So, there is no restriction<br>

> >>>>>> from the user's viewpoint.<br>

> >>>>>><br>

> >>>>>>  It's difficult to tell what the problem is.  Do you think you can<br>

> >>>>>> print your matrix? Then I can do some debugging by running superlu_dist<br>

> >>>>>> standalone.<br>

> >>>>>><br>

> >>>>>>  Sherry<br>

> >>>>>><br>

> >>>>>><br>

> >>>>>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <<br>

</div></div></div></div><span class=""><span>> >>>>>> <a href="mailto:aph@email.arizona.edu" target="_blank">aph@email.arizona.edu</a>> wrote:<br>

> >>>>>><br>

> >>>>>>>   Hi,<br>

> >>>>>>><br>

> >>>>>>>  I have used the switch -mat_superlu_dist_parsymbfact in my pbs<br>

> >>>>>>> script. However, although my program worked fine with sequential symbolic<br>

> >>>>>>> factorization, I get one of the following 2 behaviors when I run with<br>

> >>>>>>> parallel symbolic factorization (depending on the number of processors that<br>

> >>>>>>> I use):<br>

> >>>>>>><br>

> >>>>>>>  1) the program just hangs (it seems stuck in some subroutine ==><br>

> >>>>>>> see test.out-hangs)<br>

> >>>>>>>  2) I get a floating point exception ==> see<br>

> >>>>>>> test.out-floating-point-exception<br>

> >>>>>>><br>

> >>>>>>>  Note that as suggested in the Superlu manual, I use a power of 2<br>

> >>>>>>> number of procs. Are there any tunable parameters for the parallel symbolic<br>

> >>>>>>> factorization? Note that when I build my sparse matrix, most elements I add<br>

> >>>>>>> are nonzero of course but to simplify the programming, I also add a few<br>

> >>>>>>> zero elements in the sparse matrix. I was thinking that maybe if the<br>

> >>>>>>> parallel symbolic factorization proceeds by block, there could be some<br>

> >>>>>>> blocks where the pivot would be zero, hence creating the FPE??<br>

> >>>>>>><br>

> >>>>>>>  Thanks,<br>

> >>>>>>><br>

> >>>>>>>  Anthony<br>

> >>>>>>><br>

> >>>>>>><br>

> >>>>>>><br>

</span></span><span class=""><span>> >>>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>> wrote:<br>

> >>>>>>><br>

> >>>>>>>>  Did you find out how to change the option to use parallel symbolic<br>

> >>>>>>>> factorization?  Perhaps PETSc team can help.<br>

> >>>>>>>><br>

> >>>>>>>>  Sherry<br>

> >>>>>>>><br>

> >>>>>>>><br>

</span></span><span class=""><span>> >>>>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>> wrote:<br>

> >>>>>>>><br>

> >>>>>>>>>  Is there an inquiry function that tells you all the available<br>

> >>>>>>>>> options?<br>

> >>>>>>>>><br>

> >>>>>>>>>  Sherry<br>

> >>>>>>>>><br>

> >>>>>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <<br>

</span></span><span class=""><span>> >>>>>>>>> <a href="mailto:aph@email.arizona.edu" target="_blank">aph@email.arizona.edu</a>> wrote:<br>

> >>>>>>>>><br>

> >>>>>>>>>>    Hi Sherry,<br>

> >>>>>>>>>><br>

> >>>>>>>>>>  Thanks for your message. I have used superlu_dist default<br>

> >>>>>>>>>> options. I did not realize that I was doing serial symbolic factorization.<br>

> >>>>>>>>>> That is probably the cause of my problem.<br>

> >>>>>>>>>>  Each node on Garnet has 60GB usable memory and I can run with<br>

> >>>>>>>>>> 1,2,4,8,16 or 32 cores per node.<br>

> >>>>>>>>>><br>

> >>>>>>>>>>  So I should use:<br>

> >>>>>>>>>><br>

> >>>>>>>>>> -mat_superlu_dist_r 20<br>

> >>>>>>>>>> -mat_superlu_dist_c 32<br>

> >>>>>>>>>><br>

> >>>>>>>>>>  How do you specify the parallel symbolic factorization option?<br>

> >>>>>>>>>> Is it -mat_superlu_dist_matinput 1?<br>

> >>>>>>>>>><br>

> >>>>>>>>>>  Thanks,<br>

> >>>>>>>>>><br>

> >>>>>>>>>>  Anthony<br>

> >>>>>>>>>><br>

> >>>>>>>>>><br>

</span></span>> >>>>>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>><span class=""><br>

<span>> >>>>>>>>>> wrote:<br>

> >>>>>>>>>><br>

> >>>>>>>>>>>  For superlu_dist failure, this occurs during symbolic<br>

> >>>>>>>>>>> factorization.  Since you are using serial symbolic factorization, it<br>

> >>>>>>>>>>> requires the entire graph of A to be available in the memory of one MPI<br>

> >>>>>>>>>>> task. How much memory do you have for each MPI task?<br>

> >>>>>>>>>>><br>

> >>>>>>>>>>>  It won't help even if you use more processes.  You should try<br>

> >>>>>>>>>>> to use parallel symbolic factorization option.<br>

> >>>>>>>>>>><br>

> >>>>>>>>>>>  Another point.  You set up process grid as:<br>

> >>>>>>>>>>>        Process grid nprow 32 x npcol 20<br>

> >>>>>>>>>>>  For better performance, you should swap the grid dimensions. That<br>

> >>>>>>>>>>> is, it's better to use 20 x 32; never make nprow larger than npcol.<br>

> >>>>>>>>>>><br>

> >>>>>>>>>>><br>

> >>>>>>>>>>>  Sherry<br>

> >>>>>>>>>>><br>

> >>>>>>>>>>><br>

</span></span>> >>>>>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>><span class=""><br>

<span>> >>>>>>>>>>> wrote:<br>

> >>>>>>>>>>><br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>>>    I would suggest running a sequence of problems, 101 by 101,<br>

> >>>>>>>>>>>> 111 by 111, etc., and get the memory usage in each case (when you run out of<br>

> >>>>>>>>>>>> memory you can get NO useful information out about memory needs). You can<br>

> >>>>>>>>>>>> then plot memory usage as a function of problem size to get a handle on how<br>

> >>>>>>>>>>>> much memory it is using.  You can also run on more and more processes<br>

> >>>>>>>>>>>> (which have a total of more memory) to see how large a problem you may be<br>

> >>>>>>>>>>>> able to reach.<br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>>>    MUMPS also has an "out of core" version (which we have never<br>

> >>>>>>>>>>>> used) that could in theory anyways let you get to large problems if you<br>

> >>>>>>>>>>>> have lots of disk space, but you are on your own figuring out how to use it.<br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>>>   Barry<br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <<br>

</span></span><div><div class="h5"><div><div>> >>>>>>>>>>>> <a href="mailto:aph@email.arizona.edu" target="_blank">aph@email.arizona.edu</a>> wrote:<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Hi Jose,<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > In my code, I use PETSc once to solve a linear system to get<br>

> >>>>>>>>>>>> the baseflow (without using SLEPc) and then I use SLEPc to do the stability<br>

> >>>>>>>>>>>> analysis of that baseflow. This is why there are some SLEPc options that<br>

> >>>>>>>>>>>> are not used in test.out-superlu_dist-151x151 (when I am solving for the<br>

> >>>>>>>>>>>> baseflow with PETSc only). I have attached a 101x101 case for which I get<br>

> >>>>>>>>>>>> the eigenvalues. That case works fine. However, if I increase to 151x151, I<br>

> >>>>>>>>>>>> get the error that you can see in test.out-superlu_dist-151x151 (similar<br>

> >>>>>>>>>>>> error with mumps: see test.out-mumps-151x151 line 2918). If you look at the<br>

> >>>>>>>>>>>> very end of the files test.out-superlu_dist-151x151 and<br>

> >>>>>>>>>>>> test.out-mumps-151x151, you will see that the last info message printed is:<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > On Processor (after EPSSetFromOptions)  0    memory:<br>

> >>>>>>>>>>>> 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > This means that the memory error probably occurs in the call<br>

> >>>>>>>>>>>> to EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how<br>

> >>>>>>>>>>>> much memory is required by the most memory intensive operation within<br>

> >>>>>>>>>>>> EPSSolve. Since I am solving a generalized EVP, I would imagine that it<br>

> >>>>>>>>>>>> would be the LU decomposition. But is there an accurate way of doing it?<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Before starting with iterative solvers, I would like to<br>

> >>>>>>>>>>>> exploit as much as I can direct solvers. I tried GMRES with default<br>

> >>>>>>>>>>>> preconditioner at some point but I had convergence problems. What<br>

> >>>>>>>>>>>> solver/preconditioner would you recommend for a generalized non-Hermitian<br>

> >>>>>>>>>>>> (EPS_GNHEP) EVP?<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Thanks,<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Anthony<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <<br>

</div></div></div></div><div><div class="h5"><div><div>> >>>>>>>>>>>> <a href="mailto:jroman@dsic.upv.es" target="_blank">jroman@dsic.upv.es</a>> wrote:<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > On 07/07/2015, at 02:33, Anthony Haas wrote:<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > > Hi,<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and<br>

> >>>>>>>>>>>> superlu_dist for the LU decomposition (my problem is a generalized<br>

> >>>>>>>>>>>> eigenvalue problem). The code runs fine for a grid with 101x101 but when I<br>

> >>>>>>>>>>>> increase to 151x151, I get the following error:<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037]<br>

> >>>>>>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > It seems to be a memory problem. I monitor the memory usage<br>

> >>>>>>>>>>>> as far as I can and it seems that memory usage is pretty low. The most<br>

> >>>>>>>>>>>> memory intensive part of the program is probably the LU decomposition in<br>

> >>>>>>>>>>>> the context of the generalized EVP. Is there a way to evaluate how much<br>

> >>>>>>>>>>>> memory will be required for that step? I am currently running the debug<br>

> >>>>>>>>>>>> version of the code which I would assume would use more memory?<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > I have attached the output of the job. Note that the<br>

> >>>>>>>>>>>> program uses PETSc twice: 1) to solve a linear system for which no problem<br>

> >>>>>>>>>>>> occurs, and, 2) to solve the Generalized EVP with SLEPc, where I get the<br>

> >>>>>>>>>>>> error.<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > Thanks<br>

> >>>>>>>>>>>> > ><br>

> >>>>>>>>>>>> > > Anthony<br>

> >>>>>>>>>>>> > > <test.out-superlu_dist-151x151><br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > In the output you are attaching there are no SLEPc objects in<br>

> >>>>>>>>>>>> the report and SLEPc options are not used. It seems that SLEPc calls are<br>

> >>>>>>>>>>>> skipped?<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve<br>

> >>>>>>>>>>>> linear systems with a preconditioned iterative solver?<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> > Jose<br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>> ><br>

> >>>>>>>>>>>>  ><br>

> >>>>>>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151><br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>>><br>

> >>>>>>>>>>><br>

> >>>>>>>>>><br>

> >>>>>>>>><br>

> >>>>>>>><br>

> >>>>>>><br>

> >>>>>><br>

> >>>>>><br>

> >>>>><br>

> >>>><br>

> >>><br>

> >><br>

> ><br>

><br>

</div></div></div></div></blockquote></div><br></div>

</blockquote></div><br></div>