<div dir="ltr"><div>Dear PETSC users,</div><div><br></div><div>I try to solve a large problem (about 9,000,000 unknowns) with large number of processes (about 400 processes and 1TB). I guess that this is a reasonably large resource for solving this problem because I was able to solve the same problem using serial MUMPS with 500GB. Of course, it took very long to be computed.</div>
<div>The same code has now been parallelized with PETSc. However, the PETSc version crashes suddenly after KSPSolve() successfully calls MUMPS, as shown below. If the problem came from MUMPS, I would expect MUMPS to produce an error report (I set ICNTL(4)=3), but no error report was generated. Has anyone had a similar experience with PETSc+MUMPS? Any comments on troubleshooting it would be appreciated. Thank you in advance for your help.</div>
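<div><br></div><div>In case it is useful for the diagnosis: since no MUMPS report appears, my plan was to also query the MUMPS error codes directly through PETSc after the factorization. Below is only a rough sketch of what I have in mind (it uses MatMumpsGetInfog() on the factored matrix F obtained from PCFactorGetMatrix() in the code further down; I cannot reach this point yet because the crash happens during the analysis step):</div><div><br>PetscInt infog1, infog2;<br>MatMumpsGetInfog(F, 1, &infog1); // INFOG(1): MUMPS completion status, negative on error<br>MatMumpsGetInfog(F, 2, &infog2); // INFOG(2): additional information about the error<br>if (infog1 < 0) PetscPrintf(PETSC_COMM_WORLD, "MUMPS failed: INFOG(1)=%d, INFOG(2)=%d\n", (int)infog1, (int)infog2);<br></div>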
<div><br></div><div>Regards,</div><div>Evan</div><div><br></div><div>Codes:</div><div><br>KSPCreate(PETSC_COMM_WORLD, &ksp);<br>KSPSetOperators(ksp, A, A);<br>KSPSetType(ksp, KSPPREONLY);<br>KSPGetPC(ksp, &pc);<br>
MatSetOption(A, MAT_SPD, PETSC_TRUE);<br>PCSetType(pc, PCCHOLESKY);<br>PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);<br>PCFactorSetUpMatSolverPackage(pc);<br>PCFactorGetMatrix(pc, &F);<br>KSPSetType(ksp, KSPCG);<br>
MPI_Barrier(MPI_COMM_WORLD);<br>icntl=29; ival=2; // ICNTL(29)=2: use ParMetis for the parallel ordering<br>MatMumpsSetIcntl(F, icntl, ival);<br>icntl=4; ival=3; // ICNTL(4)=3: print errors, warnings, and diagnostics<br>MatMumpsSetIcntl(F, icntl, ival);<br>icntl=23; ival=1500; // ICNTL(23)=1500: max working memory per process, in MB<br>MatMumpsSetIcntl(F, icntl, ival);</div>
<div>KSPSolve(ksp,b,x);<br></div><div><br></div><div><br></div><div><br></div><div>Errors: </div><div><br></div><div>Entering DMUMPS driver with JOB, N, NZ = 1 9778426 0</div><div> DMUMPS 4.10.0<br>L D L^T Solver for symmetric positive definite matrices<br>
Type of parallelism: Working host</div><div> ****** ANALYSIS STEP ********</div><div>Using ParMETIS for parallel ordering.<br>Structual symmetry is:100%<br>--------------------------------------------------------------------------<br>
WARNING: A process refused to die!</div><div>Host: n0000.voltaire0<br>PID: 28131</div><div>This process may still be running and/or consuming resources.</div><div>--------------------------------------------------------------------------<br>
[n0000.voltaire0:28047] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill<br>[n0000.voltaire0:28047] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages<br>
--------------------------------------------------------------------------<br>MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD<br>with errorcode 59.</div><div>NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.<br>
You may or may not see output from other processes, depending on<br>exactly when Open MPI kills them.<br>--------------------------------------------------------------------------<br>[1]PETSC ERROR: ------------------------------------------------------------------------<br>
[1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end<br>[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger<br>[1]PETSC ERROR: or see <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC</a> ERROR: or try <a href="http://valgrind.org">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to find memory corruption errors<br>
[1]PETSC ERROR: likely location of problem given in stack below<br>[1]PETSC ERROR: --------------------- Stack Frames ------------------------------------<br>[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,<br>
[1]PETSC ERROR: INSTEAD the line number of the start of the function<br>[1]PETSC ERROR: is given.<br>[1]PETSC ERROR: [1] MatCholeskyFactorSymbolic_MUMPS line 1076 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/impls/aij/mpi/mumps/mumps.c<br>
[1]PETSC ERROR: [1] MatCholeskyFactorSymbolic line 2995 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/interface/matrix.c<br>[1]PETSC ERROR: [1] PCSetUp_Cholesky line 88 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/pc/impls/factor/cholesky/cholesky.c<br>
[1]PETSC ERROR: [1] KSPSetUp line 219 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c<br>[1]PETSC ERROR: [1] KSPSolve line 381 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c<br>
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>[1]PETSC ERROR: Signal received<br>[1]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
[1]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014<br>[1]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0000.voltaire0 by esum Wed Aug 27 13:48:51 2014<br>[1]PETSC ERROR: Configure options --prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0 --download-fblaslapack=1 --download-mumps=1 --download-parmetis=1 --download-scalapack --download-metis=1 --with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/<br>
[1]PETSC ERROR: #1 User provided function() line 0 in unknown file<br>[5]PETSC ERROR: ------------------------------------------------------------------------<br></div></div>
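<div dir="ltr"><div><br></div><div>P.S. To narrow down exactly which call aborts, I am also going to re-run with explicit error checking on every PETSc call. This is only a minimal sketch of the idea, assuming the usual ierr/CHKERRQ idiom, applied to the first and last calls shown above:</div><div><br>PetscErrorCode ierr;<br>ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr); // report a traceback if the call fails<br>ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);<br>/* ... same pattern for the remaining calls ... */<br>ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);<br></div></div>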