[petsc-users] PETSC errors from KSPSolve() with MUMPS

Barry Smith bsmith at mcs.anl.gov
Wed Aug 27 16:44:47 CDT 2014


> MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD


  Please send ALL the output. In particular, since rank 11 seems to have choked, we need to see all the messages from [11] to see what it thinks went wrong.
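
  If it helps, something like the following sketch (assuming the MatMumpsGetInfo() interface that ships with PETSc 3.5, and with F being the factored matrix you already obtain from PCFactorGetMatrix()) could print each rank's local MUMPS status, so per-rank information such as rank 11's is not lost:

PetscMPIInt rank;
PetscInt    info1, info2;
MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
MatMumpsGetInfo(F, 1, &info1);  /* local MUMPS INFO(1): error/status code */
MatMumpsGetInfo(F, 2, &info2);  /* local MUMPS INFO(2): additional detail */
PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] MUMPS INFO(1)=%D INFO(2)=%D\n", rank, info1, info2);
PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);

  Of course this only helps if the processes survive the factorization call long enough to reach it.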

   Barry

On Aug 27, 2014, at 4:27 PM, Evan Um <evanum at gmail.com> wrote:

> Dear PETSc users,
> 
> I am trying to solve a large problem (about 9,000,000 unknowns) with a large number of processes (about 400 processes and 1 TB of memory). I believe this is a reasonably large amount of resources for this problem, because I was able to solve the same problem with serial MUMPS using 500 GB, although that took a very long time to compute.
> The same code was parallelized with PETSc. However, my code with PETSc suddenly crashes after KSPSolve() successfully calls MUMPS, as shown below. If the problem comes from MUMPS, I would expect MUMPS to produce an error report (ICNTL(4)=3), but no error report was generated. Has anyone had a similar experience with PETSc+MUMPS? I would appreciate any comments on troubleshooting it. Thanks in advance for your help.
> 
> Regards,
> Evan
> 
> Codes:
> 
> KSPCreate(PETSC_COMM_WORLD, &ksp);
> KSPSetOperators(ksp, A, A);
> KSPSetType(ksp, KSPPREONLY);
> KSPGetPC(ksp, &pc);
> MatSetOption(A, MAT_SPD, PETSC_TRUE);
> PCSetType(pc, PCCHOLESKY);
> PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);
> PCFactorSetUpMatSolverPackage(pc);
> PCFactorGetMatrix(pc, &F);
> KSPSetType(ksp, KSPCG);
> MPI_Barrier(MPI_COMM_WORLD);
> icntl=29; ival=2; // ParMetis
> MatMumpsSetIcntl(F, icntl, ival);
> icntl=4; ival=3; // Print errors, warnings, and main statistics
> MatMumpsSetIcntl(F, icntl, ival);
> icntl=23; ival=1500; // Max working memory per process (MB)
> MatMumpsSetIcntl(F, icntl, ival);
> KSPSolve(ksp,b,x);
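> 
> For reference, a minimal error-checked sketch of the solve (assuming these calls live in a function returning PetscErrorCode, so CHKERRQ() applies) that separates the factorization step from the actual solve:
> 
> PetscErrorCode ierr;
> // KSPSetUp() triggers the symbolic and numeric Cholesky factorization in MUMPS,
> // so a failure here is distinguished from a failure in the subsequent solve.
> ierr = KSPSetUp(ksp); CHKERRQ(ierr);
> ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);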
> 
> 
> 
> Errors:
> 
> Entering DMUMPS driver with JOB, N, NZ =   1     9778426              0
>  DMUMPS 4.10.0
> L D L^T Solver for symmetric positive definite matrices
> Type of parallelism: Working host
>  ****** ANALYSIS STEP ********
> Using ParMETIS for parallel ordering.
> Structual symmetry is:100%
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> Host: n0000.voltaire0
> PID:  28131
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [n0000.voltaire0:28047] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
> [n0000.voltaire0:28047] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD
> with errorcode 59.
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [1]PETSC ERROR:       INSTEAD the line number of the start of the function
> [1]PETSC ERROR:       is given.
> [1]PETSC ERROR: [1] MatCholeskyFactorSymbolic_MUMPS line 1076 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/impls/aij/mpi/mumps/mumps.c
> [1]PETSC ERROR: [1] MatCholeskyFactorSymbolic line 2995 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/interface/matrix.c
> [1]PETSC ERROR: [1] PCSetUp_Cholesky line 88 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/pc/impls/factor/cholesky/cholesky.c
> [1]PETSC ERROR: [1] KSPSetUp line 219 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: [1] KSPSolve line 381 /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Signal received
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014
> [1]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0000.voltaire0 by esum Wed Aug 27 13:48:51 2014
> [1]PETSC ERROR: Configure options --prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0 --download-fblaslapack=1 --download-mumps=1 --download-parmetis=1 --download-scalapack --download-metis=1 --with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/
> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> [5]PETSC ERROR: ------------------------------------------------------------------------


