[petsc-users] PETSC errors from KSPSolve() with MUMPS

Evan Um evanum at gmail.com
Wed Aug 27 16:27:38 CDT 2014


Dear PETSc users,

I am trying to solve a large problem (about 9,000,000 unknowns) with a large
number of processes (about 400 processes and 1 TB of memory). I believe these
resources are reasonable for this problem because I was able to solve the same
problem with serial MUMPS using 500 GB, although that run took a very long time.
The same code was parallelized with PETSc. However, the PETSc version suddenly
crashes after KSPSolve() successfully calls MUMPS, as shown below.
If the problem comes from MUMPS, I would expect MUMPS to produce an error
report (ICNTL(4)=3), but no error report was generated. Has anyone had a
similar experience with PETSc+MUMPS? I would appreciate any comments on
troubleshooting this. Thank you in advance for your help.

Regards,
Evan

Code:

KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);
KSPSetType(ksp, KSPPREONLY);
KSPGetPC(ksp, &pc);
MatSetOption(A, MAT_SPD, PETSC_TRUE);
PCSetType(pc, PCCHOLESKY);
PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);
PCFactorSetUpMatSolverPackage(pc);
PCFactorGetMatrix(pc, &F);        // factored matrix handle, used to pass MUMPS options
KSPSetType(ksp, KSPCG);           // overrides the KSPPREONLY set above
MPI_Barrier(MPI_COMM_WORLD);
icntl = 29; ival = 2;             // ICNTL(29)=2: use ParMetis for parallel ordering
MatMumpsSetIcntl(F, icntl, ival);
icntl = 4; ival = 3;              // ICNTL(4)=3: verbose MUMPS output, including error reports
MatMumpsSetIcntl(F, icntl, ival);
icntl = 23; ival = 1500;          // ICNTL(23)=1500: MUMPS working memory per process, in MB
MatMumpsSetIcntl(F, icntl, ival);
KSPSolve(ksp, b, x);
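
For reference, the snippet above assumes declarations along the following
lines (a minimal sketch; A, b, and x are assembled earlier in my code):

#include <petscksp.h>

KSP      ksp;            // Krylov solver context
PC       pc;             // preconditioner context
Mat      A, F;           // A: assembled SPD system matrix; F: factor handle for MUMPS options
Vec      b, x;           // right-hand side and solution vectors
PetscInt icntl, ival;    // MUMPS ICNTL index and value passed to MatMumpsSetIcntl()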



Errors:

Entering DMUMPS driver with JOB, N, NZ =   1     9778426              0
 DMUMPS 4.10.0
L D L^T Solver for symmetric positive definite matrices
Type of parallelism: Working host
 ****** ANALYSIS STEP ********
Using ParMETIS for parallel ordering.
Structual symmetry is:100%
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: n0000.voltaire0
PID:  28131
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[n0000.voltaire0:28047] 1 more process has sent help message
help-odls-default.txt / odls-default:could-not-kill
[n0000.voltaire0:28047] Set MCA parameter "orte_base_help_aggregate" to 0
to see all help / error messages
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR:
or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR:       INSTEAD the line number of the start of the function
[1]PETSC ERROR:       is given.
[1]PETSC ERROR: [1] MatCholeskyFactorSymbolic_MUMPS line 1076
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/impls/aij/mpi/mumps/mumps.c
[1]PETSC ERROR: [1] MatCholeskyFactorSymbolic line 2995
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/interface/matrix.c
[1]PETSC ERROR: [1] PCSetUp_Cholesky line 88
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/pc/impls/factor/cholesky/cholesky.c
[1]PETSC ERROR: [1] KSPSetUp line 219
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
[1]PETSC ERROR: [1] KSPSolve line 381
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
[1]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014
[1]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0000.voltaire0 by
esum Wed Aug 27 13:48:51 2014
[1]PETSC ERROR: Configure options
--prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0
--download-fblaslapack=1 --download-mumps=1 --download-parmetis=1
--download-scalapack --download-metis=1
--with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/
[1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
[5]PETSC ERROR:
------------------------------------------------------------------------