[petsc-users] Fatal error in MPI_Allreduce: Error message texts are not available[cli_9]: aborting job:

Matthew Knepley knepley at gmail.com
Fri Aug 5 17:51:49 CDT 2011


On Fri, Aug 5, 2011 at 10:41 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:

> I have a 2x6-core machine. My solver works fine on up to 8 processes;
> above that it always crashes with the error quoted below. I have not yet
> run valgrind etc. because I urgently need to fix this. I am
> just wondering what the culprit could be.
>

You are getting a SIGQUIT in a function you wrote (if it were a PETSc
function, it would show up in the stack). It looks like the system might
be killing your job.

   Matt


> PS. I am not using MPI_Allreduce anywhere in my code.
>
> Many thanks for any hints,
> Dominik
>
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_9]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_1]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_7]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> INTERNAL ERROR: Invalid error class (66) encountered while returning from
> MPI_Allreduce.  Please file a bug report.  No error stack is available.
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_11]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
> the batch system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 462
> src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: [0] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
> [0]PETSC ERROR: [0] User provided functi[2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
> the batch system) has told this process to end
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [2]PETSC ERROR: [2] VecAssemblyBegin line 157
> src/vec/vec/interface/vector.c
> [2]PETSC ERROR: [2] User provided function line 160
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [INTERNAL ERROR: Invalid error class (66) encountered while returning from
> MPI_Allreduce.  Please file a bug report.  No error stack is available.
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_3]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> [4]PETSC ERROR:
> ------------------------------------------------------------------------
> [4]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
> the batch system) has told this process to end
> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [4]PETSC ERROR: or see
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[4]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [4]PETSC ERROR: likely location of problem given in stack below
> [4]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [4]PETSC ERROR:       INSTEAD the line number of the start of the function
> [4]PETSC ERROR:       is given.
> [4]PETSC ERROR: [4] MatAssemblyBegin_MPIAIJ line 462
> src/mat/impls/aij/mpi/mpiaij.c
> [4]PETSC ERROR: [4] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
> [4]PETSC ERROR: [4] User provided functiINTERNAL ERROR: Invalid error
> class (66) encountered while returning from
> MPI_Allreduce.  Please file a bug report.  No error stack is available.
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_5]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> [6]PETSC ERROR:
> ------------------------------------------------------------------------
> [6]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
> the batch system) has told this process to end
> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [6]PETSC ERROR: or see
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[6]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [6]PETSC ERROR: likely location of problem given in stack below
> [6]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [6]PETSC ERROR:       INSTEAD the line number of the start of the function
> [6]PETSC ERROR:       is given.
> [6]PETSC ERROR: [6] MatAssemblyBegin_MPIAIJ line 462
> src/mat/impls/aij/mpi/mpiaij.c
> [6]PETSC ERROR: [6] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
> [6]PETSC ERROR: [6] User provided functiINTERNAL ERROR: Invalid error
> class (66) encountered while returning from
> MPI_Allreduce.  Please file a bug report.  No error stack is available.
> Fatal error in MPI_Allreduce: Error message texts are not
> available[cli_8]: aborting job:
> Fatal error in MPI_Allreduce: Error message texts are not available
> [10]PETSC ERROR:
> ------------------------------------------------------------------------
> [10]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
> the batch system) has told this process to end
> [10]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [10]PETSC ERROR: or see
>
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[10]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [10]PETSC ERROR: likely location of problem given in stack below
> [10]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [10]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [10]PETSC ERROR:       INSTEAD the line number of the start of the function
> [10]PETSC ERROR:       is given.
> [10]PETSC ERROR: [10] VecAssemblyBegin line 157
> src/vec/vec/interface/vector.c
> [10]PETSC ERROR: [10] User provided function line 160
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/on line 294
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [0]PETSC ERROR: [0] User provided function line 627
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
> 13:37:48 CDT 2011
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
> 6 00:35:58 2011
> [0]PETSC ERROR: Libraries linked from
> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
> [0]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
> [0]PETSC ERROR: Config2]PETSC ERROR: [2] User provided function line
> 294 "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [2]PETSC ERROR: [2] User provided function line 627
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [2]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [2]PETSC ERROR: Signal received!
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
> 13:37:48 CDT 2011
> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [2]PETSC ERROR: See docs/index.html for manual pages.
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
> 6 00:35:58 2011
> [2]PETSC ERROR: Libraries linked from
> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
> [2]PETSC ERROR: Configure run at Sat Aug on line 294
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [4]PETSC ERROR: [4] User provided function line 627
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [4]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [4]PETSC ERROR: Signal received!
> [4]PETSC ERROR:
> ------------------------------------------------------------------------
> [4]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
> 13:37:48 CDT 2011
> [4]PETSC ERROR: See docs/changes/index.html for recent updates.
> [4]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [4]PETSC ERROR: See docs/index.html for manual pages.
> [4]PETSC ERROR:
> ------------------------------------------------------------------------
> [4]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
> 6 00:35:58 2011
> [4]PETSC ERROR: Libraries linked from
> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
> [4]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
> [4]PETSC ERROR: Configon line 294
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [6]PETSC ERROR: [6] User provided function line 627
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [6]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [6]PETSC ERROR: Signal received!
> [6]PETSC ERROR:
> ------------------------------------------------------------------------
> [6]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
> 13:37:48 CDT 2011
> [6]PETSC ERROR: See docs/changes/index.html for recent updates.
> [6]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [6]PETSC ERROR: See docs/index.html for manual pages.
> [6]PETSC ERROR:
> ------------------------------------------------------------------------
> [6]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
> 6 00:35:58 2011
> [6]PETSC ERROR: Libraries linked from
> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
> [6]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
> [6]PETSC ERROR: ConfigSM3T4mpi.cxx
> [10]PETSC ERROR: [10] User provided function line 294
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [10]PETSC ERROR: [10] User provided function line 627
> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
> [10]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [10]PETSC ERROR: Signal received!
> [10]PETSC ERROR:
> ------------------------------------------------------------------------
> [10]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
> 13:37:48 CDT 2011
> [10]PETSC ERROR: See docs/changes/index.html for recent updates.
> [10]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [10]PETSC ERROR: See docs/index.html for manual pages.
> [10]PETSC ERROR:
> ------------------------------------------------------------------------
> [10]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>  6 00:35:58 2011
> [10]PETSC ERROR: Libraries linked from
> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
> [10]PETSC ERRure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
> --download-mpich=1 --download-hypre=1 --with-parmetis=1
> --download-parmetis=1 --with-x=0 --with-debugging=1
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0[cli_0]:
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>  6 00:02:58 2011
> [2]PETSC ERROR: Configure options
> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2[cli_2]:
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2
> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
> --download-mpich=1 --download-hypre=1 --with-parmetis=1
> --download-parmetis=1 --with-x=0 --with-debugging=1
> [4]PETSC ERROR:
> ------------------------------------------------------------------------
> [4]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4[cli_4]:
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4
> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
> --download-mpich=1 --download-hypre=1 --with-parmetis=1
> --download-parmetis=1 --with-x=0 --with-debugging=1
> [6]PETSC ERROR:
> ------------------------------------------------------------------------
> [6]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6[cli_6]:
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6
> OR: Configure run at Sat Aug  6 00:02:58 2011
> [10]PETSC ERROR: Configure options
> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
> [10]PETSC ERROR:
> ------------------------------------------------------------------------
> [10]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10[cli_10]:
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10
>



-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener