[petsc-users] Fatal error in MPI_Allreduce: Error message texts are not available[cli_9]: aborting job:

Dominik Szczerba dominik at itis.ethz.ch
Sat Aug 6 00:05:29 CDT 2011


On Sat, Aug 6, 2011 at 4:12 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>  Does the PETSc example src/vec/vec/examples/tutorials/ex1.c run correctly on 8+ processes?

Yes, it runs correctly:

dsz at nexo:~/pack/petsc-3.1-p8/src/vec/vec/examples/tutorials$
~/pack/petsc-3.1-p8/externalpackages/mpich2-1.0.8/bin/mpiexec -np 12
./ex1
Vector length 20
Vector length 20 40 60
All other values should be near zero
VecScale 0
VecCopy  0
VecAXPY 0
VecAYPX 0
VecSwap  0
VecSwap  0
VecWAXPY 0
VecPointwiseMult 0
VecPointwiseDivide 0
VecMAXPY 0 0 0

>  Are you sure the MPI shared libraries are the same on both systems?

I was not precise: I have only one system, consisting of two 6-core
Intel CPUs, 12 cores in total. I have Open MPI installed alongside, but
I was explicitly calling the mpiexec from PETSc's external packages.
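
(Aside: one way to confirm which MPI implementation a binary was
actually built against is to check the vendor macros in mpi.h at
compile time. This is only a minimal sketch; MPICH_VERSION and the
OMPI_* macros are what I assume MPICH2 and Open MPI define in their
respective mpi.h headers:)

#include <mpi.h>
#include <stdio.h>

/* Report the MPI implementation this source was compiled against.
 * A mismatch between the mpi.h seen at compile time and the mpiexec
 * used at run time (e.g. Open MPI vs. the MPICH2 from PETSc's
 * externalpackages) can produce this kind of opaque failure. */
int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
#if defined(MPICH_VERSION)
  printf("Compiled against MPICH2 %s\n", MPICH_VERSION);
#elif defined(OMPI_MAJOR_VERSION)
  printf("Compiled against Open MPI %d.%d.%d\n",
         OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#else
  printf("Unknown MPI implementation\n");
#endif
  MPI_Finalize();
  return 0;
}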

>   You can try the option -on_error_attach_debugger

When run with -np 12, it only opens 6 windows, saying:

[9]PETSC ERROR: MPI error 14
[1]PETSC ERROR: MPI error 14
[7]PETSC ERROR: MPI error 14
[9]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11798 on
display localhost:11.0 on machine nexo
[1]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11790 on
display localhost:11.0 on machine nexo
[7]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11796 on
display localhost:11.0 on machine nexo
[9]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in
src/sys/utils/mpimesg.c
[1]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in
src/sys/utils/mpimesg.c
[7]PETSC ERROR: PetscGatherNumberOfMessages() line 62 in
src/sys/utils/mpimesg.c
[1]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11790 on
display localhost:11.0 on machine nexo
[9]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11798 on
display localhost:11.0 on machine nexo
[7]PETSC ERROR: PETSC: Attaching gdb to
/home/dsz/build/framework-debug/trunk/bin/sm3t4mpi of pid 11796 on
display localhost:11.0 on machine nexo

Starting the program in the 6 windows with its expected arguments
then results in:

[cli_9]: PMIU_parse_keyvals: unexpected key delimiter at character 54 in cmd
[cli_9]: parse_kevals failed -1

I will not be able to do proper valgrinding/purifying before next
week. In the meantime, I would still appreciate any hints.
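
One more note: even though I never call MPI_Allreduce directly, the
PETSc routines in the stack traces below (VecAssemblyBegin,
MatAssemblyBegin, PetscGatherNumberOfMessages) are collective and call
MPI_Allreduce internally, so every rank must reach them. A minimal
sketch of how a mismatch can arise; the rank cutoff is purely
hypothetical, only to illustrate the mechanism, and is not from my
actual code:

#include "petscvec.h"

int main(int argc, char **argv)
{
  Vec         x;
  PetscMPIInt rank;

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  VecCreateMPI(PETSC_COMM_WORLD, 10, PETSC_DECIDE, &x);

  if (rank < 8) {          /* hypothetical bug: ranks 8..11 skip the collective */
    VecAssemblyBegin(x);   /* reduces over PETSC_COMM_WORLD internally */
    VecAssemblyEnd(x);
  }

  VecDestroy(x);           /* PETSc 3.1 signature takes the Vec itself */
  PetscFinalize();
  return 0;
}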

Regards,
Dominik

>
>
>  Barry
>
> On Aug 5, 2011, at 4:41 PM, Dominik Szczerba wrote:
>
>> I have a 2x6-core machine. My solver works fine on up to 8 processes;
>> above that it always crashes with the error cited below. I have not
>> yet run valgrind etc. because I am in desperate need of a quick fix.
>> I am just wondering what could potentially be the culprit.
>>
>> PS. I am not using MPI_Allreduce anywhere in my code.
>>
>> Many thanks for any hints,
>> Dominik
>>
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_9]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_1]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_7]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> INTERNAL ERROR: Invalid error class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_11]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [0]PETSC ERROR:       is given.
>> [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [0]PETSC ERROR: [0] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [0]PETSC ERROR: [0] User provided functi[2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [2]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [2]PETSC ERROR: likely location of problem given in stack below
>> [2]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [2]PETSC ERROR:       is given.
>> [2]PETSC ERROR: [2] VecAssemblyBegin line 157 src/vec/vec/interface/vector.c
>> [2]PETSC ERROR: [2] User provided function line 160
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [INTERNAL ERROR: Invalid error class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_3]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[4]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [4]PETSC ERROR: likely location of problem given in stack below
>> [4]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [4]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [4]PETSC ERROR:       is given.
>> [4]PETSC ERROR: [4] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [4]PETSC ERROR: [4] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [4]PETSC ERROR: [4] User provided functiINTERNAL ERROR: Invalid error
>> class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_5]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [6]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[6]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [6]PETSC ERROR: likely location of problem given in stack below
>> [6]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [6]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [6]PETSC ERROR:       is given.
>> [6]PETSC ERROR: [6] MatAssemblyBegin_MPIAIJ line 462
>> src/mat/impls/aij/mpi/mpiaij.c
>> [6]PETSC ERROR: [6] MatAssemblyBegin line 4553 src/mat/interface/matrix.c
>> [6]PETSC ERROR: [6] User provided functiINTERNAL ERROR: Invalid error
>> class (66) encountered while returning from
>> MPI_Allreduce.  Please file a bug report.  No error stack is available.
>> Fatal error in MPI_Allreduce: Error message texts are not
>> available[cli_8]: aborting job:
>> Fatal error in MPI_Allreduce: Error message texts are not available
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Caught signal number 3 Quit: Some other process (or
>> the batch system) has told this process to end
>> [10]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [10]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[10]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [10]PETSC ERROR: likely location of problem given in stack below
>> [10]PETSC ERROR: ---------------------  Stack Frames
>> ------------------------------------
>> [10]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [10]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [10]PETSC ERROR:       is given.
>> [10]PETSC ERROR: [10] VecAssemblyBegin line 157 src/vec/vec/interface/vector.c
>> [10]PETSC ERROR: [10] User provided function line 160
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/on line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [0]PETSC ERROR: [0] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [0]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [0]PETSC ERROR: Signal received!
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [0]PETSC ERROR: See docs/index.html for manual pages.
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [0]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [0]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [0]PETSC ERROR: Config2]PETSC ERROR: [2] User provided function line
>> 294 "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [2]PETSC ERROR: [2] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [2]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [2]PETSC ERROR: Signal received!
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [2]PETSC ERROR: See docs/index.html for manual pages.
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [2]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [2]PETSC ERROR: Configure run at Sat Aug on line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [4]PETSC ERROR: [4] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [4]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [4]PETSC ERROR: Signal received!
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [4]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [4]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [4]PETSC ERROR: See docs/index.html for manual pages.
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [4]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [4]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [4]PETSC ERROR: Configon line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [6]PETSC ERROR: [6] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [6]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [6]PETSC ERROR: Signal received!
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [6]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [6]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [6]PETSC ERROR: See docs/index.html for manual pages.
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [6]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [6]PETSC ERROR: Configure run at Sat Aug  6 00:02:58 2011
>> [6]PETSC ERROR: ConfigSM3T4mpi.cxx
>> [10]PETSC ERROR: [10] User provided function line 294
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [10]PETSC ERROR: [10] User provided function line 627
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/SM3T4mpi.cxx
>> [10]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [10]PETSC ERROR: Signal received!
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [10]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [10]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [10]PETSC ERROR: See docs/index.html for manual pages.
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: Unknown Name on a linux-gnu named nexo by dsz Sat Aug
>> 6 00:35:58 2011
>> [10]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [10]PETSC ERRure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0[cli_0]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>> 6 00:02:58 2011
>> [2]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2[cli_2]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 2
>> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4[cli_4]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 4
>> ure options PETSC_DIR=/home/dsz/pack/petsc-3.1-p8
>> PETSC_ARCH=linux-gnu-c-debug --download-f-blas-lapack=1
>> --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6[cli_6]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6
>> OR: Configure run at Sat Aug  6 00:02:58 2011
>> [10]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [10]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [10]PETSC ERROR: User provided function() line 0 in unknown directory
>> unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10[cli_10]:
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10
>
>

