[petsc-dev] [mpich-discuss] turning off MPI abort messages
Jed Brown
jed at jedbrown.org
Sat Feb 22 14:07:59 CST 2014
Jeff Hammond <jeff.science at gmail.com> writes:
> https://trac.mpich.org/projects/mpich/ticket/2038 has the patches.
Although I thought I once had an account on Trac, it doesn't seem to
know about me any more. Anyway, when MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
is set, this patch passes an uninitialized abort_str on to MPID_Abort.
    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
    ...
    if (!MPIR_CVAR_SUPPRESS_ABORT_MESSAGE)
        /* FIXME: This is not internationalized */
        MPIU_Snprintf(abort_str, 100, "application called MPI_Abort(%s, %d) - process %d", comm_name, errorcode, comm_ptr->rank);
    mpi_errno = MPID_Abort( comm_ptr, mpi_errno, errorcode, abort_str );
==27285== Conditional jump or move depends on uninitialised value(s)
==27285== at 0x56F2AE8: vfprintf (in /usr/lib/libc-2.19.so)
==27285== by 0x56F5630: buffered_vfprintf (in /usr/lib/libc-2.19.so)
==27285== by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
==27285== by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285== by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285== by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
==27285== by 0x400808: main (in /home/jed/lang/mpi/a.out)
==27285==
==27285== Syscall param write(buf) points to uninitialised byte(s)
==27285== at 0x5783470: __write_nocancel (in /usr/lib/libc-2.19.so)
==27285== by 0x571E472: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
==27285== by 0x571DB32: new_do_write (in /usr/lib/libc-2.19.so)
==27285== by 0x571EA85: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
==27285== by 0x56F56C5: buffered_vfprintf (in /usr/lib/libc-2.19.so)
==27285== by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
==27285== by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285== by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285== by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
==27285== by 0x400808: main (in /home/jed/lang/mpi/a.out)
==27285== Address 0xffeffd130 is on thread 1's stack
So I fix this:
diff --git i/src/mpi/init/abort.c w/src/mpi/init/abort.c
index f0b4cdc..bb1a63b 100644
--- i/src/mpi/init/abort.c
+++ w/src/mpi/init/abort.c
@@ -74,7 +74,7 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
     int mpi_errno = MPI_SUCCESS;
     MPID_Comm *comm_ptr = NULL;
     /* FIXME: 100 is arbitrary and may not be long enough */
-    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
+    char abort_str[100] = "", comm_name[MPI_MAX_OBJECT_NAME];
     int len = MPI_MAX_OBJECT_NAME;
     MPID_MPI_STATE_DECL(MPID_STATE_MPI_ABORT);
and now I can sort of suppress the output:
$ MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1 ./a.out
$
so it prints a blank line, which may not be acceptable if the program is
producing a stream, but is otherwise fine. Passing abort_str=NULL is
already used for something else ("internal ABORT"), so that is not an
option, but the following cleans up the output.
diff --git i/src/mpid/ch3/src/mpid_abort.c w/src/mpid/ch3/src/mpid_abort.c
index f0877ca..74b8a56 100644
--- i/src/mpid/ch3/src/mpid_abort.c
+++ w/src/mpid/ch3/src/mpid_abort.c
@@ -94,7 +94,7 @@ int MPID_Abort(MPID_Comm * comm, int mpi_errno, int exit_code,
 #elif defined(MPIDI_DEV_IMPLEMENTS_ABORT)
     MPIDI_CH3I_PMI_Abort(exit_code, error_msg);
 #else
-    MPIU_Error_printf("%s\n", error_msg);
+    if (error_msg[0]) MPIU_Error_printf("%s\n", error_msg);
     fflush(stderr);
 #endif
If this is acceptable, a similar change should be applied to the other
devices.
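
For anyone who wants to try the pattern outside of MPICH, here is a
minimal standalone sketch of the two changes taken together. It is not
MPICH code: suppress_abort_message, format_abort_message, and
report_abort are hypothetical stand-ins for the CVAR, the formatting in
MPI_Abort, and the printing in MPID_Abort, and plain snprintf/fprintf
replace MPIU_Snprintf/MPIU_Error_printf. The idea is only that the
buffer is always a defined (possibly empty) string, and that an empty
string prints nothing at all, not even a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MPIR_CVAR_SUPPRESS_ABORT_MESSAGE. */
static int suppress_abort_message = 0;

/* Hypothetical helper: fill buf with the abort message, or leave it as
 * an empty string when suppression is on. The buffer is always a
 * defined, NUL-terminated string afterwards. Returns its length. */
static size_t format_abort_message(char *buf, size_t size,
                                   const char *comm_name,
                                   int errorcode, int rank)
{
    assert(size > 0);
    buf[0] = '\0';  /* defined even if the branch below is skipped */
    if (!suppress_abort_message)
        snprintf(buf, size,
                 "application called MPI_Abort(%s, %d) - process %d",
                 comm_name, errorcode, rank);
    return strlen(buf);
}

/* Hypothetical helper: print the message and a newline only when the
 * message is non-empty. Returns 1 if something was printed, else 0. */
static int report_abort(FILE *out, const char *error_msg)
{
    if (error_msg[0]) {
        fprintf(out, "%s\n", error_msg);
        fflush(out);
        return 1;
    }
    return 0;
}
```

A caller would declare a buffer, call format_abort_message on it, and
hand the result to report_abort(stderr, buf). With the flag set, the
buffer is empty and nothing is written, so a stream consumer sees no
stray blank line.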