[mpich-discuss] Troubleshooting memory and collective communication problems [Re: mpich-discuss Digest, Vol 2, Issue 32]

François PELLEGRINI francois.pellegrini at labri.fr
Tue Nov 25 19:16:54 CST 2008


Hello all,

> Message: 3
> Date: Mon, 24 Nov 2008 08:54:48 -0600
> From: Dave Goodell <goodell at mcs.anl.gov>
> Subject: Re: [mpich-discuss] Understanding warning/error message ?
> To: mpich-discuss at mcs.anl.gov
> Message-ID: <4B6451D5-AF3B-4D51-88CC-D3F447FB1122 at mcs.anl.gov>
> Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed
> 
> On Nov 24, 2008, at 8:33 AM, François PELLEGRINI wrote:
> 
>> I sometimes have crashes for large number of processes, in
>> MPI_Waitall calls. I am tracking them to know whether they
>> come from my side (most likely), but I also wonder on some

[...]

> Good luck finding your MPI_Waitall crash.  If you can distill your
> program down to a very small example program that will elicit the
> behavior, feel free to send it to us at mpich2-maint@ or
> mpich2-discuss at .  Also, configuring mpich2 with --enable-error-checking=all
> might help catch invalid arguments to MPI functions.

Well, so far it has been difficult for me to create a reproducer, as the
crashes are not systematic and seem to depend on message ordering.

Still, I have some concerns I hope you can help me with.

First: I compiled mpich2-1.0.8 (and later I also tried trunk-r3607, but
more on that below) with the following configure line:

./configure --prefix=$HOME/mpich/ CPPFLAGS='-I/usr/include/valgrind'
--enable-error-checking=all --enable-error-messages=all --enable-g=all
--enable-fast=defopt --enable-debuginfo --enable-mpe --enable-threads=multiple
--with-thread-package=posix --disable-f77

I compiled my programs with "mpicc -mpe=mpicheck", linking
with "-lmpe_collchk".
Then I ran my program on two procs under valgrind.

Below are some parts of the log files produced by several different runs:

==17938== Syscall param writev(vector[...]) points to uninitialised byte(s)
==17938==    at 0x57AC7CC: writev (in /lib/libc-2.7.so)
==17938==    by 0x4C2E3C: MPIDU_Sock_wait (sock_wait.i:693)
==17938==    by 0x488BF5: MPIDI_CH3I_Progress (ch3_progress.c:192)
==17938==    by 0x454136: MPIC_Wait (helper_fns.c:269)
==17938==    by 0x455395: MPIC_Send (helper_fns.c:38)
==17938==    by 0x442743: MPIR_Bcast (bcast.c:227)
==17938==    by 0x443CB0: PMPI_Bcast (bcast.c:761)
==17938==    by 0x43CA65: CollChk_same_call (same_call.c:36)
==17938==    by 0x43BF41: MPI_Allreduce (allreduce.c:20)
==17938==    by 0x421D84: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:79)
==17938==    by 0x419E85: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==17938==    by 0x413AE3: main (dgmap.c:285)
==17938==  Address 0x7fefffc4a is on thread 1's stack
==17938==
==17938== Syscall param write(buf) points to uninitialised byte(s)
==17938==    at 0x54D575B: (within /lib/libpthread-2.7.so)
==17938==    by 0x4C096D: MPIDU_Sock_write (sock_immed.i:525)
==17938==    by 0x4C8D60: MPIDI_CH3_iStartMsg (ch3_istartmsg.c:86)
==17938==    by 0x49B3B4: MPIDI_CH3_EagerContigShortSend (ch3u_eager.c:249)
==17938==    by 0x4A1BF9: MPID_Send (mpid_send.c:115)
==17938==    by 0x455375: MPIC_Send (helper_fns.c:34)
==17938==    by 0x442743: MPIR_Bcast (bcast.c:227)
==17938==    by 0x443CB0: PMPI_Bcast (bcast.c:761)
==17938==    by 0x43E4BA: CollChk_same_op (same_op.c:59)
==17938==    by 0x43BF5B: MPI_Allreduce (allreduce.c:24)
==17938==    by 0x421D84: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:79)
==17938==    by 0x419E85: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==17938==  Address 0x7fefff4f0 is on thread 1's stack
==17938==
==17938== Syscall param writev(vector[...]) points to uninitialised byte(s)
==17938==    at 0x57AC7CC: writev (in /lib/libc-2.7.so)
==17938==    by 0x4C1E65: MPIDU_Sock_writev (sock_immed.i:610)
==17938==    by 0x4C994C: MPIDI_CH3_iStartMsgv (ch3_istartmsgv.c:110)
==17938==    by 0x49B60C: MPIDI_CH3_EagerContigSend (ch3u_eager.c:176)
==17938==    by 0x4A1C59: MPID_Send (mpid_send.c:126)
==17938==    by 0x455375: MPIC_Send (helper_fns.c:34)
==17938==    by 0x442743: MPIR_Bcast (bcast.c:227)
==17938==    by 0x443CB0: PMPI_Bcast (bcast.c:761)
==17938==    by 0x43CA65: CollChk_same_call (same_call.c:36)
==17938==    by 0x43C299: MPI_Bcast (bcast.c:19)
==17938==    by 0x41A092: _SCOTCHdgraphLoad (dgraph_io_load.c:226)
==17938==    by 0x413AE3: main (dgmap.c:285)
==17938==  Address 0x7fefffca6 is on thread 1's stack
==17938== ==

Here, I assume the memory checker should differentiate between the sender
and receiver sides when checking the parameters of the Bcast routine: on
the sender side, the buffer has to be initialized beforehand, while on the
receiver side it does not matter that the pointer passed points to
uninitialized data. Is that it?
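
To make sure I am understood, here is the kind of minimal case I have in
mind (just an illustration written for this message, not an extract of my
code): only the root initializes its buffer before the Bcast; the other
ranks merely provide writable storage that the call fills in.

#include <stdio.h>
#include <mpi.h>

int
main (int argc, char ** argv)
{
  int                 rank;
  int                 val;                        /* Deliberately left uninitialized on non-root ranks */

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);

  if (rank == 0)                                  /* Only the root has to initialize the buffer */
    val = 42;

  MPI_Bcast (&val, 1, MPI_INT, 0, MPI_COMM_WORLD); /* Non-root ranks receive into val            */

  printf ("%d: %d\n", rank, val);

  MPI_Finalize ();

  return (0);
}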

In the end, I get the following message:

Collective Checking: ALLREDUCE --> no error
Collective Checking: ALLTOALLV --> Collective call (ALLTOALLV) is Inconsistent with Rank 0's (ALLREDUCE).
Fatal error in MPI_Comm_call_errhandler: No MPI error[cli_0]: aborting job:
Fatal error in MPI_Comm_call_errhandler: No MPI error
Fatal error in MPI_Comm_call_errhandler: No MPI error[cli_1]: aborting job:
Fatal error in MPI_Comm_call_errhandler: No MPI error

This is a bit of a problem for me, because I assumed that in full debug
mode the program would report such collective errors as soon as they
occur...
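
To illustrate what I was expecting, here is a rough sketch of my own
(certainly not how the collchk library actually works): before entering
each collective, all ranks could agree on an integer tag identifying the
call, and abort on the spot if they disagree.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

static void
checkSameCollective (
int                         calltag,              /* E.g. 1 = Allreduce, 2 = Alltoallv, ... */
MPI_Comm                    comm)
{
  int                 tagmin;
  int                 tagmax;

  MPI_Allreduce (&calltag, &tagmin, 1, MPI_INT, MPI_MIN, comm);
  MPI_Allreduce (&calltag, &tagmax, 1, MPI_INT, MPI_MAX, comm);

  if (tagmin != tagmax) {                         /* Some rank is entering a different collective */
    fprintf (stderr, "collective mismatch detected\n");
    abort ();                                     /* Stop right here: core dump / gdb breakpoint  */
  }
}

/* Call checkSameCollective (tag, comm) right before each collective in the application. */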

In a second run, the collective error messages were even more cryptic
(unfortunately, the output of the two log streams got interleaved in the
log file, but the general idea is there):

Collective Checking: ALLREDUCE --> no error
Fatal error in MPI_Comm_call_errhandler: No MPI error[cli_0]: aborting job:
Fatal error in MPI_Comm_call_errhandler: No MPI error
Collective Checking: ALLTOALLV --> no error
Fatal error in MPI_Comm_call_errhandler: No MPI error[cli_0]: aborting job:
Fatal error in MPI_Comm_call_errhandler: No MPI error
Collective Checking: ALLREDUCE --> Inconsistent operation (MPI_SUM) to Rank 0's operation().
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(786)....................: MPI_Bcast(buf=0x6e36fb4, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...................:
MPIDI_CH3U_Receive_data_found(128): Message from rank 0 and tag 2 truncated; 32 bytes received but buffer size is 4
[cli_1]: aborting job:
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(786)....................: MPI_Bcast(buf=0x6e36fb4, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...................:
MPIDI_CH3U_Receive_data_found(128): Message from rank 0 and tag 2 truncated; 32 bytes received but buffer size is 4

It reports a "truncated message" in MPI_Bcast, whereas in Bcast the buffer
size is supposed to be the same on all sides, right? It also reports an
inconsistent ALLREDUCE operation. Is there a way to make it stop exactly
where the error occurs and output a stack trace, or break into gdb at the
right position?
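
In the meantime, the workaround I am considering is to install an error
handler that aborts, so that the process stops exactly where the error is
raised and a core dump (or gdb, when running under it) shows the stack at
that point. This is only a sketch using standard MPI-2 calls; please tell
me if there is a better way:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

static void
abortOnMpiError (
MPI_Comm *                  comm,                 /* Communicator on which the error occurred */
int *                       errcode,              /* MPI error code                           */
...)
{
  char                errstr[MPI_MAX_ERROR_STRING];
  int                 errlen;

  MPI_Error_string (*errcode, errstr, &errlen);
  fprintf (stderr, "MPI error: %s\n", errstr);
  abort ();                                       /* Core dump here, or SIGABRT caught by gdb */
}

/* After MPI_Init:
     MPI_Errhandler errhdl;
     MPI_Comm_create_errhandler (abortOnMpiError, &errhdl);
     MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhdl);                                        */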

Regarding trunk-r3607, I get even more warnings, even for basic type
committing operations:

==8126== Invalid read of size 8
==8126==    at 0x47A9A2: MPIU_trvalid (trmem.c:529)
==8126==    by 0x47B56D: MPIU_trmalloc (trmem.c:220)
==8126==    by 0x49033D: MPID_Datatype_set_contents (mpid_datatype_contents.c:62)
==8126==    by 0x467932: PMPI_Type_contiguous (type_contiguous.c:96)
==8126==    by 0x423211: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:72)
==8126==    by 0x41B395: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==8126==    by 0x414FF3: main (dgmap.c:285)
==8126==  Address 0x5a44b58 is 120 bytes inside a block of size 168 alloc'd
==8126==    at 0x4C20FEB: malloc (vg_replace_malloc.c:207)
==8126==    by 0x47B197: MPIU_trmalloc (trmem.c:235)
==8126==    by 0x47EF1D: MPIU_Find_local_and_external (local_proc.c:364)
==8126==    by 0x46206B: MPIR_Comm_commit (commutil.c:258)
==8126==    by 0x4ACD0A: MPID_Init (mpid_init.c:207)
==8126==    by 0x46FCDD: MPIR_Init_thread (initthread.c:368)
==8126==    by 0x470165: PMPI_Init_thread (initthread.c:531)
==8126==    by 0x41493B: main (dgmap.c:141)

==8126== Invalid read of size 8
==8126==    at 0x47A9A2: MPIU_trvalid (trmem.c:529)
==8126==    by 0x47B56D: MPIU_trmalloc (trmem.c:220)
==8126==    by 0x4C2D35: MPID_Dataloop_alloc_and_copy (dataloop.c:380)
==8126==    by 0x4C403B: MPID_Dataloop_create_contiguous (dataloop_create_contig.c:76)
==8126==    by 0x4C38A6: MPID_Dataloop_create (dataloop_create.c:153)
==8126==    by 0x48E7A9: MPID_Type_commit (mpid_type_commit.c:47)
==8126==    by 0x4656B4: PMPI_Type_commit (type_commit.c:97)
==8126==    by 0x42321D: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:72)
==8126==    by 0x41B395: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==8126==    by 0x414FF3: main (dgmap.c:285)
==8126==  Address 0x5a45048 is 120 bytes inside a block of size 192 alloc'd
==8126==    at 0x4C20FEB: malloc (vg_replace_malloc.c:207)
==8126==    by 0x47B197: MPIU_trmalloc (trmem.c:235)
==8126==    by 0x49033D: MPID_Datatype_set_contents (mpid_datatype_contents.c:62)
==8126==    by 0x467932: PMPI_Type_contiguous (type_contiguous.c:96)
==8126==    by 0x423211: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:72)
==8126==    by 0x41B395: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==8126==    by 0x414FF3: main (dgmap.c:285)
==28944==
==28944== Invalid read of size 8
==28944==    at 0x47A9A2: MPIU_trvalid (trmem.c:529)
==28944==    by 0x47B56D: MPIU_trmalloc (trmem.c:220)
==28944==    by 0x4DB360: MPID_nem_newtcp_module_connect (socksm.c:773)
==28944==    by 0x4D3CD0: MPID_nem_newtcp_iStartContigMsg (newtcp_module_send.c:279)
==28944==    by 0x4DF6BA: MPIDI_CH3_iStartMsgv (ch3_istartmsgv.c:52)
==28944==    by 0x4A917E: MPIDI_CH3_EagerContigSend (ch3u_eager.c:187)
==28944==    by 0x4AFB59: MPID_Send (mpid_send.c:128)
==28944==    by 0x457645: MPIC_Send (helper_fns.c:34)
==28944==    by 0x4441C3: MPIR_Bcast (bcast.c:234)
==28944==    by 0x445C03: PMPI_Bcast (bcast.c:846)
==28944==    by 0x43DF75: CollChk_same_call (same_call.c:36)
==28944==    by 0x43D451: MPI_Allreduce (allreduce.c:20)
==28944==  Address 0x5a744a0 is 120 bytes inside a block of size 232 alloc'd
==28944==    at 0x4C20FEB: malloc (vg_replace_malloc.c:207)
==28944==    by 0x47B197: MPIU_trmalloc (trmem.c:235)
==28944==    by 0x4C2D35: MPID_Dataloop_alloc_and_copy (dataloop.c:380)
==28944==    by 0x4C403B: MPID_Dataloop_create_contiguous (dataloop_create_contig.c:76)
==28944==    by 0x4C38A6: MPID_Dataloop_create (dataloop_create.c:153)
==28944==    by 0x48E7CC: MPID_Type_commit (mpid_type_commit.c:55)
==28944==    by 0x4656B4: PMPI_Type_commit (type_commit.c:97)
==28944==    by 0x42321D: _SCOTCHdgraphAllreduceMaxSum2 (dgraph_allreduce.c:72)
==28944==    by 0x41B395: _SCOTCHdgraphLoad (dgraph_io_load.c:119)
==28944==    by 0x414FF3: main (dgmap.c:285)

You will find a small reproducer below:

--------------------------------------------
file "brol.c"
-------------

#include <stdio.h>
#include <unistd.h>                               /* For getpid() */
#include <mpi.h>

typedef int Gnum;

#define GNUM_MPI MPI_INT

#define DGRAPHALLREDUCEMAXSUMOP(m,s)                                                                \
static                                                                                              \
void                                                                                                \
dgraphAllreduceMaxSumOp##m##_##s (                                                                  \
const Gnum * const          in,                   /* First operand                               */ \
Gnum * const                inout,                /* Second and output operand                   */ \
const int * const           len,                  /* Number of instances; should be 1, not used  */ \
const MPI_Datatype * const  typedat)              /* MPI datatype; not used                      */ \
{                                                                                                   \
  int               i;                                                                              \
                                                                                                    \
  for (i = 0; i < (m); i ++)                      /* Perform maximum on first part of data array */ \
    if (in[i] > inout[i])                                                                           \
      inout[i] = in[i];                                                                             \
                                                                                                    \
  for ( ; i < ((m) + (s)); i ++)                  /* Perform sum on second part of data array    */ \
    inout[i] += in[i];                                                                              \
}

#define dgraphAllreduceMaxSum(rlt,rgt,m,s,comm) dgraphAllreduceMaxSum2 ((rlt), (rgt), (m) + (s), (MPI_User_function *) (dgraphAllreduceMaxSumOp##m##_##s), (comm))

int
dgraphAllreduceMaxSum2 (
Gnum *                      reduloctab,           /* Pointer to array of local Gnum data   */
Gnum *                      reduglbtab,           /* Pointer to array of reduced Gnum data */
int                         redumaxsumnbr,        /* Number of max + sum Gnum operations   */
MPI_User_function *         redufuncptr,          /* Pointer to operator function          */
MPI_Comm                    proccomm)             /* Communicator to be used for reduction */
{
  MPI_Datatype      redutypedat;                  /* Data type for finding best separator              */
  MPI_Op            reduoperdat;                  /* Handle of MPI operator for finding best separator */

  if ((MPI_Type_contiguous (redumaxsumnbr, GNUM_MPI, &redutypedat) != MPI_SUCCESS) ||
      (MPI_Type_commit (&redutypedat)                              != MPI_SUCCESS) ||
      (MPI_Op_create (redufuncptr, 1, &reduoperdat)                != MPI_SUCCESS)) {
    fprintf (stderr, "dgraphAllreduceMaxSum: communication error (1)");
    return     (1);
  }

  if (MPI_Allreduce (reduloctab, reduglbtab, 1, redutypedat, reduoperdat, proccomm) != MPI_SUCCESS) {
    fprintf (stderr, "dgraphAllreduceMaxSum: communication error (2)");
    return  (1);
  }

  if ((MPI_Op_free   (&reduoperdat) != MPI_SUCCESS) ||
      (MPI_Type_free (&redutypedat) != MPI_SUCCESS)) {
    fprintf (stderr, "dgraphAllreduceMaxSum: communication error (3)");
    return  (1);
  }

  return (0);
}

DGRAPHALLREDUCEMAXSUMOP (6, 3)

int
main (int argc, char ** argv)
{
  Gnum                reduloctab[9];
  Gnum                reduglbtab[9];
  Gnum                pid;

  MPI_Init (&argc, &argv);

  pid = (Gnum) getpid ();

  reduloctab[0] = pid + 0;
  reduloctab[1] = pid + 1;
  reduloctab[2] = pid + 2;
  reduloctab[3] = pid + 3;
  reduloctab[4] = pid + 4;
  reduloctab[5] = pid + 5;
  reduloctab[6] = pid + 6;
  reduloctab[7] = pid + 7;
  reduloctab[8] = pid + 8;

  dgraphAllreduceMaxSum (reduloctab, reduglbtab, 6, 3, MPI_COMM_WORLD);

  printf ("%ld\n", (long) reduglbtab[0]);
  printf ("%ld\n", (long) reduglbtab[0]);
  printf ("%ld\n", (long) reduglbtab[0]);
  printf ("%ld\n", (long) reduglbtab[1]);
  printf ("%ld\n", (long) reduglbtab[2]);
  printf ("%ld\n", (long) reduglbtab[3]);
  printf ("%ld\n", (long) reduglbtab[4]);
  printf ("%ld\n", (long) reduglbtab[5]);
  printf ("%ld\n", (long) reduglbtab[6]);
  printf ("%ld\n", (long) reduglbtab[7]);
  printf ("%ld\n", (long) reduglbtab[8]);

  MPI_Finalize ();

  return (0);
}

--------------------------------------------

Compile and run with:
mpicc -mpe=mpicheck /tmp/brol.c -o /tmp/brol -lmpe_collchk
mpirun -np 2 valgrind /tmp/brol


Thanks for any hints,



					f.p.


