[MPICH] Problem in using the MPI_Type_create_darray() API

Christina Patrick christina.subscribes at gmail.com
Thu Mar 15 11:34:32 CDT 2007


I have written an MPI program, shown below.

program.c
~~~~~~~

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROWS   (long)(2 * nprocs)
#define COLS   (long)(4 * nprocs)
#define MPI_ERR_CHECK(error_code) if((error_code) != MPI_SUCCESS) { \
                                    MPI_Error_string(error_code, string,
&len); \
                                    fprintf(stderr, "error_code: %s\n",
string); \
                                    return error_code; \
                                  }


int main(int argc, char **argv) {
  int *buf = NULL, nprocs = 0, mynod = 0, error_code = 0, len = 0, i = 0, j
= 0, provided;
  char string[MPI_MAX_ERROR_STRING], filename[] =
"pvfs2:/tmp/pvfs2-fs/TEST";
  MPI_Datatype darray;
  MPI_File fh;
  MPI_Status status;

  error_code = MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE,
&provided); MPI_ERR_CHECK(error_code);
  error_code = MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_ERR_CHECK(error_code);
  error_code = MPI_Comm_rank(MPI_COMM_WORLD, &mynod);
MPI_ERR_CHECK(error_code);

  int array_size[2]    = {ROWS, COLS};
  int array_distrib[2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK};
  int array_dargs[2]   = {MPI_DISTRIBUTE_DFLT_DARG,
MPI_DISTRIBUTE_DFLT_DARG};
  int array_psizes[2]  = {nprocs, 1};

  error_code = MPI_Type_create_darray(nprocs, mynod, 2, array_size,
array_distrib, array_dargs, array_psizes, MPI_ORDER_C, MPI_INT, &darray);
MPI_ERR_CHECK(error_code);
  error_code = MPI_Type_commit(&darray); MPI_ERR_CHECK(error_code);
  error_code = MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY,
MPI_INFO_NULL, &fh); MPI_ERR_CHECK(error_code);
  error_code = MPI_File_set_view(fh, 0, MPI_INT, darray, "native",
MPI_INFO_NULL); MPI_ERR_CHECK(error_code);
  buf = (int *) calloc(COLS+1, sizeof MPI_INT);
  if(!buf) { fprintf(stderr, "malloc error\n"); exit(0); }

  for(i = 0; i < ROWS/nprocs; i++) {
    error_code = MPI_File_read_all(fh, buf, COLS, MPI_INT, &status);
MPI_ERR_CHECK(error_code);
  }
  free(buf);
  MPI_Finalize();
  return 0;
}
~~~~~~~~
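
For reference, my understanding is that with MPI_DISTRIBUTE_BLOCK, the default
distribution arguments, and an nprocs x 1 process grid, the darray type above
should describe the same contiguous block of rows as the following subarray
construction (this is just how I am reasoning about the type, so please correct
me if I am wrong):

~~~~~~~
/* What I believe the darray above resolves to for rank mynod:
 * a (ROWS/nprocs) x COLS block of rows starting at row
 * mynod * (ROWS/nprocs). ROWS = 2*nprocs, so the rows divide evenly. */
int sizes[2]    = {ROWS, COLS};                /* global array shape      */
int subsizes[2] = {ROWS/nprocs, COLS};         /* this rank's local block */
int starts[2]   = {mynod * (ROWS/nprocs), 0};  /* offset of the block     */
MPI_Datatype subarray;
MPI_Type_create_subarray(2, sizes, subsizes, starts,
                         MPI_ORDER_C, MPI_INT, &subarray);
MPI_Type_commit(&subarray);
~~~~~~~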

I have used extremely small buffer sizes in this example program just to check
its validity. My file is 1 GB in size, and I am reading only a small portion of
it to test the program. I am getting strange errors when running it. Sometimes,
under a debugger, the program fails at the line where I call "free(buf);":
2:  (gdb) backtrace
0:  #0  main (argc=1, argv=0xbf8f8134) at program.c:44
1:  #0  main (argc=1, argv=0xbf80a044) at program.c:44
3:  #0  main (argc=1, argv=0xbfaf7b34) at program.c:44
2:  #0  0x00211ed8 in _int_free () from /lib/libc.so.6
2:  #1  0x0021272b in free () from /lib/libc.so.6
2:  #2  0x0804b655 in main (argc=1, argv=0xbfc76cb4) at program.c:43
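
To rule out a simple overrun of buf on my side, one thing I am planning to try
is padding the allocation with guard words and checking them after each read,
to see whether anything writes past the end of the buffer. A rough sketch of
the idea (NGUARD and the guard value are arbitrary choices of mine):

~~~~~~~
/* Sketch: surround buf with guard words to detect out-of-bounds writes.
 * NGUARD and GUARD are arbitrary values picked for this experiment. */
#define NGUARD 16
#define GUARD  0x5AFE5AFE
int *raw = (int *) calloc(COLS + 2*NGUARD, sizeof(int));
int *buf = raw + NGUARD;
for(i = 0; i < NGUARD; i++)
  raw[i] = raw[NGUARD + COLS + i] = GUARD;

/* ... after each MPI_File_read_all(fh, buf, COLS, MPI_INT, &status): */
for(i = 0; i < NGUARD; i++)
  if(raw[i] != GUARD || raw[NGUARD + COLS + i] != GUARD)
    fprintf(stderr, "rank %d: guard word %d clobbered\n", mynod, i);
~~~~~~~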

Somehow I do not think this problem is related to freeing the buffer, because
there are times when the first collective call succeeds and then subsequent
collective calls fail with a SEGV. When I debug the program in this case, it
shows me that the global variable "ADIOI_Flatlist" is getting corrupted. Here
is where my program faults, and the values of this global linked list when it
faults:

0-3:  (gdb) p ADIOI_Flatlist
0:  $1 = (ADIOI_Flatlist_node *) 0x99657e8
1:  $1 = (ADIOI_Flatlist_node *) 0x8e1fbb0
2:  $1 = (ADIOI_Flatlist_node *) 0x85bd608
3:  $1 = (ADIOI_Flatlist_node *) 0x82ee7c0
0-3:  (gdb) p ADIOI_Flatlist->next
0:  $2 = (struct ADIOI_Fl_node *) 0x9986310
1:  $2 = (struct ADIOI_Fl_node *) 0x8e3d218
2:  $2 = (struct ADIOI_Fl_node *) 0x56          <== This is the invalid
value. Somehow the linked list is getting corrupted: instead of NULL, an
invalid value like 0x56 is stored in the list.
3:  $2 = (struct ADIOI_Fl_node *) 0x830f280
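
My next step is to set a hardware watchpoint on the pointer that gets
clobbered, to catch whatever writes 0x56 into it. On rank 2 I intend to do
something along these lines (using gdb's value history for the address):

(gdb) print &ADIOI_Flatlist->next
(gdb) watch *(struct ADIOI_Fl_node **) $
(gdb) continue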

Initially I thought the problem was caused by my creating a separate I/O
thread. However, even when I remove my code changes from the MPICH2 library, I
continue to get the same errors. If somebody out there could help me out, I
would really appreciate it. I have compiled the MPICH2 library as follows:
# ./configure --with-pvfs2=<pvfs2path> --with-file-system=pvfs2+ufs+nfs
--enable-threads=multiple --prefix=<pathToInstallMpich2> --enable-g=dbg
--enable-debuginfo
My program itself is compiled with the "-ggdb3" option.
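For completeness, this is roughly how I build and launch the test program (the
process count here is just an example):
# mpicc -ggdb3 -o program program.c
# mpiexec -n 4 ./program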

Thanks,
Christina.