<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=136562418-15032007><FONT face=Arial
color=#0000ff size=2>If the global array is block-distributed along rows only,
and the total number of rows is 2*nprocs, shouldn't the local array size be
2*COLS elements? The memory buffer allocated below, however, is only of size
COLS+1.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=136562418-15032007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=136562418-15032007><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Christina
Patrick<BR><B>Sent:</B> Thursday, March 15, 2007 11:35 AM<BR><B>To:</B>
mpich-discuss-digest@mcs.anl.gov<BR><B>Subject:</B> [MPICH] Problem in using
the MPI_Type_create_darray() API<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV>I have written an MPI program, shown below.</DIV>
<DIV><BR>program.c<BR>~~~~~~~</DIV>
<DIV> </DIV>
<PRE>#include "mpi.h"
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

#define ROWS (long)(2 * nprocs)
#define COLS (long)(4 * nprocs)
#define MPI_ERR_CHECK(error_code) if((error_code) != MPI_SUCCESS) {  \
        MPI_Error_string(error_code, string, &amp;len);                  \
        fprintf(stderr, "error_code: %s\n", string);                 \
        return error_code;                                           \
    }

int main(int argc, char **argv) {
    int *buf = NULL, nprocs = 0, mynod = 0, error_code = 0, len = 0,
        i = 0, j = 0, provided;
    char string[MPI_MAX_ERROR_STRING], filename[] = "pvfs2:/tmp/pvfs2-fs/TEST";
    MPI_Datatype darray;
    MPI_File fh;
    MPI_Status status;

    error_code = MPI_Init_thread(&amp;argc, &amp;argv, MPI_THREAD_MULTIPLE,
                                 &amp;provided); MPI_ERR_CHECK(error_code);
    error_code = MPI_Comm_size(MPI_COMM_WORLD, &amp;nprocs); MPI_ERR_CHECK(error_code);
    error_code = MPI_Comm_rank(MPI_COMM_WORLD, &amp;mynod); MPI_ERR_CHECK(error_code);

    int array_size[2]    = {ROWS, COLS};
    int array_distrib[2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK};
    int array_dargs[2]   = {MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG};
    int array_psizes[2]  = {nprocs, 1};

    error_code = MPI_Type_create_darray(nprocs, mynod, 2, array_size,
                                        array_distrib, array_dargs, array_psizes,
                                        MPI_ORDER_C, MPI_INT, &amp;darray);
    MPI_ERR_CHECK(error_code);
    error_code = MPI_Type_commit(&amp;darray); MPI_ERR_CHECK(error_code);
    error_code = MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY,
                               MPI_INFO_NULL, &amp;fh); MPI_ERR_CHECK(error_code);
    error_code = MPI_File_set_view(fh, 0, MPI_INT, darray, "native",
                                   MPI_INFO_NULL); MPI_ERR_CHECK(error_code);
    buf = (int *) calloc(COLS+1, sizeof MPI_INT);
    if(!buf) { fprintf(stderr, "malloc error\n"); exit(0); }

    for(i = 0; i &lt; ROWS/nprocs; i++) {
        error_code = MPI_File_read_all(fh, buf, COLS, MPI_INT, &amp;status);
        MPI_ERR_CHECK(error_code);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}</PRE>
<DIV>~~~~~~~~<BR>&nbsp;<BR>I have used extremely small buffer sizes in this
example program just to check its validity. My file is 1&nbsp;GB in size, and
I am reading only a small portion of it to test the program. I am getting
strange errors while running it; sometimes, under a debugger, the program
fails at the line "free(buf);".</DIV>
<PRE>2: (gdb) backtrace
0: #0  main (argc=1, argv=0xbf8f8134) at program.c:44
1: #0  main (argc=1, argv=0xbf80a044) at program.c:44
2: #0  0x00211ed8 in _int_free () from /lib/libc.so.6
3: #0  main (argc=1, argv=0xbfaf7b34) at program.c:44
0-1,3: (gdb)
2: #1  0x0021272b in free () from /lib/libc.so.6
2: #2  0x0804b655 in main (argc=1, argv=0xbfc76cb4) at program.c:43</PRE>
<DIV> </DIV>
<DIV>Somehow I do not think this problem is related to freeing the buffer,
because there are times when the first collective call succeeds and
subsequent collective calls fail with a SIGSEGV. When I debug the program in
that case, it shows that the global variable "ADIOI_Flatlist" is getting
corrupted. Here is where my program faults, along with the values of this
global linked list at the fault:</DIV>
<DIV> </DIV>
<PRE>0-3: (gdb) p ADIOI_Flatlist
0: $1 = (ADIOI_Flatlist_node *) 0x99657e8
1: $1 = (ADIOI_Flatlist_node *) 0x8e1fbb0
2: $1 = (ADIOI_Flatlist_node *) 0x85bd608
3: $1 = (ADIOI_Flatlist_node *) 0x82ee7c0
0-3: (gdb) p ADIOI_Flatlist-&gt;next
0: $2 = (struct ADIOI_Fl_node *) 0x9986310
1: $2 = (struct ADIOI_Fl_node *) 0x8e3d218
2: $2 = (struct ADIOI_Fl_node *) 0x56   &lt;== invalid value
3: $2 = (struct ADIOI_Fl_node *) 0x830f280</PRE>
<DIV>Somehow the linked list is getting corrupted: instead of NULL, you have
an invalid value like 0x56 stored in the list.</DIV>
<DIV> </DIV>
<DIV>Initially I felt that the problem was caused by my creating a separate
I/O thread. However, even when I remove my code changes from the MPICH2
library, I continue getting the same errors. If somebody out there could help
me out, I would really appreciate it. I have compiled the mpich2 library as
follows:</DIV>
<DIV># ./configure --with-pvfs2=&lt;pvfs2path&gt;
--with-file-system=pvfs2+ufs+nfs --enable-threads=multiple
-prefix=&lt;pathToInstallMpich2&gt; --enable-g=dbg
--enable-debuginfo</DIV>
<DIV>Also, my program is compiled with the "-ggdb3" option.</DIV>
<P>Thanks,<BR>Christina.<BR> </P></BLOCKQUOTE></BODY></HTML>