[MPICH] debug flag

Wei-keng Liao wkliao at ece.northwestern.edu
Wed May 30 13:28:45 CDT 2007


The code file is attached. The command-line arguments are
   "filename npx npy npz", where filename is the name of the output file, and
   npx, npy, npz are the numbers of processes along the X, Y, and Z dimensions.

The XYZ dimensions of each subarray are fixed at 50 x 50 x 50. Each array
element is of type double. The global array size is hence proportional to
the number of processes. There is also a fourth dimension of size 11, which
is not partitioned.

To repeat my experiment on 4000 processes, please use
   npx=20, npy=20, npz=10

For 2000 processes, change npy to 10.
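
For example, to build and launch the attached test (a sketch only; the
compiler wrapper, job launcher, and source file name below are assumptions,
and on the Cray XT the cc/ftn wrappers and the system's own launcher would
be used instead):
   mpicc -g -o subarray_write subarray_write.c
   mpiexec -n 4000 ./subarray_write output.dat 20 20 10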

Wei-keng



On Wed, 30 May 2007, Rajeev Thakur wrote:

>> I have written a short C code for this I/O pattern. ...
>> Let me know if you would like a copy of it.
>
> Of course!
>
>
>> -----Original Message-----
>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>> Sent: Wednesday, May 30, 2007 11:49 AM
>> To: Rajeev Thakur
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: RE: [MPICH] debug flag
>>
>>
>> I just got the results by disabling aggregation. The coredump was
>> generated by rank 784 (out of 4000) and indicates the following info.
>>
>> ad_aggregate.c:242
>>      proc = -603978814     <-- !?
>>      off = -166212992      <-- !?
>>      min_st_offset = 0
>>      fd_len = 400
>>      fd_size = 262582  <-- should be 11000000
>>
>> going up one level to ad_write_coll.c:170,
>>      below are some of the variables set by ADIOI_Calc_my_off_len()
>>      at line 101:
>>      count 1375000
>>      offset = 0
>>      start_offset = 407601600
>>      end_offset = -2149678961    <-- should be 839993999
>>      contig_access_count = 27500
>>
>> I suspect the file type is not flattened correctly.
>>
>> I have written a short C code for this I/O pattern. I ran it on 4000
>> processes and it produced the same error. On 2000 processes, it ran
>> fine, just like my program. Let me know if you would like a
>> copy of it.
>>
>> Wei-keng
>>
>>
>> On Tue, 29 May 2007, Rajeev Thakur wrote:
>>
>>> Can you try disabling aggregation and see if the error still remains.
>>> You can disable it by creating an info object as follows and passing
>>> it to File_set_view:
>>>     MPI_Info_set(info, "cb_config_list", "*:*");
>>>
>>> Rajeev
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>> Sent: Tuesday, May 29, 2007 1:33 AM
>>>> To: Howard Pritchard
>>>> Cc: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [MPICH] debug flag
>>>>
>>>> Howard,
>>>>
>>>> Thanks for this information. It is very helpful. I was able to find
>>>> more details by using the debug build of mpich. Below is what I found
>>>> from the coredump that may help debugging the ROMIO source.
>>>>
>>>> 1. coredump is from MPI rank 2919. I allocated 4000 MPI processes
>>>>     (2000 nodes, each node has 2 CPUs). I am checking the
>>>>     mpich2-1.0.2 source.
>>>>
>>>> 2. MPI_Abort() is called at line 97 by function ADIOI_Calc_aggregator()
>>>>     in file ad_aggregate.c, where
>>>>     rank_index = 5335, fd->hints->cb_nodes = 2000, off = 2802007600,
>>>>     min_off = 0, fd_size = 525164 (fd_size should be 11000000)
>>>>
>>>> 3. It is function ADIOI_Calc_my_req() that called ADIOI_Calc_aggregator()
>>>>     at line 240, in file ad_aggregate.c, where
>>>>     i = 0 in the loop    for (i=0; i < contig_access_count; i++)
>>>>     off = 2802007600, min_st_offset = 0, fd_len = 400, fd_size = 525164
>>>>     (fd_size should be 11000000)
>>>>
>>>> 4. It is function ADIOI_GEN_WriteStridedColl() that called
>>>>     ADIOI_Calc_my_req() at line 170, in file ad_write_coll.c.
>>>>     I wanted to see what went wrong with fd_size in function
>>>>     ADIOI_Calc_file_domains(), where fd_size is set, and saw that
>>>>     fd_size is determined by st_offsets[] and end_offsets[], which
>>>>     depend on the variables start_offset and end_offset.
>>>>
>>>>     So, I went a few lines up and checked the values of the variables
>>>>     start_offset and end_offset. They were set by ADIOI_Calc_my_off_len()
>>>>     at line 101, and I found that the value of end_offset must be wrong!
>>>>     end_offset should always be >= start_offset, but the core shows that
>>>>         start_offset = 2802007600, end_offset = 244727039
>>>>
>>>>     So, I looked into ADIOI_Calc_my_off_len() in ad_read_coll.c and
>>>>     checked the variable end_offset_ptr, which is set from the variable
>>>>     end_offset at line 453, since filetype_size > 0 and
>>>>     filetype_is_contig == 0.
>>>>     Hence, the only place end_offset is set is at line 420:
>>>>         end_offset = off + frd_size - 1;
>>>>     end_offset is determined by off and frd_size. However, frd_size
>>>>     is declared as an integer, but end_offset is an ADIO_Offset. Maybe
>>>>     it is a type overflow! At line 351, I can see a type cast
>>>>         frd_size = (int) (disp + flat_file->indices[i] + ...
>>>>
>>>>     Something fishy here. Unfortunately, the coredump does not cover
>>>>     this part. It looks like interactive debugging with a break point
>>>>     cannot be avoided.
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>> On Mon, 28 May 2007, Howard Pritchard wrote:
>>>>
>>>>> Hello Wei-keng,
>>>>>
>>>>> Here is a way on xt/qk systems to compile with the debug mpich2 library:
>>>>>
>>>>> 1) do
>>>>>   module show xt-mpt
>>>>>
>>>>>   to see which mpich2 the system manager has made the default.
>>>>>
>>>>>   For instance, on an internal system here at cray this command shows:
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------
>>>>> /opt/modulefiles/xt-mpt/1.5.49:
>>>>>
>>>>> setenv           MPT_DIR /opt/xt-mpt/1.5.49
>>>>> setenv           MPICHBASEDIR /opt/xt-mpt/1.5.49/mpich2-64
>>>>> setenv           MPICH_DIR /opt/xt-mpt/1.5.49/mpich2-64/P2
>>>>> setenv           MPICH_DIR_FTN_DEFAULT64 /opt/xt-mpt/1.5.49/mpich2-64/P2W
>>>>> prepend-path     LD_LIBRARY_PATH /opt/xt-mpt/1.5.49/mpich2-64/P2/lib
>>>>> prepend-path     PATH /opt/xt-mpt/1.5.49/mpich2-64/P2/bin
>>>>> prepend-path     MANPATH /opt/xt-mpt/1.5.49/mpich2-64/man
>>>>> prepend-path     MANPATH /opt/xt-mpt/1.5.49/romio/man
>>>>> prepend-path     PE_PRODUCT_LIST MPT
>>>>>
>>>>> -------------------------------------------------------------------
>>>>>
>>>>> The debug library you want to use is thus going to be picked up by
>>>>> the mpicc installed at:
>>>>>
>>>>> /opt/xt-mpt/1.5.49/mpich2-64/P2DB
>>>>>
>>>>> 2) Now with the cray compiler scripts like cc, ftn, etc. you specify
>>>>> the alternate location to use for compiling/linking by
>>>>>
>>>>> cc -driverpath=/opt/xt-mpt/1.5.49/mpich2-64/P2DB/bin -o a.out.debug ......
>>>>>
>>>>> or whichever path is appropriate for the xt-mpt installed on your system.
>>>>>
>>>>> 3) When you rerun the binary, you may want to set the MPICH_DBMASK
>>>>> environment variable to 0x200.
>>>>>
>>>>> I am pretty sure you are running out of memory, based on the area in
>>>>> ADIOI_Calc_my_req where the error arises.  Clearly this is not a very
>>>>> good way to report an oom condition.  I'll investigate.
>>>>>
>>>>> You may be able to save some memory by tweaking the environment
>>>>> variables controlling mpi buffer space.  Refer to the intro_mpi man
>>>>> page on your xt/qk system.
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Howard
>>>>>
>>>>> Wei-keng Liao wrote:
>>>>>
>>>>>>
>>>>>> Well, I am aware of mpich2version, but unfortunately that command is
>>>>>> not available to users on that machine. The only commands available
>>>>>> to me are mpicc, mpif77, mpif90, and mpicxx.
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>>
>>>>>> On Fri, 25 May 2007, Anthony Chan wrote:
>>>>>>
>>>>>>>
>>>>>>> <mpich2-install-dir>/bin/mpich2version may show if --enable-g is set.
>>>>>>>
>>>>>>> A.Chan
>>>>>>>
>>>>>>> On Fri, 25 May 2007, Wei-keng Liao wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> The problem is that I cannot run my own mpich on the machine. I can
>>>>>>>> see that the MPICH I am using is mpich2-1.0.2 from peeking at the
>>>>>>>> mpif90 script. Is there a way to know if it is built using the
>>>>>>>> --enable-g=dbg option from the mpif90 script?
>>>>>>>>
>>>>>>>> I don't know if this helps, but below is the whole error message:
>>>>>>>>
>>>>>>>> aborting job:
>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process <id>
>>>>>>>> (there are 4000 lines, each with a distinct id number)
>>>>>>>>
>>>>>>>> ----- DEBUG: PCB, CONTEXT, STACK TRACE ---------------------
>>>>>>>>
>>>>>>>> PROCESSOR [ 0]
>>>>>>>> log_nid  =    15  phys_nid  = 0x98  host_id =   7691  host_pid  = 18545
>>>>>>>> group_id = 12003  num_procs = 4000  rank    =     15  local_pid =    3
>>>>>>>> base_node_index =    0   last_node_index = 1999
>>>>>>>>
>>>>>>>> text_base  = 0x00000000200000   text_len  = 0x00000000400000
>>>>>>>> data_base  = 0x00000000600000   data_len  = 0x00000000a00000
>>>>>>>> stack_base = 0x000000fec00000   stack_len = 0x00000001000000
>>>>>>>> heap_base  = 0x00000001200000   heap_len  = 0x0000007b000000
>>>>>>>>
>>>>>>>> ss  = 0x000000000000001f  fs  = 000000000000000000  gs  = 0x0000000000000017
>>>>>>>> rip = 0x00000000002d46fe
>>>>>>>> rdi = 0x0000000006133a90  rsi = 0xffffffffdc0003c2  rbp = 0x00000000ffbf9d40
>>>>>>>> rsp = 0x00000000ffbf9cc0  rbx = 0x0000000000000190  rdx = 0x000000003eb08c39
>>>>>>>> rcx = 0x0000000008ea18b0  rax = 0x0000000008ecff30  cs  = 0x000000000000001f
>>>>>>>> R8  = 0x0000000007ad2ab0  R9  = 0xfffffffffffffe0c  R10 = 0x0000000008e6bd30
>>>>>>>> R11 = 0x0000000000000262  R12 = 0x0000000000000a8c  R13 = 0xfffffffff0538770
>>>>>>>> R14 = 0x00000000fffffe0c  R15 = 0x0000000008ed3dc0
>>>>>>>> rflg = 0x0000000000010206   prev_sp = 0x00000000ffbf9cc0
>>>>>>>> error_code = 6
>>>>>>>>
>>>>>>>> SIGNAL #[11][Segmentation fault]  fault_address = 0xffffffff78ed4cc8
>>>>>>>>   0xffbf9cc0  0xffbf9cf0 0xfa0 0xa00006b6c 0xa8c3e9ab7ff
>>>>>>>>   0xffbf9ce0  0x8ed7c50 0x7d0 0x0 0x6b6c002d455b
>>>>>>>>   0xffbf9d00  0x8ea18b0 0x8e6bd30 0x61338a0 0xfa0
>>>>>>>>   0xffbf9d20  0x0 0x61338a0 0x8036c 0x8ec4390
>>>>>>>>   0xffbf9d40  0xffbf9e80 0x2d2280 0x8ecff30 0x8036c
>>>>>>>>   0xffbf9d60  0xfa0 0xffbf9de4 0xffbf9de8 0xffbf9df0
>>>>>>>>   0xffbf9d80  0xffbf9df8 0x0 0x0 0x8ebc680
>>>>>>>>   0xffbf9da0  0x1770 0x7d000a39f88 0x0 0x650048174f
>>>>>>>>   0xffbf9dc0  0x14fb184c000829 0x6a93500 0xffbf9e30 0x292e54
>>>>>>>>   0xffbf9de0  0x0 0x8ed3dc0 0x7af 0x0
>>>>>>>>   0xffbf9e00  0x0 0x8ecc0a0 0x8ecff30 0x8036c
>>>>>>>>   0xffbf9e20  0x100000014 0x8e6bd30 0x8ea18b0 0x1770
>>>>>>>>   0xffbf9e40  0xffffffff6793163f 0x6b6c00a39fa8 0xfa00000000f 0x61338a0
>>>>>>>>   0xffbf9e60  0x4c000829 0x14fb18 0x65 0x0
>>>>>>>>   0xffbf9e80  0xffbf9ee0 0x2a397c 0x866b60 0xffbf9eb0
>>>>>>>>
>>>>>>>>
>>>>>>>> Stack Trace:  ------------------------------
>>>>>>>> #0  0x00000000002d46fe in ADIOI_Calc_my_req()
>>>>>>>> #1  0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>>>>>>> #2  0x00000000002a397c in MPIOI_File_write_all()
>>>>>>>> #3  0x00000000002a3a4a in PMPI_File_write_all()
>>>>>>>> #4  0x00000000002913a8 in pmpi_file_write_all_()
>>>>>>>> could not find symbol for addr 0x73696e6966204f49
>>>>>>>> --------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 25 May 2007, Robert Latham wrote:
>>>>>>>>
>>>>>>>>> On Fri, May 25, 2007 at 03:56:16PM -0500, Wei-keng Liao wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have an MPI I/O application that runs fine up to 1000 processes,
>>>>>>>>>> but failed when using 4000 processes. Parts of the error message are
>>>>>>>>>>     ...
>>>>>>>>>>     Stack Trace:  ------------------------------
>>>>>>>>>>     #0  0x00000000002d46fe in ADIOI_Calc_my_req()
>>>>>>>>>>     #1  0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>>>>>>>>>     #2  0x00000000002a397c in MPIOI_File_write_all()
>>>>>>>>>>     #3  0x00000000002a3a4a in PMPI_File_write_all()
>>>>>>>>>>     #4  0x00000000002913a8 in pmpi_file_write_all_()
>>>>>>>>>>     could not find symbol for addr 0x73696e6966204f49
>>>>>>>>>>     aborting job:
>>>>>>>>>>     application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1456
>>>>>>>>>>     ...
>>>>>>>>>>
>>>>>>>>>> My question is: what debug flags should I use for compiling and
>>>>>>>>>> running in order to help find the exact location in function
>>>>>>>>>> ADIOI_Calc_my_req() that causes this error?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Wei-keng
>>>>>>>>>
>>>>>>>>> If you build MPICH2 with --enable-g=dbg, then all of MPI will be
>>>>>>>>> built with debugging symbols.   Be sure to 'make clean' first: the
>>>>>>>>> ROMIO objects might not rebuild otherwise.
>>>>>>>>>
>>>>>>>>> I wonder what caused the abort?  maybe ADIOI_Malloc failed to allocate
>>>>>>>>> memory?  Well, a stack trace with debugging symbols should be
>>>>>>>>> interesting.
>>>>>>>>>
>>>>>>>>> ==rob
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Rob Latham
>>>>>>>>> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
>>>>>>>>> Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int          i, err, rank, np, buf_size, debug;
    double      *buf;
    int          np_dim[3], rank_dim[3], array_of_sizes[3];
    int          array_of_subsizes[3], array_of_starts[3];
    MPI_File     fh;
    MPI_Datatype ftype;
    MPI_Status   status;
    MPI_Info     info;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    if (argc != 5) {
        fprintf(stderr,"Usage: %s filename npx npy npz\n",argv[0]);
        MPI_Finalize();
        exit(1);
    }

    debug = 0;

    for (i=0; i<3; i++) {
        np_dim[i] = atoi(argv[2+i]);        /* no. processes in each dim */
        array_of_sizes[i] = 50 * np_dim[i]; /* global 3D array size */
        array_of_subsizes[i] = 50;          /* sub array size is fixed */
    }
    if (debug) {
        printf("%d: np_dim = %d %d %d\n",rank,np_dim[0],np_dim[1],np_dim[2]);
        printf("%d: array_of_sizes = %d %d %d\n",rank,array_of_sizes[0],
               array_of_sizes[1],array_of_sizes[2]);
        printf("%d: array_of_subsizes = %d %d %d\n",rank,array_of_subsizes[0],
               array_of_subsizes[1],array_of_subsizes[2]);
    }

    /* check if number of processes is matched */
    if (np != np_dim[0]*np_dim[1]*np_dim[2]) {
        fprintf(stderr,"Error: process number mismatch ");
        fprintf(stderr,"npx(%d) npy(%d) npz(%d) total(%d)\n",
                np_dim[0],np_dim[1],np_dim[2],np);
        MPI_Finalize();
        exit(1);
    }

    /* process rank in each dimension */
    rank_dim[0] =  rank %  np_dim[0];
    rank_dim[1] = (rank /  np_dim[0]) % np_dim[1];
    rank_dim[2] =  rank / (np_dim[0]  * np_dim[1]);

    /* starting coordinates of the subarray in each dimension */
    for (i=0; i<3; i++)
        array_of_starts[i] = 50 * rank_dim[i];

    if (debug) {
        printf("%d: rank_dim = %d %d %d\n",rank,rank_dim[0],rank_dim[1],
               rank_dim[2]);
        printf("%d: array_of_starts = %d %d %d\n",rank,array_of_starts[0],
               array_of_starts[1],array_of_starts[2]);
    }

    /* create file type */
    MPI_Type_create_subarray(3, array_of_sizes, array_of_subsizes,
                             array_of_starts, MPI_ORDER_FORTRAN,
                             MPI_DOUBLE, &ftype);
    MPI_Type_commit(&ftype);

    /* create MPI I/O hint */
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_no_indep_rw", "true");
    /* uncomment the following to disable I/O aggregation */
    /*
    MPI_Info_set(info, "cb_config_list",    "*:*");
    */

    /* open the file */
    err = MPI_File_open(MPI_COMM_WORLD, argv[1],
                        MPI_MODE_CREATE | MPI_MODE_WRONLY,
                        info, &fh);
    if (err != MPI_SUCCESS) {
        printf("Error: MPI_File_open() filename %s\n",argv[1]);
        MPI_Abort(MPI_COMM_WORLD, -1);
        exit(1);
    }

    /* set the file view */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, ftype, "native", MPI_INFO_NULL);
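    /* Note (not part of the original attachment): the suggestion upthread
     * was to disable aggregation by passing an info object that sets
     * "cb_config_list" to "*:*" directly to MPI_File_set_view, e.g. (sketch):
     *     MPI_Info_set(info, "cb_config_list", "*:*");
     *     MPI_File_set_view(fh, 0, MPI_DOUBLE, ftype, "native", info);
     * In this test the hints are instead passed to MPI_File_open above. */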

    /* prepare write buffer */
    buf_size = 11; /* fourth dimension, not partitioned */
    for (i=0; i<3; i++)
        buf_size *= array_of_subsizes[i];
    buf = (double*) malloc(buf_size*sizeof(double));
    for (i=0; i<buf_size; i++) buf[i] = (double)rank; /* defined contents */
    if (debug)
        printf("%d: buf_size = %lu bytes\n", rank,
               (unsigned long)(buf_size*sizeof(double)));

    /* MPI collective write */
    MPI_File_write_all(fh, buf, buf_size, MPI_DOUBLE, &status);

    MPI_File_close(&fh);

    free(buf);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
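
For illustration, below is a minimal standalone sketch (not part of the
original attachment) of the 32-bit truncation suspected upthread in
ADIOI_Calc_my_off_len(): casting a 64-bit file offset larger than 2^31-1
to an int, as in the "frd_size = (int) (...)" cast at ad_read_coll.c:351,
yields an implementation-defined (typically negative) value, consistent
with the bogus end_offset and fd_size values seen in the coredumps.

   #include <stdio.h>

   int main(void) {
       /* a 64-bit byte offset of the size seen in the rank-2919 coredump */
       long long off = 2802007600LL;
       /* mirrors the suspect cast: this value does not fit in a 32-bit int */
       int truncated = (int) off;
       printf("64-bit offset = %lld, after (int) cast = %d\n", off, truncated);
       return 0;
   }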



