[MPICH] debug flag

William Gropp gropp at mcs.anl.gov
Mon May 28 09:01:21 CDT 2007


I know that this doesn't help you with this problem, but the next  
release of MPICH2 will include some of this information in the object  
library itself so that it is *always* available (this is where such  
data belongs, for just this reason).
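
A minimal sketch of the run-time query this enables: MPI_Get_library_version() is the MPI-3 routine that later standardized the idea (so it postdates this post), and in MPICH its output includes the configure arguments, so a flag such as --enable-g=dbg is visible even where no mpich2version binary is installed.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char version[MPI_MAX_LIBRARY_VERSION_STRING];
        int len, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Ask the library to describe itself; MPICH reports its version,
           release date, device, and the full configure line. */
        MPI_Get_library_version(version, &len);
        if (rank == 0)
            printf("%s\n", version);
        MPI_Finalize();
        return 0;
    }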

Bill

On May 25, 2007, at 5:18 PM, Wei-keng Liao wrote:

>
> Well, I am aware of mpich2version, but unfortunately that command
> is not available to users on that machine. The only commands
> available to me are mpicc, mpif77, mpif90, and mpicxx.
>
> Wei-keng
>
>
> On Fri, 25 May 2007, Anthony Chan wrote:
>
>>
>> <mpich2-install-dir>/bin/mpich2version may show if --enable-g is set.
>>
>> A.Chan
>>
>> On Fri, 25 May 2007, Wei-keng Liao wrote:
>>
>>>
>>> The problem is that I cannot run my own MPICH on the machine. I can
>>> see that the MPICH I am using is MPICH2 1.0.2 from peeking at the
>>> mpif90 script. Is there a way to tell from the mpif90 script whether
>>> it was built with the --enable-g=dbg option?
>>>
>>> I don't know if this helps, but below is the whole error message:
>>>
>>> aborting job:
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process <id>
>>> (there are 4000 lines, each with a distinct id number)
>>>
>>> ----- DEBUG: PCB, CONTEXT, STACK TRACE ---------------------
>>>
>>> PROCESSOR [ 0]
>>> log_nid  =    15  phys_nid  = 0x98  host_id =   7691  host_pid  = 18545
>>> group_id = 12003  num_procs = 4000  rank    =     15  local_pid  =    3
>>> base_node_index =    0   last_node_index = 1999
>>>
>>> text_base  = 0x00000000200000   text_len  = 0x00000000400000
>>> data_base  = 0x00000000600000   data_len  = 0x00000000a00000
>>> stack_base = 0x000000fec00000   stack_len = 0x00000001000000
>>> heap_base  = 0x00000001200000   heap_len  = 0x0000007b000000
>>>
>>> ss  = 0x000000000000001f  fs  = 000000000000000000  gs  = 0x0000000000000017
>>> rip = 0x00000000002d46fe
>>> rdi = 0x0000000006133a90  rsi = 0xffffffffdc0003c2  rbp = 0x00000000ffbf9d40
>>> rsp = 0x00000000ffbf9cc0  rbx = 0x0000000000000190  rdx = 0x000000003eb08c39
>>> rcx = 0x0000000008ea18b0  rax = 0x0000000008ecff30  cs  = 0x000000000000001f
>>> R8  = 0x0000000007ad2ab0  R9  = 0xfffffffffffffe0c  R10 = 0x0000000008e6bd30
>>> R11 = 0x0000000000000262  R12 = 0x0000000000000a8c  R13 = 0xfffffffff0538770
>>> R14 = 0x00000000fffffe0c  R15 = 0x0000000008ed3dc0
>>> rflg = 0x0000000000010206   prev_sp = 0x00000000ffbf9cc0
>>> error_code = 6
>>>
>>> SIGNAL #[11][Segmentation fault]  fault_address = 0xffffffff78ed4cc8
>>>   0xffbf9cc0  0x        ffbf9cf0 0x             fa0 0x       a00006b6c 0x     a8c3e9ab7ff
>>>   0xffbf9ce0  0x         8ed7c50 0x             7d0 0x               0 0x    6b6c002d455b
>>>   0xffbf9d00  0x         8ea18b0 0x         8e6bd30 0x         61338a0 0x             fa0
>>>   0xffbf9d20  0x               0 0x         61338a0 0x           8036c 0x         8ec4390
>>>   0xffbf9d40  0x        ffbf9e80 0x          2d2280 0x         8ecff30 0x           8036c
>>>   0xffbf9d60  0x             fa0 0x        ffbf9de4 0x        ffbf9de8 0x        ffbf9df0
>>>   0xffbf9d80  0x        ffbf9df8 0x               0 0x               0 0x         8ebc680
>>>   0xffbf9da0  0x            1770 0x     7d000a39f88 0x               0 0x      650048174f
>>>   0xffbf9dc0  0x  14fb184c000829 0x         6a93500 0x        ffbf9e30 0x          292e54
>>>   0xffbf9de0  0x               0 0x         8ed3dc0 0x             7af 0x               0
>>>   0xffbf9e00  0x               0 0x         8ecc0a0 0x         8ecff30 0x           8036c
>>>   0xffbf9e20  0x       100000014 0x         8e6bd30 0x         8ea18b0 0x            1770
>>>   0xffbf9e40  0xffffffff6793163f 0x    6b6c00a39fa8 0x     fa00000000f 0x         61338a0
>>>   0xffbf9e60  0x        4c000829 0x          14fb18 0x              65 0x               0
>>>   0xffbf9e80  0x        ffbf9ee0 0x          2a397c 0x          866b60 0x        ffbf9eb0
>>>
>>>
>>> Stack Trace:  ------------------------------
>>> #0  0x00000000002d46fe in ADIOI_Calc_my_req()
>>> #1  0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>> #2  0x00000000002a397c in MPIOI_File_write_all()
>>> #3  0x00000000002a3a4a in PMPI_File_write_all()
>>> #4  0x00000000002913a8 in pmpi_file_write_all_()
>>> could not find symbol for addr 0x73696e6966204f49
>>> --------------------------------------------
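
One hint in that trace: the address the debugger could not resolve, 0x73696e6966204f49, consists entirely of printable ASCII bytes. Read little-endian it spells "IO finis", which suggests a return address overwritten by string data, i.e. a stack buffer overrun. A small check, assuming a little-endian machine such as the x86-64 above:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t addr = 0x73696e6966204f49ULL;  /* the unresolvable "address" */
        char text[9];
        int i;

        /* On a little-endian machine the low byte is the first character. */
        for (i = 0; i < 8; i++)
            text[i] = (char)((addr >> (8 * i)) & 0xff);
        text[8] = '\0';
        printf("%s\n", text);  /* prints: IO finis */
        return 0;
    }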
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 25 May 2007, Robert Latham wrote:
>>>
>>>> On Fri, May 25, 2007 at 03:56:16PM -0500, Wei-keng Liao wrote:
>>>>>
>>>>> I have an MPI I/O application that runs fine with up to 1000
>>>>> processes, but fails with 4000 processes. Parts of the error message are
>>>>>     ...
>>>>>     Stack Trace:  ------------------------------
>>>>>     #0  0x00000000002d46fe in ADIOI_Calc_my_req()
>>>>>     #1  0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>>>>     #2  0x00000000002a397c in MPIOI_File_write_all()
>>>>>     #3  0x00000000002a3a4a in PMPI_File_write_all()
>>>>>     #4  0x00000000002913a8 in pmpi_file_write_all_()
>>>>>     could not find symbol for addr 0x73696e6966204f49
>>>>>     aborting job:
>>>>>     application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1456
>>>>>     ...
>>>>>
>>>>> My question is: what debug flags should I use for compiling and
>>>>> running in order to find the exact location in
>>>>> ADIOI_Calc_my_req() that causes this error?
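
For context, ADIOI_GEN_WriteStridedColl() and ADIOI_Calc_my_req() are ROMIO's two-phase collective-I/O internals, reached when a collective write is posted on a noncontiguous file view. A minimal sketch of code that drives that path; the file name and sizes are illustrative:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Datatype filetype;
        int rank, nprocs, i, buf[100];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* A strided, interleaved file view: each rank owns every
           nprocs-th int, so the collective write below goes through
           ROMIO's strided-collective path. */
        MPI_Type_vector(100, 1, nprocs, MPI_INT, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, rank * sizeof(int), MPI_INT, filetype,
                          "native", MPI_INFO_NULL);
        for (i = 0; i < 100; i++)
            buf[i] = rank;
        MPI_File_write_all(fh, buf, 100, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        MPI_Finalize();
        return 0;
    }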
>>>>
>>>> Hi Wei-keng
>>>>
>>>> If you build MPICH2 with --enable-g=dbg, then all of MPI will be  
>>>> built
>>>> with debugging symbols.   Be sure to 'make clean' first: the ROMIO
>>>> objects might not rebuild otherwise.
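
For example, a rebuild along these lines (the install prefix is illustrative):

    ./configure --prefix=/path/to/mpich2 --enable-g=dbg
    make clean
    make
    make install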
>>>>
>>>> I wonder what caused the abort?  Maybe ADIOI_Malloc failed to
>>>> allocate memory?  Well, a stack trace with debugging symbols should
>>>> be interesting.
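
That hunch is consistent with how ADIOI_Calc_my_req() works: it sizes several bookkeeping arrays by the number of processes, so per-rank memory grows linearly with the job, and a run that fits at 1000 processes can fail at 4000. A rough sketch of the pattern, with illustrative names rather than the actual ROMIO source:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative only: per-rank bookkeeping for a two-phase collective
       write grows linearly with nprocs. */
    int main(void)
    {
        int nprocs = 4000;   /* size of the failing job */
        int *count_my_req_per_proc = malloc(nprocs * sizeof(int));

        if (count_my_req_per_proc == NULL) {
            /* An unchecked failure here would make the next store
               through the pointer segfault, as in the trace above. */
            fprintf(stderr, "malloc failed for %d counters\n", nprocs);
            return 1;
        }
        free(count_my_req_per_proc);
        return 0;
    }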
>>>>
>>>> ==rob
>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
>>>> Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
>>>>
>>>
>>>
>>
>



