[MPICH] debug flag
William Gropp
gropp at mcs.anl.gov
Mon May 28 09:01:21 CDT 2007
I know that this doesn't help you with this problem, but the next
release of MPICH2 will include some of this information in the object
library itself so that it is *always* available (this is where such
data belongs, for just this reason).
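For example, once that information is embedded, it should be recoverable
straight from the installed library with no helper scripts at all. A
minimal sketch of the idea (the library path and the exact string format
here are assumptions, not the final interface):

    # Scan the library for an embedded configure/version string.
    strings /path/to/mpich2/lib/libmpich.a | grep -i configure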
Bill
On May 25, 2007, at 5:18 PM, Wei-keng Liao wrote:
>
> Well, I am aware of mpich2version, but unfortunately that command is
> not available to users on that machine. The only commands available
> to me are mpicc, mpif77, mpif90, and mpicxx.
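>
> The most I can get out of the wrappers themselves is the underlying
> compile line, which may or may not reflect how the library was built
> (assuming this installation's wrappers support the usual -show option):
>
>     # Print the compiler command the wrapper would run, without
>     # running it; a -g in the output hints at a debug build.
>     mpicc -show
>     mpif90 -show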
>
> Wei-keng
>
>
> On Fri, 25 May 2007, Anthony Chan wrote:
>
>>
>> <mpich2-install-dir>/bin/mpich2version may show if --enable-g is set.
>>
>> A.Chan
>>
>> On Fri, 25 May 2007, Wei-keng Liao wrote:
>>
>>>
>>> The problem is that I cannot run my own MPICH on the machine. I can
>>> see that the MPICH2 I am using is version 1.0.2 from peeking at the
>>> mpif90 script. Is there a way to tell from the mpif90 script whether
>>> it was built with the --enable-g=dbg option?
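>>>
>>> The closest I have come is grepping the wrapper script itself, on
>>> the guess that it records the flags it was built with:
>>>
>>>     # Look for a -g flag or a recorded --enable-g in the wrapper.
>>>     grep -nE -e '(^| )-g( |$)|enable-g' `which mpif90`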
>>>
>>> I don't know if this helps, but below is the whole error message:
>>>
>>> aborting job:
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process <id>
>>> (there are 4000 lines, each with a distinct id number)
>>>
>>> ----- DEBUG: PCB, CONTEXT, STACK TRACE ---------------------
>>>
>>> PROCESSOR [ 0]
>>> log_nid = 15  phys_nid = 0x98  host_id = 7691  host_pid = 18545
>>> group_id = 12003  num_procs = 4000  rank = 15  local_pid = 3
>>> base_node_index = 0  last_node_index = 1999
>>>
>>> text_base  = 0x00000000200000  text_len  = 0x00000000400000
>>> data_base  = 0x00000000600000  data_len  = 0x00000000a00000
>>> stack_base = 0x000000fec00000  stack_len = 0x00000001000000
>>> heap_base  = 0x00000001200000  heap_len  = 0x0000007b000000
>>>
>>> ss  = 0x000000000000001f  fs  = 0x0000000000000000  gs  = 0x0000000000000017
>>> rip = 0x00000000002d46fe
>>> rdi = 0x0000000006133a90  rsi = 0xffffffffdc0003c2  rbp = 0x00000000ffbf9d40
>>> rsp = 0x00000000ffbf9cc0  rbx = 0x0000000000000190  rdx = 0x000000003eb08c39
>>> rcx = 0x0000000008ea18b0  rax = 0x0000000008ecff30  cs  = 0x000000000000001f
>>> R8  = 0x0000000007ad2ab0  R9  = 0xfffffffffffffe0c  R10 = 0x0000000008e6bd30
>>> R11 = 0x0000000000000262  R12 = 0x0000000000000a8c  R13 = 0xfffffffff0538770
>>> R14 = 0x00000000fffffe0c  R15 = 0x0000000008ed3dc0
>>> rflg = 0x0000000000010206  prev_sp = 0x00000000ffbf9cc0  error_code = 6
>>>
>>> SIGNAL #[11][Segmentation fault]  fault_address = 0xffffffff78ed4cc8
>>>
>>> 0xffbf9cc0: 0xffbf9cf0          0xfa0             0xa00006b6c     0xa8c3e9ab7ff
>>> 0xffbf9ce0: 0x8ed7c50           0x7d0             0x0             0x6b6c002d455b
>>> 0xffbf9d00: 0x8ea18b0           0x8e6bd30         0x61338a0       0xfa0
>>> 0xffbf9d20: 0x0                 0x61338a0         0x8036c         0x8ec4390
>>> 0xffbf9d40: 0xffbf9e80          0x2d2280          0x8ecff30       0x8036c
>>> 0xffbf9d60: 0xfa0               0xffbf9de4        0xffbf9de8      0xffbf9df0
>>> 0xffbf9d80: 0xffbf9df8          0x0               0x0             0x8ebc680
>>> 0xffbf9da0: 0x1770              0x7d000a39f88     0x0             0x650048174f
>>> 0xffbf9dc0: 0x14fb184c000829    0x6a93500         0xffbf9e30      0x292e54
>>> 0xffbf9de0: 0x0                 0x8ed3dc0         0x7af           0x0
>>> 0xffbf9e00: 0x0                 0x8ecc0a0         0x8ecff30       0x8036c
>>> 0xffbf9e20: 0x100000014         0x8e6bd30         0x8ea18b0       0x1770
>>> 0xffbf9e40: 0xffffffff6793163f  0x6b6c00a39fa8    0xfa00000000f   0x61338a0
>>> 0xffbf9e60: 0x4c000829          0x14fb18          0x65            0x0
>>> 0xffbf9e80: 0xffbf9ee0          0x2a397c          0x866b60        0xffbf9eb0
>>>
>>>
>>> Stack Trace: ------------------------------
>>> #0 0x00000000002d46fe in ADIOI_Calc_my_req()
>>> #1 0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>> #2 0x00000000002a397c in MPIOI_File_write_all()
>>> #3 0x00000000002a3a4a in PMPI_File_write_all()
>>> #4 0x00000000002913a8 in pmpi_file_write_all_()
>>> could not find symbol for addr 0x73696e6966204f49
>>> --------------------------------------------
>>>
>>> On Fri, 25 May 2007, Robert Latham wrote:
>>>
>>>> On Fri, May 25, 2007 at 03:56:16PM -0500, Wei-keng Liao wrote:
>>>>>
>>>>> I have an MPI I/O application that runs fine up to 1000 processes,
>>>>> but fails when using 4000 processes. Parts of the error message are:
>>>>> ...
>>>>> Stack Trace: ------------------------------
>>>>> #0 0x00000000002d46fe in ADIOI_Calc_my_req()
>>>>> #1 0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>>>>> #2 0x00000000002a397c in MPIOI_File_write_all()
>>>>> #3 0x00000000002a3a4a in PMPI_File_write_all()
>>>>> #4 0x00000000002913a8 in pmpi_file_write_all_()
>>>>> could not find symbol for addr 0x73696e6966204f49
>>>>> aborting job:
>>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1456
>>>>> ...
>>>>>
>>>>> My question is: what debug flags should I use for compiling and
>>>>> running in order to pinpoint the exact location in
>>>>> ADIOI_Calc_my_req() that causes this error?
>>>>
>>>> Hi Wei-keng
>>>>
>>>> If you build MPICH2 with --enable-g=dbg, then all of MPI will be
>>>> built
>>>> with debugging symbols. Be sure to 'make clean' first: the ROMIO
>>>> objects might not rebuild otherwise.
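>>>>
>>>> Roughly (the install prefix here is just a placeholder):
>>>>
>>>>     ./configure --prefix=/path/to/mpich2-dbg --enable-g=dbg
>>>>     make clean          # force the ROMIO objects to rebuild
>>>>     make && make install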
>>>>
>>>> I wonder what caused the abort? Maybe ADIOI_Malloc failed to
>>>> allocate memory? Well, a stack trace with debugging symbols should
>>>> be interesting.
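>>>>
>>>> Once everything is rebuilt with symbols, the addresses in that
>>>> trace can be mapped back to source lines. A sketch, assuming
>>>> binutils' addr2line is available on that machine and the trace
>>>> addresses are absolute:
>>>>
>>>>     # Resolve frame #0 of the stack trace to function and file:line.
>>>>     # (your_app is a placeholder for the actual executable.)
>>>>     addr2line -f -e ./your_app 0x00000000002d46fe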
>>>>
>>>> ==rob
>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
>>>> Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
>>>>
>>>
>>>
>>
>