[MPICH] debug flag
Wei-keng Liao
wkliao at ece.northwestern.edu
Fri May 25 16:53:08 CDT 2007
The problem is that I cannot run my own MPICH build on the machine. From
peeking at the mpif90 script, I can see that the MPICH I am using is
MPICH2 1.0.2. Is there a way to tell from the mpif90 script whether it was
built with the --enable-g=dbg option?
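For example, something along these lines might show whether a -g flag ended
up in the wrapper (this assumes the wrapper is the usual MPICH2 shell script;
the grep pattern is only a guess at how the flags appear in it):

    # show the underlying compile/link command the wrapper would run
    mpif90 -show

    # look for a -g flag baked into the wrapper script itself
    grep -n -- '-g' `which mpif90`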
I don't know if this helps, but below is the whole error message:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process <id>
(there are 4000 lines, each with a distinct id number)
----- DEBUG: PCB, CONTEXT, STACK TRACE ---------------------
PROCESSOR [ 0]
log_nid = 15 phys_nid = 0x98 host_id = 7691 host_pid = 18545
group_id = 12003 num_procs = 4000 rank = 15 local_pid = 3
base_node_index = 0 last_node_index = 1999
text_base = 0x00000000200000 text_len = 0x00000000400000
data_base = 0x00000000600000 data_len = 0x00000000a00000
stack_base = 0x000000fec00000 stack_len = 0x00000001000000
heap_base = 0x00000001200000 heap_len = 0x0000007b000000
ss = 0x000000000000001f fs = 0x0000000000000000 gs = 0x0000000000000017
rip = 0x00000000002d46fe
rdi = 0x0000000006133a90 rsi = 0xffffffffdc0003c2 rbp = 0x00000000ffbf9d40
rsp = 0x00000000ffbf9cc0 rbx = 0x0000000000000190 rdx = 0x000000003eb08c39
rcx = 0x0000000008ea18b0 rax = 0x0000000008ecff30 cs = 0x000000000000001f
R8 = 0x0000000007ad2ab0 R9 = 0xfffffffffffffe0c R10 = 0x0000000008e6bd30
R11 = 0x0000000000000262 R12 = 0x0000000000000a8c R13 = 0xfffffffff0538770
R14 = 0x00000000fffffe0c R15 = 0x0000000008ed3dc0
rflg = 0x0000000000010206 prev_sp = 0x00000000ffbf9cc0
error_code = 6
SIGNAL #[11][Segmentation fault] fault_address = 0xffffffff78ed4cc8
0xffbf9cc0 0x ffbf9cf0 0x fa0 0x a00006b6c 0x a8c3e9ab7ff
0xffbf9ce0 0x 8ed7c50 0x 7d0 0x 0 0x 6b6c002d455b
0xffbf9d00 0x 8ea18b0 0x 8e6bd30 0x 61338a0 0x fa0
0xffbf9d20 0x 0 0x 61338a0 0x 8036c 0x 8ec4390
0xffbf9d40 0x ffbf9e80 0x 2d2280 0x 8ecff30 0x 8036c
0xffbf9d60 0x fa0 0x ffbf9de4 0x ffbf9de8 0x ffbf9df0
0xffbf9d80 0x ffbf9df8 0x 0 0x 0 0x 8ebc680
0xffbf9da0 0x 1770 0x 7d000a39f88 0x 0 0x 650048174f
0xffbf9dc0 0x 14fb184c000829 0x 6a93500 0x ffbf9e30 0x 292e54
0xffbf9de0 0x 0 0x 8ed3dc0 0x 7af 0x 0
0xffbf9e00 0x 0 0x 8ecc0a0 0x 8ecff30 0x 8036c
0xffbf9e20 0x 100000014 0x 8e6bd30 0x 8ea18b0 0x 1770
0xffbf9e40 0xffffffff6793163f 0x 6b6c00a39fa8 0x fa00000000f 0x 61338a0
0xffbf9e60 0x 4c000829 0x 14fb18 0x 65 0x 0
0xffbf9e80 0x ffbf9ee0 0x 2a397c 0x 866b60 0x ffbf9eb0
Stack Trace: ------------------------------
#0 0x00000000002d46fe in ADIOI_Calc_my_req()
#1 0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
#2 0x00000000002a397c in MPIOI_File_write_all()
#3 0x00000000002a3a4a in PMPI_File_write_all()
#4 0x00000000002913a8 in pmpi_file_write_all_()
could not find symbol for addr 0x73696e6966204f49
--------------------------------------------
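Once a build with debugging symbols is available, I suppose something like
addr2line could map the addresses in the trace back to source lines (this
assumes the executable is statically linked so the addresses are absolute;
"myprog" is just a placeholder for the application binary):

    # frame #0: faulting address inside ADIOI_Calc_my_req
    addr2line -f -e ./myprog 0x00000000002d46fe

    # frame #1: caller, ADIOI_GEN_WriteStridedColl
    addr2line -f -e ./myprog 0x00000000002d2280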
On Fri, 25 May 2007, Robert Latham wrote:
> On Fri, May 25, 2007 at 03:56:16PM -0500, Wei-keng Liao wrote:
>>
>> I have an MPI I/O application that runs fine up to 1000 processes, but
>> fails when using 4000 processes. Parts of the error message are
>> ...
>> Stack Trace: ------------------------------
>> #0 0x00000000002d46fe in ADIOI_Calc_my_req()
>> #1 0x00000000002d2280 in ADIOI_GEN_WriteStridedColl()
>> #2 0x00000000002a397c in MPIOI_File_write_all()
>> #3 0x00000000002a3a4a in PMPI_File_write_all()
>> #4 0x00000000002913a8 in pmpi_file_write_all_()
>> could not find symbol for addr 0x73696e6966204f49
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1456
>> ...
>>
>> My question is: what debug flags should I use for compiling and running
>> in order to find the exact location in ADIOI_Calc_my_req() that causes
>> this error?
>
> Hi Wei-keng
>
> If you build MPICH2 with --enable-g=dbg, then all of MPI will be built
> with debugging symbols. Be sure to 'make clean' first: the ROMIO
> objects might not rebuild otherwise.
>
> I wonder what caused the abort? Maybe ADIOI_Malloc failed to allocate
> memory? Well, a stack trace with debugging symbols should be
> informative.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
> Argonne National Lab, IL USA B29D F333 664A 4280 315B
>
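For reference, the rebuild Rob describes would look roughly like this (the
source tree path and install prefix are placeholders; other configure options
may be needed for this particular machine):

    cd mpich2-1.0.2
    ./configure --prefix=/path/to/install --enable-g=dbg
    make clean      # make sure the ROMIO objects are rebuilt
    make
    make install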