[mpich2-dev] [mvapich-discuss] Need a hint in debugging a problem that only affects a few machines in our cluster.

Krishna Chaitanya Kandalla kandalla at cse.ohio-state.edu
Wed Jul 15 15:13:12 CDT 2009


Mike,
           Thank you for providing the source code. I am able to 
reproduce the hang on our cluster, as well. I will look into the issue.

Thanks,
Krishna

Mike Heinz wrote:
> I was wondering about that - I passed the parameter in a param file, using the -param argument to mpirun_rsh. I just tried passing it inline as well; here are the results:
>
> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 /opt/iba/src/mpi_apps/bandwidth/bw 10 10
>
> node 0
>
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
> (gdb) where
> #0  0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
> #1  0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1)
>     at ch3_progress.c:174
> #2  0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4,
>     datatype=1275068673, source=1, tag=101, comm=1140850688, status=0x601520)
>     at recv.c:156
> #3  0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91
>
>
> node 1
>
> (gdb) where
> #0  0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1,
>     wc=0x7fffb9786a60) at src/cq.c:470
> #1  0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll (
>     vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, is_blocking=1)
>     at /usr/include/infiniband/verbs.h:934
> #2  0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00,
>     vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468
> #3  0x00002b9af14a8304 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffb9786b80,
>     v_ptr=0x7fffb9786b78, is_blocking=<value optimized out>)
>     at ch3_read_progress.c:158
> #4  0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1,
>     state=<value optimized out>) at ch3_progress.c:202
> #5  0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at helper_fns.c:269
> #6  0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0,
>     sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0,
>     recvtype=1275068685, source=0, recvtag=1, comm=1140850688, status=0x1)
>     at helper_fns.c:125
> #7  0x00002b9af149b07a in MPIR_Barrier (comm_ptr=<value optimized out>)
>     at barrier.c:82
> #8  0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at barrier.c:446
> #9  0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81
>
> bw.c is the old "bandwidth" benchmark. It looks like the job actually gets out of MPI_Init() in this case, but then one side is waiting at the barrier while the other has already gone past it. I've attached a copy of the program.
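>
> To make the shape of the test concrete, here is a from-memory sketch of bw.c
> (this is not the attached source, and the line positions are only approximate,
> matching the barrier near bw.c:81 and the receive near bw.c:91 in the traces):
>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int rank;
>     char buf[4];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Barrier(MPI_COMM_WORLD);             /* ~bw.c:81, where node 1 is stuck */
>     if (rank == 0)
>         MPI_Recv(buf, 4, MPI_CHAR, 1, 101,   /* ~bw.c:91, where node 0 is stuck */
>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>     else
>         MPI_Send(buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
> If a message between this pair of hosts is being lost, that would produce
> exactly this picture: rank 1 parked in the barrier while rank 0, having
> completed it, blocks in the receive.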
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla at cse.ohio-state.edu] 
> Sent: Wednesday, July 15, 2009 3:42 PM
> To: Mike Heinz
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> That's a little surprising. Setting this variable to 0 ensures that a
> particular flag is cleared. That flag is supposed to guard the piece of
> code that does the 2-level communicator creation. Just out of curiosity,
> can you also let me know the command that you are using to launch the
> job? The env variables need to be set before the executable is
> specified; if MV2_USE_SHMEM_COLL=0 appears after the executable name,
> the job launcher might not pick it up.
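>
> To be concrete, the guarded call inside MPI_Init looks roughly like the
> following (a sketch only: the flag name here is illustrative, while
> create_2level_comm() is the function that actually appears in your traces):
>
> extern int mv2_enable_shmem_collectives;  /* illustrative name; cleared
>                                              when MV2_USE_SHMEM_COLL=0 */
>
> /* inside MPI_Init, once the flat MPI_COMM_WORLD exists: */
> if (mv2_enable_shmem_collectives)
>     create_2level_comm(MPI_COMM_WORLD, size, my_rank);
>
> With the flag cleared, the whole block should be skipped, which is why it
> is surprising to still see create_2level_comm() on your stacks.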
>
> Thanks,
> Krishna
>
>
>
>
> Mike Heinz wrote:
>   
>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL to 
>> zero did not seem to change the stack trace much:
>>
>> Node 0:
>>
>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (vbuf_handle=0x7fffcb46d698,
>>     vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529
>> 529         for (; i < rdma_num_hcas; ++i) {
>> (gdb) where
>> #0  0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (
>>     vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1)
>>     at ibv_channel_manager.c:529
>> #1  0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fffcb46d6a0,
>>     v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143
>> #2  0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #3  0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>     at helper_fns.c:269
>> #4  0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, sendcount=2,
>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, recvcount=2,
>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>     status=0x7fffcb46d820) at helper_fns.c:125
>> #5  0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x10993a80,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>     at allgather.c:192
>> #6  0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #7  0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>> #8  0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #9  0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, argv=0x7fffcb46db30)
>>     at init.c:146
>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27
>>
>> Node 1:
>>
>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>     is_blocking=1) at ch3_read_progress.c:143
>> 143         type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking);
>> (gdb) where
>> #0  MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>     is_blocking=1) at ch3_read_progress.c:143
>> #1  0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #2  0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0)
>>     at helper_fns.c:269
>> #3  0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2,
>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4,
>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>     status=0x7fff0b10bcd0) at helper_fns.c:125
>> #4  0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf77020,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80)
>>     at allgather.c:192
>> #5  0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #6  0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>     newcomm=0x2afc9fd26d94) at comm_split.c:196
>> #7  0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #8  0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, argv=0x7fff0b10bfe0)
>>     at init.c:146
>> #9  0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27
>>
>> Any suggestions would be appreciated.
>>
>> --
>>
>> Michael Heinz
>>
>> Principal Engineer, Qlogic Corporation
>>
>> King of Prussia, Pennsylvania
>>
>> From: kris.c1986 at gmail.com [mailto:kris.c1986 at gmail.com] On Behalf
>> Of Krishna Chaitanya
>> Sent: Tuesday, July 14, 2009 6:39 PM
>> To: Mike Heinz
>> Cc: Todd Rimmer; mvapich-discuss at cse.ohio-state.edu;
>> mpich2-dev at mcs.anl.gov
>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging
>> a problem that only affects a few machines in our cluster.
>>
>> Mike,
>> The hang seems to be occurring when the MPI library is trying to create
>> the 2-level communicator during the init phase. Can you try running the
>> test with MV2_USE_SHMEM_COLL=0
>> <http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-16000011.74>?
>> This will ensure that a flat communicator is used for the subsequent
>> MPI calls. This might help us isolate the problem.
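>>
>> For context, create_2level_comm() (visible in your traces calling
>> PMPI_Comm_split, which in turn drives the hanging MPIR_Allgather) does,
>> in essence, something like this simplified sketch; node_id here stands
>> for whatever value identifies the host, and the exact library code differs:
>>
>> MPI_Comm shmem_comm, leader_comm;
>> int local_rank;
>>
>> /* ranks sharing a host get the same color: a node-local communicator */
>> MPI_Comm_split(comm, node_id, my_rank, &shmem_comm);
>> MPI_Comm_rank(shmem_comm, &local_rank);
>>
>> /* one leader per host forms the upper level of the hierarchy */
>> MPI_Comm_split(comm, (local_rank == 0) ? 0 : MPI_UNDEFINED,
>>                my_rank, &leader_comm);
>>
>> Skipping this step with MV2_USE_SHMEM_COLL=0 keeps the job on the flat
>> communicator and out of this code path entirely.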
>>
>> Thanks,
>> Krishna
>>
>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz <michael.heinz at qlogic.com> wrote:
>>
>> We're having a very odd problem with our fabric, where, out of the 
>> entire cluster, machine "A" can't run mvapich2 programs with machine 
>> "B", and machine "C" can't run programs with machine "D" - even though 
>> "A" can run with "D" and "B" can run with "C" - and the rest of the 
>> fabric works fine.
>>
>> 1) There are no IB errors anywhere on the fabric that I can find, and 
>> the machines in question all work correctly with mvapich1 and 
>> low-level IB tests.
>>
>> 2) The problem occurs whether using mpd or rsh.
>>
>> 3) If I attach to the running processes, both machines appear to be 
>> waiting for a read operation to complete. (See below)
>>
>> Can anyone make a suggestion on how to debug this?
>>
>> Stack trace for node 0:
>>
>> #0  0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0
>> #1  0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1,
>>     wc=0x7fff9d835900) at src/cq.c:468
>> #2  0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll (
>>     vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1)
>>     at /usr/include/infiniband/verbs.h:934
>> #3  0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0,
>>     v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143
>> #4  0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #5  0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>     at helper_fns.c:269
>> #6  0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2,
>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2,
>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>     status=0x7fff9d835b60) at helper_fns.c:125
>> #7  0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x217fc50,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>     at allgather.c:192
>> #8  0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #9  0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, argv=0x7fff9d835e70)
>>     at init.c:146
>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27
>>
>> Stack trace for node 1:
>>
>> #0  0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020,
>>     v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143
>> #1  0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #2  0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0)
>>     at helper_fns.c:269
>> #3  0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2,
>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4,
>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>     status=0x7fffdee811a0) at helper_fns.c:125
>> #4  0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf79020,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80)
>>     at allgather.c:192
>> #5  0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #6  0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>     newcomm=0x2ac3cbfb0d94) at comm_split.c:196
>> #7  0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #8  0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, argv=0x7fffdee814b0)
>>     at init.c:146
>> #9  0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27
>>
>> --
>>
>> Michael Heinz
>>
>> Principal Engineer, Qlogic Corporation
>>
>> King of Prussia, Pennsylvania
>>
>>
>> -- 
>> In the middle of difficulty, lies opportunity
>>
>>     

