[mpich-discuss] Program Crash

Rajeev Thakur thakur at mcs.anl.gov
Fri Jul 1 13:23:53 CDT 2011


There could be a thread-safety issue in your code. If your code is simple enough, or if you can reproduce the crash with a small test program, please post it here.

Rajeev

On Jul 1, 2011, at 12:31 AM, jarray52 jarray52 wrote:

> Hi,
> 
> My code crashes, and I'm not sure how to debug the problem. I'm new to MPI/MPICH programming, and any debugging suggestions would be appreciated. Here is the error output from MPICH:
> 
> [proxy:0:1@ComputeNodeIB101] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:1@ComputeNodeIB101] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@ComputeNodeIB101] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
> [proxy:0:2@ComputeNodeIB102] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:2@ComputeNodeIB102] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:2@ComputeNodeIB102] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
> [proxy:0:3@ComputeNodeIB103] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:3@ComputeNodeIB103] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:3@ComputeNodeIB103] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
> 
> I'm using MPI_THREAD_MULTIPLE over an InfiniBand fabric. The crash is intermittent. I believe it occurs during a receive call, but I'm not certain.
> 
> Thanks,
> Jay


