[mpich-discuss] Program Crash
Gregory Magoon
gmagoon at MIT.EDU
Thu Jul 21 11:57:49 CDT 2011
I have been encountering a similar issue when I run the MPICH2 1.4 test suite on
an NFS filesystem (the tests also generate an abnormally high amount of NFS
traffic). I don't see the problem when running on a local filesystem. I'm
wondering if this might be related to ticket #1422 and/or ticket #1483:
http://trac.mcs.anl.gov/projects/mpich2/ticket/1422
http://trac.mcs.anl.gov/projects/mpich2/ticket/1483
I'm new to MPICH, so any tips would be very much appreciated.

Here is the initial output of the failed tests:
user@node01:~/Molpro/src/mpich2-1.4$ make testing
(cd test && make testing)
make[1]: Entering directory `/home/user/Molpro/src/mpich2-1.4/test'
(NOXMLCLOSE=YES && export NOXMLCLOSE && cd mpi && make testing)
make[2]: Entering directory `/home/user/Molpro/src/mpich2-1.4/test/mpi'
./runtests -srcdir=. -tests=testlist \
-mpiexec=/home/user/Molpro/src/mpich2-install/bin/mpiexec \
-xmlfile=summary.xml
Looking in ./testlist
Processing directory attr
Looking in ./attr/testlist
Processing directory coll
Looking in ./coll/testlist
Unexpected output in allred: [mpiexec@node01] APPLICATION TIMED OUT
Unexpected output in allred: [proxy:0:0@node01] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
Unexpected output in allred: [proxy:0:0@node01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
Unexpected output in allred: [proxy:0:0@node01] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
Unexpected output in allred: [mpiexec@node01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
Unexpected output in allred: [mpiexec@node01] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
Unexpected output in allred: [mpiexec@node01] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for completion
Unexpected output in allred: [mpiexec@node01] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
Program allred exited without No Errors
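In case it helps anyone sifting through a long run, here is a rough sketch of
how the console output above could be grouped by test. It assumes the raw,
unwrapped runtests output, and it relies only on the two line shapes visible in
the log ("Unexpected output in <test>: ..." and "Program <test> exited without
No Errors"). The function name summarize and the SAMPLE text are illustrative
and not part of MPICH2.

```python
import re
from collections import defaultdict

# Line shapes taken from the console output above.
UNEXPECTED = re.compile(r"^Unexpected output in (\S+): (.*)$")
FAILED = re.compile(r"^Program (\S+) exited without No Errors$")

# Short excerpt used only for illustration (hypothetical sample input).
SAMPLE = """\
Looking in ./coll/testlist
Unexpected output in allred: [mpiexec@node01] APPLICATION TIMED OUT
Unexpected output in allred: [proxy:0:0@node01] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
Program allred exited without No Errors
"""


def summarize(log_text):
    """Return (failed_tests, messages_by_test) from runtests console output."""
    failed = []                   # test names, in order of first failure
    messages = defaultdict(list)  # test name -> list of unexpected output lines
    for raw in log_text.splitlines():
        line = raw.strip()
        m = UNEXPECTED.match(line)
        if m:
            messages[m.group(1)].append(m.group(2))
            continue
        m = FAILED.match(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
    return failed, dict(messages)


if __name__ == "__main__":
    failed, messages = summarize(SAMPLE)
    for name in failed:
        lines = messages.get(name, [])
        timed_out = any("APPLICATION TIMED OUT" in text for text in lines)
        print(name, len(lines), "unexpected line(s), timed out:", timed_out)
```

Lines that an email client has hard-wrapped will not match as whole records,
so this is best run on output captured directly from the terminal.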
Thanks,
Greg
>There could be some thread safety related issue in your code. If your code is
>simple enough or you can reproduce it with a simple test program, you can post
>the test here.
>
>Rajeev
>
>On Jul 1, 2011, at 12:31 AM, jarray52 jarray52 wrote:
>
> Hi,
>
> My code crashes, and I'm not sure how to debug the problem. I'm new to
> MPI/mpich programming, and any suggestions on debugging the problem would be
> appreciated. Here is the error output displayed by mpich:
>
> [proxy:0:1@ComputeNodeIB101] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:1@ComputeNodeIB101] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@ComputeNodeIB101] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
> [proxy:0:2@ComputeNodeIB102] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:2@ComputeNodeIB102] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:2@ComputeNodeIB102] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
> [proxy:0:3@ComputeNodeIB103] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:3@ComputeNodeIB103] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:3@ComputeNodeIB103] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
>
> I'm using MPI_THREAD_MULTIPLE over an ib fabric. The problem doesn't occur all
> the time. I believe it occurs during a recv statement, but I'm not certain.
>
> Thanks,
> Jay