[mpich-discuss] mpich-discuss Digest, Vol 34, Issue 28

c cook csecook at gmail.com
Tue Jul 26 15:00:16 CDT 2011


Hi,

Some time ago I had problems running a parallel application using MPICH2
with the mpd daemon. One of the users on the mpich-discuss list suggested I
install the new version of MPICH with the Hydra process manager.

Now I can run the application, but at some point it stops with this error:

InitMesh: Mesh cutoff (required, used) =   400.000   418.568 Ry

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:1 at cn102.cluster.local] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:1 at cn102.cluster.local] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at cn102.cluster.local] main (./pm/pmiserv/pmip.c:226): demux engine
error waiting for event
[proxy:0:3 at cn104.cluster.local] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:3 at cn104.cluster.local] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3 at cn104.cluster.local] main (./pm/pmiserv/pmip.c:226): demux engine
error waiting for event
[mpiexec at headnode.cluster.local] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated
badly; aborting
[mpiexec at headnode.cluster.local] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
[mpiexec at headnode.cluster.local] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for
completion
[mpiexec at headnode.cluster.local] main (./ui/mpich/mpiexec.c:397): process
manager error waiting for completion
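
For reference, here is a minimal sketch for interpreting the "EXIT CODE: 11"
line above. It assumes that the value reported is a signal number rather than
an ordinary exit status, which the output itself does not confirm. The helper
name describe_exit_code is hypothetical and only for illustration.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name if `code` matches a signal number on this
    platform, otherwise a plain description of the value.

    Assumption: the process manager reported a raw signal number.
    """
    try:
        return signal.Signals(code).name
    except ValueError:
        # Not a known signal number here, so treat it as a normal exit status.
        return f"exit status {code} (not a signal number on this platform)"


if __name__ == "__main__":
    print(describe_exit_code(11))
```

If the value does map to a segmentation fault signal, the crash would be
inside the application process itself, and the proxy and mpiexec messages
after the banner would only be cleanup noise, as the banner says.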

I am using a cluster with 8 nodes (cn101 to cn108), with 2 procs each.

The cpi example works fine.

Any idea what could be the problem?

Thank you, Eli