[mpich-discuss] Error: assert (!closed) failed
Yann RADENAC
Yann.Radenac at inria.fr
Mon Aug 20 03:46:36 CDT 2012
Hi,
I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI
program is managed as a single XtreemOS job.
To manage all processes as a single XtreemOS job, I've developed the
program xos-createProcess, which plays the role of the launcher
(replacing ssh/rsh) and starts a process on a remote machine chosen from
those reserved for the current job.
I'm running a simple hello world MPI program where each process sends
a string to process 0, which prints them on standard output.
When using MPICH2 with ssh, this program works perfectly on several
machines.
When using MPICH2 with my launcher xos-createProcess, it works with an
MPI program of 2 processes on 2 different machines.
However, I cannot get past the following error, which occurs when
running an MPI program of 3 processes on 3 different machines (or any n
processes on n different machines with n >= 3). Everything terminates
almost immediately with these error messages:
Process 0 ends with error code 7 and its standard error output is:
[mpiexec@paradent-2.rennes.grid5000.fr] cmd_response (./pm/pmiserv/pmiserv_pmi_v1.c:29): assert (!closed) failed
[mpiexec@paradent-2.rennes.grid5000.fr] fn_barrier_in (./pm/pmiserv/pmiserv_pmi_v1.c:70): error writing PMI line
[mpiexec@paradent-2.rennes.grid5000.fr] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:44): PMI handler returned error
[mpiexec@paradent-2.rennes.grid5000.fr] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command
[mpiexec@paradent-2.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@paradent-2.rennes.grid5000.fr] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@paradent-2.rennes.grid5000.fr] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
On *only* one of the other processes, the standard error output is:
[proxy:0:0@paradent-1.rennes.grid5000.fr] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
[proxy:0:0@paradent-1.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@paradent-1.rennes.grid5000.fr] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
The run command is:
-bash -c '(mpiexec -launcher-exec /usr/bin/xos-createProcess -np 3
-host `xreservation -a $XOS_RSVID` ./mpi/hello_world_MPI < /dev/null >
mpiexec.out) >& mpiexec.err'
Can anyone explain to me what this error means?
I'm using MPICH2 1.4.1p1.
Thanks for your help.
--
Yann Radenac
Research Engineer, INRIA
Myriads research team, INRIA Rennes - Bretagne Atlantique