[mpich-discuss] Error: assert (!closed) failed

Yann RADENAC Yann.Radenac at inria.fr
Mon Aug 20 03:46:36 CDT 2012


I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI 
program is managed as a single XtreemOS job.
To manage all processes as a single XtreemOS job, I've developed the 
program xos-createProcess, which plays the role of the launcher (replacing 
ssh/rsh): it starts a process on a remote machine that is one of those 
reserved for the current job.

I'm running a simple hello world MPI program in which each process sends 
a string to process 0, which prints them on standard output.

When using MPICH2 with ssh, this program works perfectly on several 
machines.

When using MPICH2 with my launcher xos-createProcess, it works with an 
MPI program of 2 processes on 2 different machines.

However, I cannot get past the following error, which occurs when 
running an MPI program of 3 processes on 3 different machines (or any n 
processes on n different machines with n >= 3). Everything terminates 
almost immediately with these error messages:

Process 0 ends with error code 7 and its standard error output is:

[mpiexec@paradent-2.rennes.grid5000.fr] cmd_response (./pm/pmiserv/pmiserv_pmi_v1.c:29): assert (!closed) failed
[mpiexec@paradent-2.rennes.grid5000.fr] fn_barrier_in (./pm/pmiserv/pmiserv_pmi_v1.c:70): error writing PMI line
[mpiexec@paradent-2.rennes.grid5000.fr] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:44): PMI handler returned error
[mpiexec@paradent-2.rennes.grid5000.fr] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command
[mpiexec@paradent-2.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@paradent-2.rennes.grid5000.fr] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@paradent-2.rennes.grid5000.fr] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion

On *only* one of the other processes, the standard error output is:

[proxy:0:0@paradent-1.rennes.grid5000.fr] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
[proxy:0:0@paradent-1.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@paradent-1.rennes.grid5000.fr] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event

The run command is:

-bash -c '(mpiexec  -launcher-exec /usr/bin/xos-createProcess   -np 3 
-host `xreservation -a $XOS_RSVID`  ./mpi/hello_world_MPI  < /dev/null > 
mpiexec.out) >& mpiexec.err'

Can anyone explain to me what this error means?

I'm using MPICH2 1.4.1p1.

Thanks for your help.

Yann Radenac
Research Engineer, INRIA
Myriads research team, INRIA Rennes - Bretagne Atlantique

