[mpich-discuss] error running mpi program

Pavan Balaji balaji at mcs.anl.gov
Fri Oct 28 08:18:17 CDT 2011


Please keep mpich-discuss cc'ed.

Apart from the fact that something is wrong with pexe-note, it's hard to 
tell anything else from the error message below.

Does a non-MPI program such as /bin/hostname work correctly?

% mpiexec -f machinefile -n 10 /bin/hostname
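
If that also fails, the same check can be repeated one host at a time to see which node is responsible. The following is only a minimal sketch, not part of MPICH: it assumes the machinefile lists one host per line (optionally as host:N), that the Hydra mpiexec in use accepts -hosts, and that DRY_RUN is a hypothetical switch of this script that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: launch a non-MPI program on each machinefile host separately,
# so that a launch failure can be attributed to a single node.
# Assumptions: one host per line in the machinefile; mpiexec supports -hosts;
# DRY_RUN is a switch of this script only, not an mpiexec feature.

check_hosts() {
    machinefile=$1
    # Strip comments and trailing blanks, then drop empty lines.
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^[[:space:]]*$/d' "$machinefile" |
    while read -r host rest; do
        host=${host%%:*}    # drop an optional ":N" process count
        if [ -n "$DRY_RUN" ]; then
            echo "mpiexec -hosts $host -n 1 /bin/hostname"
        elif mpiexec -hosts "$host" -n 1 /bin/hostname </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Usage (from the directory holding the machinefile):
#   check_hosts machinefile
```

A host reported as FAILED here cannot even start /bin/hostname, which points at the launch setup on that node (ssh access, name resolution, firewall) rather than at the MPI program itself.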

  -- Pavan

On 10/27/2011 11:03 PM, Charles Sartori wrote:
> Rajeev and Pavan, both hellow executables are in the same location with
> permissions.
> I added another machine, so the machinefile now contains:
>
>     pexe-pc
>     loiva-note
>     pexe-note
>
>
> When I run the cpi/hellow examples with pexe-pc and loiva-note, everything
> works fine, but when I try to run the cpi example with all 3 nodes I get this:
>
>     pexe-note@pexe-pc:~$ mpiexec -f machinefile -n 10 ./mpich2-1.4/examples/cpi
>     Process 1 of 10 is on pexe-pc
>     Process 4 of 10 is on pexe-pc
>     Process 7 of 10 is on pexe-pc
>     Process 5 of 10 is on pexe-note
>     Process 8 of 10 is on pexe-note
>     Process 3 of 10 is on loiva-note
>     Process 6 of 10 is on loiva-note
>     Process 2 of 10 is on pexe-note
>     Process 0 of 10 is on loiva-note
>     Process 9 of 10 is on loiva-note
>
>     =====================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   EXIT CODE: 11
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     =====================================================================================
>     [proxy:0:1@pexe-pc] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
>     [proxy:0:1@pexe-pc] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [proxy:0:1@pexe-pc] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
>     [proxy:0:0@loiva-note] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
>     [proxy:0:0@loiva-note] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [proxy:0:0@loiva-note] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
>     [mpiexec@pexe-pc] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
>     [mpiexec@pexe-pc] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>     [mpiexec@pexe-pc] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for completion
>     [mpiexec@pexe-pc] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
>     pexe-note@pexe-pc:~$
>
>
> --
> Charles Sartori

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

