[mpich-discuss] [MPICH2]Fatal error in PMPI_Bcast: A process has failed, error stack

Pavan Balaji balaji at mcs.anl.gov
Thu Nov 8 09:22:10 CST 2012


This looks like a problem with the system or network setup (e.g., a firewall
or a problem with /etc/hosts).  Did you look at the FAQ?

http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions
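
As a rough illustration of the /etc/hosts check mentioned above, the sketch below reads a Hydra host file and reports any host name that fails to resolve or resolves to a loopback address (for example, a machine's own name mapped to 127.0.0.1 in /etc/hosts). The helper names read_hosts and check_hosts are hypothetical, and the parser assumes one host per line with an optional ":<nprocs>" suffix and "#" comments. It is not part of MPICH, and it does not test firewalls.

```python
import ipaddress
import socket


def read_hosts(path):
    """Return host names from a host file (assumed: one entry per line)."""
    hosts = []
    with open(path) as fh:
        for line in fh:
            # Assumption: '#' starts a comment; blank lines are ignored.
            line = line.split("#", 1)[0].strip()
            if line:
                # Assumption: an optional ":<nprocs>" suffix, e.g. "node:4".
                hosts.append(line.split(":", 1)[0])
    return hosts


def check_hosts(hosts, resolve=socket.gethostbyname):
    """Map each host to (address or None, problem description or None)."""
    report = {}
    for host in hosts:
        try:
            addr = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[host] = (None, f"does not resolve: {exc}")
            continue
        if ipaddress.ip_address(addr).is_loopback:
            # Other machines cannot reach this host at a loopback address.
            report[host] = (addr, "resolves to a loopback address")
        else:
            report[host] = (addr, None)
    return report
```

Running check_hosts(read_hosts("/etc/hydra/hosts")) on both "console" and "node" and comparing the two reports would show whether each machine sees the other under a routable address.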

  -- Pavan

On 11/08/12 08:59, ugiwgh wrote:
> I have set up two machines with mpich2-1.5. One is named "console", the other is "node".
> My command is "/usr/local/mpich2-1.5/bin/mpirun -f /etc/hydra/hosts  /usr/local/mpich2-1.5/share/examples/logging/cpilog"
> When I run it on "node", it runs fine, but it fails on "console".
>
> The following is the error message:
> --------------
>   /usr/local/mpich2-1.5/bin/mpirun -f /etc/hydra/hosts  /usr/local/mpich2-1.5/share/examples/logging/cpilog
> Fatal error in PMPI_Bcast: A process has failed, error stack:
> PMPI_Bcast(1525)...............: MPI_Bcast(buf=0x170e188, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast_impl(1369)..........:
> MPIR_Bcast_intra(1199).........:
> MPIR_Bcast_binomial(195).......:
> MPIC_Send(63)..................:
> MPIDI_EagerContigShortSend(261): failure occurred while attempting to send an eager message
> MPIDI_CH3_iStartMsg(36)........: Communication error with rank 1
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:1@node] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
> [proxy:0:1@node] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@node] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec@console.paratera.com] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec@console.paratera.com] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@console.paratera.com] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
> [mpiexec@console.paratera.com] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion
>
>
> Any help will be appreciated.
> GHui

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

