[mpich-discuss] Fatal error in PMPI_Bcast: Other MPI error, error stack:

游手好闲 66152764 at qq.com
Tue Jul 26 01:55:46 CDT 2011


Hi,

My hosts file:
hksbs-s13.com:8
hksbs-s11.com:8

When I run on one node, it works:
[root@hksbs-s13 examples_collchk]# mpiexec -f hosts -n 8 ./time_bcast_nochk
time taken by 1X1 MPI_Bcast() at rank 0 = 0.000005
time taken by 1X1 MPI_Bcast() at rank 1 = 0.000002
time taken by 1X1 MPI_Bcast() at rank 2 = 0.000003
time taken by 1X1 MPI_Bcast() at rank 3 = 0.000002
time taken by 1X1 MPI_Bcast() at rank 4 = 0.000004
time taken by 1X1 MPI_Bcast() at rank 5 = 0.000002
time taken by 1X1 MPI_Bcast() at rank 6 = 0.000003
time taken by 1X1 MPI_Bcast() at rank 7 = 0.000002

But when I run across both nodes, it fails:

[root@hksbs-s13 examples_logging]# mpiexec -f hosts -n 9 ./srtest
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1478)......................: MPI_Bcast(buf=0x16fc2aa8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1321).................:
MPIR_Bcast_intra(1119)................:
MPIR_Bcast_scatter_ring_allgather(961):
MPIR_Bcast_binomial(213)..............: Failure during collective
MPIR_Bcast_scatter_ring_allgather(952):
MPIR_Bcast_binomial(189)..............:
MPIC_Send(63).........................:
MPIDI_EagerContigShortSend(262).......: failure occurred while attempting to send an eager message
MPIDI_CH3_iStartMsg(36)...............: Communication error with rank 8
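The failing rank is consistent with how the hosts file fills up. Below is a minimal sketch that assumes mpiexec places ranks in blocks: the first 8 on hksbs-s13.com, the next 8 on hksbs-s11.com, wrapping around if there are more. That is an assumption about the process manager's default placement, not something the log itself confirms. The helpers `parse_hosts` and `place_ranks` are hypothetical and written only for illustration. Under that assumption, `-n 8` keeps every rank on hksbs-s13, while `-n 9` puts rank 8, the one named in the error, on hksbs-s11. So the 8-process run never tests inter-node communication at all.

```python
def parse_hosts(lines):
    """Parse 'host:count' lines from a hosts file (count defaults to 1)."""
    hosts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts.append((name, int(count) if count else 1))
    return hosts


def place_ranks(hosts, nprocs):
    """Assign ranks to hosts in blocks of each host's count.

    Assumes block placement with wrap-around (hypothetical model of the
    process manager's default behaviour). Returns a list where index i
    is the host that rank i runs on.
    """
    if not hosts:
        raise ValueError("hosts file is empty")
    placement = []
    while len(placement) < nprocs:
        for name, count in hosts:
            for _ in range(count):
                if len(placement) == nprocs:
                    return placement
                placement.append(name)
    return placement


hosts = parse_hosts(["hksbs-s13.com:8", "hksbs-s11.com:8"])
for n in (8, 9):
    placement = place_ranks(hosts, n)
    print(f"-n {n}: rank {n - 1} -> {placement[n - 1]}")
# prints:
# -n 8: rank 7 -> hksbs-s13.com
# -n 9: rank 8 -> hksbs-s11.com
```

If this model holds, the problem lies in MPI traffic between the two hosts rather than in MPI_Bcast itself.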

When I ssh to the other node, it works, for example:

[root@hksbs-s13 examples_logging]# ssh hksbs-s11.com
Last login: Tue Jul 26 15:45:22 2011 from 10.33.15.233
[root@hksbs-s11 ~]#

How can I find out the cause?