[mpich-discuss] mpich2 hangs on Ubuntu beowulf cluster(with NFS)

Konstantinos Varotsos kvarotso at gmail.com
Thu Jan 5 11:35:54 CST 2012



Hi there,

We made another attempt today without touching the code (we only changed the switch), and I now get this error:

Internal Error: invalid error code 209e0e (Ring ids do not match) in MPIR_Bcast_intra:1119
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1478)......: MPI_Bcast(buf=0xa9533e8, count=1, MPI_CHAR, root=0, comm=0x84000004) failed
MPIR_Bcast_impl(1321).:
MPIR_Bcast_intra(1119):
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(comm=0x84000004) failed
MPIR_Barrier_impl(306)......:
MPIR_Bcast_impl(1321).......:
MPIR_Bcast_intra(1155)......:
MPIR_Bcast_binomial(213)....: Failure during collective
MPIR_Barrier_impl(292)......:
MPIR_Barrier_or_coll_fn(121):
MPIR_Barrier_intra(83)......:
dequeue_and_set_error(596)..: Communication error with rank 4


Other times it ends with "Communication error with rank 1" instead.

Does this look familiar to anyone?


Thanks,
Kwstas



