[mpich-discuss] mpich2 hangs on Ubuntu Beowulf cluster (with NFS) update

Konstantinos Varotsos kvarotso at gmail.com
Fri Jan 6 05:53:47 CST 2012


Hi there,


I talked to the person who used the code before, and she told me that it worked on a grid here in Greece with no problems.

So I searched for other reasons. Because the code was precompiled on another machine (ia32, due to Fortran licence issues), the libraries it used were mpich-1.2p1, so I installed the latest stable version there.

Now, when I run the executable, I get the following error:

Internal Error: invalid error code 209e0e (Ring ids do not match) in MPIR_Bcast_intra:1119
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1478)......: MPI_Bcast(buf=0xa0e2cf8, count=1, MPI_CHAR, root=0, comm=0x84000004) failed
MPIR_Bcast_impl(1321).:
MPIR_Bcast_intra(1119):
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(comm=0x84000004) failed
MPIR_Barrier_impl(306)......:
MPIR_Bcast_impl(1321).......:
MPIR_Bcast_intra(1155)......:
MPIR_Bcast_binomial(213)....: Failure during collective
MPIR_Barrier_impl(292)......:
MPIR_Barrier_or_coll_fn(121):
MPIR_Barrier_intra(83)......:
dequeue_and_set_error(596)..: Communication error with rank 8


I ran "make testing", and all tests passed except this one:


<NAME>bcastlength</NAME>
<NP>4</NP>
<WORKDIR>./errors/coll</WORKDIR>
<STATUS>fail</STATUS>
<TESTDIFF>
Did not detect mismatched length (long) on process 3
Did not detect mismatched length (short) on process 3
  Found 2 errors

I don't know how to interpret these two errors.


I don't know if this is relevant, but some people suggest disabling hyper-threading.


Do you have any suggestions?

Thanx

Kwstas



