[mpich-discuss] mpich2 hangs on Ubuntu beowulf cluster(with NFS) update

Darius Buntinas buntinas at mcs.anl.gov
Fri Jan 6 10:32:58 CST 2012


Hi Kwstas,

If you're using the latest stable version (not one of the 1.5 alpha releases), the "ring id mismatch" error is due to a bug related to error reporting, so that _shouldn't_ be the cause of the problem.

What version of MPICH2 are you using?

Try running it again with the -l flag to mpiexec.  That will print the rank of the process next to any output and will let us know which process is reporting the error.
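
For instance, a rough sketch along these lines would list the ranks that report a fatal error. The program name ./my_app and the process count 16 are placeholders, and ranks_with_errors is a hypothetical helper, not part of MPICH2. It assumes each labeled line starts with either "[N] " or "N: ", so adjust the pattern if your mpiexec labels output differently:

```shell
# Hypothetical helper (not part of MPICH2): reads rank-labeled output on stdin
# and prints the distinct ranks whose lines contain "Fatal error".
# Assumed label formats: "[3] text" or "3: text".
ranks_with_errors() {
    grep 'Fatal error' \
        | sed -n -E 's/^\[?([0-9]+)(\]|:) .*/\1/p' \
        | sort -n -u
}

# Intended use (placeholders: 16 processes, ./my_app):
#   mpiexec -l -n 16 ./my_app 2>&1 | tee run.log | ranks_with_errors
```

Keeping the full log in run.log as well makes it possible to look at what each of those ranks printed just before the failure.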

-d

On Jan 6, 2012, at 5:53 AM, Konstantinos Varotsos wrote:

> 
> Hi there,
> 
> 
> I talked to the person who used the code before, and she told me that it
> worked on a grid here in Greece with no problems, so I looked for other
> causes. The code is pre-compiled on another machine (ia32, due to Fortran
> licence issues), and the libraries used there were mpich-1.2p1, so I
> installed the latest stable version there.
> 
> Now when I run the exe I receive an error
> 
> Internal Error: invalid error code 209e0e (Ring ids do not match) in MPIR_Bcast_intra:1119
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1478)......: MPI_Bcast(buf=0xa0e2cf8, count=1, MPI_CHAR, root=0, comm=0x84000004) failed
> MPIR_Bcast_impl(1321).:
> MPIR_Bcast_intra(1119):
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425)...........: MPI_Barrier(comm=0x84000004) failed
> MPIR_Barrier_impl(306)......:
> MPIR_Bcast_impl(1321).......:
> MPIR_Bcast_intra(1155)......:
> MPIR_Bcast_binomial(213)....: Failure during collective
> MPIR_Barrier_impl(292)......:
> MPIR_Barrier_or_coll_fn(121):
> MPIR_Barrier_intra(83)......:
> dequeue_and_set_error(596)..: Communication error with rank 8
> 
> 
> I ran make testing and all tests passed except this one:
> 
> 
> <NAME>bcastlength</NAME>
> <NP>4</NP>
> <WORKDIR>./errors/coll</WORKDIR>
> <STATUS>fail</STATUS>
> <TESTDIFF>
> Did not detect mismatched length (long) on process 3
> Did not detect mismatched length (short) on process 3
> Found 2 errors
> 
> I don't know how to interpret these two errors.
> 
> 
> I don't know if this is relevant, but some people suggest deactivating hyperthreading.
> 
> 
> Do you have any suggestions?
> 
> Thanx
> 
> Kwstas
> 
> 


