[mpich-discuss] mpich2 hangs on Ubuntu beowulf cluster(with NFS) update

Darius Buntinas buntinas at mcs.anl.gov
Mon Jan 9 15:28:38 CST 2012


Here's a patch that should fix the internal error so that mpich2 can report a valid error message.  If you built mpich2 from source, apply the patch, rebuild and reinstall mpich2, then relink your application against the new library and run it again.
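
In case it helps, the steps would look roughly like the sketch below. The paths, the install prefix, and the -p0 strip level are all assumptions (they depend on where you unpacked the source and how the diff was generated), so adjust them to your setup and reuse whatever configure options you built with originally.

```shell
#!/bin/sh
# Hypothetical locations; none of these come from the original build.
SRC_DIR="${SRC_DIR:-$HOME/src/mpich2-1.4.1p1}"   # unpacked mpich2 source tree (assumed path)
PATCH_FILE="${PATCH_FILE:-$HOME/bcast.diff}"     # the attached patch, saved locally (assumed path)
PREFIX="${PREFIX:-$HOME/mpich2-install}"         # install prefix (assumed)

# Runs in a subshell so the caller's working directory is left alone.
rebuild_mpich2() (
    if [ ! -d "$SRC_DIR" ] || [ ! -f "$PATCH_FILE" ]; then
        echo "source tree or patch not found; set SRC_DIR and PATCH_FILE" >&2
        exit 1
    fi
    cd "$SRC_DIR" &&
    # Check that the patch applies cleanly before touching any files.
    # -p0 is a guess; try -p1 if the dry run cannot find the files.
    patch -p0 --dry-run < "$PATCH_FILE" &&
    patch -p0 < "$PATCH_FILE" &&
    # Use the same configure options as in your original build.
    ./configure --prefix="$PREFIX" &&
    make &&
    make install
)

# After a successful rebuild, relink the application with the new wrappers,
# e.g. (hypothetical file names):
#   "$PREFIX/bin/mpicc" -o myapp myapp.c
```

Call rebuild_mpich2 once the three variables point at the right places. If the cluster nodes share the install directory over NFS, a single reinstall should be enough; otherwise the new library has to be installed on every node.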

Let me know how this works.
-d

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bcast.diff
Type: application/octet-stream
Size: 5726 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120109/eb1ef112/attachment.obj>
-------------- next part --------------

On Jan 6, 2012, at 6:26 PM, Konstantinos Varotsos wrote:

> 
> Hi
> 
> The version I am using is mpich2-1.4.1p1, and this is the output with the -l flag:
> 
> 
> [8] Internal Error: invalid error code 209e0e (Ring ids do not match) in MPIR_Bcast_intra:1119
> [8] Fatal error in PMPI_Bcast: Other MPI error, error stack:
> [8] PMPI_Bcast(1478)......: MPI_Bcast(buf=0xa913cf8, count=1, MPI_CHAR, root=0, comm=0x84000004) failed
> [8] MPIR_Bcast_impl(1321).:
> [8] MPIR_Bcast_intra(1119):
> [0] Fatal error in PMPI_Barrier: Other MPI error, error stack:
> [0] PMPI_Barrier(425)...........: MPI_Barrier(comm=0x84000004) failed
> [0] MPIR_Barrier_impl(306)......:
> [0] MPIR_Bcast_impl(1321).......:
> [0] MPIR_Bcast_intra(1155)......:
> [0] MPIR_Bcast_binomial(213)....: Failure during collective
> [0] MPIR_Barrier_impl(292)......:
> [0] MPIR_Barrier_or_coll_fn(121):
> [0] MPIR_Barrier_intra(83)......:
> [0] dequeue_and_set_error(596)..: Communication error with rank 8
> 
> 
> Thanks, Kwstas
