[mpich-discuss] MPICH2 + Boost.MPI Collective Problems

Dave Goodell goodell at mcs.anl.gov
Fri Aug 20 11:37:36 CDT 2010


Which exact version of MPICH2 are you using?  The binary package for Ubuntu?

If plain C/C++ MPI programs run OK on your MPICH2 installation, then it sounds like a problem with your Boost.MPI build or installation.  You should contact the Boost.MPI folks for support.

FWIW, the sort of error that you are seeing is usually a symptom of some other problem.  For example, another process in the job might be segfaulting, but the communication error in a surviving process propagates up to the process manager faster than the segfault error on the failing process.  Turning on core file creation ("ulimit -c unlimited", plus Google around if that doesn't work) sometimes helps for debugging this sort of thing.
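
If the shell limit is awkward to set on every node, the same thing can be requested from inside the program.  The following is only a minimal sketch, assuming a POSIX system such as Linux; enable_core_dumps is a hypothetical helper name, not part of MPICH2 or Boost.MPI.  It raises the soft core-file size limit to whatever the hard limit allows, so it cannot help if the hard limit is already 0.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)
#include <cstdio>          // std::perror

// Hypothetical helper (not an MPICH2 or Boost.MPI API): raise the soft
// core-file size limit of the calling process to its hard limit, which is
// roughly what "ulimit -c unlimited" does when the hard limit is unlimited.
// Returns true on success, false if either system call fails.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit(RLIMIT_CORE)");
        return false;
    }
    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit(RLIMIT_CORE)");
        return false;
    }
    return true;
}
```

Calling it as the first statement of main(), before MPI_Init or before constructing boost::mpi::environment, should cover crashes during startup as well.  Whether and where a core file actually appears still depends on system settings outside the program.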

-Dave

On Aug 20, 2010, at 9:53 AM CDT, Stephan Hackstedt wrote:

> Hi there,
> 
> I have a big problem running MPICH2 programs that use the Boost.MPI library. When I try to run programs on more than one node, collective operations like communicator::barrier, broadcast, or even the environment destructor (because of MPI_Finalize, which is collective) cause the program to crash. Maybe it's a problem between Boost and the communication system; I use ch3:nemesis.
> The errors look like this:
> 
> [1]terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
> 
> [1]  what():  MPI_Barrier: Other MPI error, error stack:
> [1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
> [1]MPIR_Barrier_impl(255)............: 
> [1]MPIR_Barrier_intra(79)............: 
> [1]MPIC_Sendrecv(186)................: 
> [1]MPIC_Wait(534)....................: 
> [1]MPIDI_CH3I_Progress(184)..........: 
> [1]MPID_nem_mpich2_blocking_recv(895): 
> [1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0: 
> 
> I also tested this with the simple broadcast example from the Boost.MPI tutorial, with the same errors.
> But when I use the plain MPI equivalent without the Boost.MPI library, such as MPI_Barrier, the program runs well. I am using MPICH2 on Ubuntu 10.04 x86 platforms.
> Has anyone had problems like this, or does anyone know a fix?
> 
> Regards,
> 
> Stephan