[mpich-discuss] MPICH2 + Boost.MPI Collective Problems
Dave Goodell
goodell at mcs.anl.gov
Fri Aug 20 11:37:36 CDT 2010
Which exact version of MPICH2 are you using? The binary package for Ubuntu?
If plain C/C++ MPI programs are executing OK on your MPICH2 installation, then it sounds like a problem with your Boost.MPI. You should contact the Boost.MPI folks for support.
FWIW, the sort of error that you are seeing is usually a symptom of some other problem. For example, another process in the job might be segfaulting, but the communication error in a surviving process propagates up to the process manager faster than the segfault error on the failing process. Turning on core file creation ("ulimit -c unlimited", plus Google around if that doesn't work) sometimes helps for debugging this sort of thing.
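When ranks are started on remote nodes by the process manager, a "ulimit -c" set in your interactive shell may not be inherited by those processes. A minimal sketch, assuming a POSIX system, raises the core-file limit from inside the program itself; the function name `enable_core_dumps` is hypothetical. An unprivileged process can only raise its soft limit up to the hard limit, so this is the closest in-process equivalent of "ulimit -c unlimited" rather than an exact one.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical helper: raises this process's soft core-file size limit to the
// hard limit. Intended to be called at the top of main(), before MPI_Init, so
// it applies on every rank. Returns true on success.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit(RLIMIT_CORE)");
        return false;
    }
    // Without privileges the soft limit may be raised only as far as the hard
    // limit; if the hard limit is 0, an administrator has to change it.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit(RLIMIT_CORE)");
        return false;
    }
    return true;
}
```

Whether a core file is actually written, and where it ends up, also depends on system settings such as /proc/sys/kernel/core_pattern on Linux.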
-Dave
On Aug 20, 2010, at 9:53 AM CDT, Stephan Hackstedt wrote:
> Hi there,
>
> I have a big problem running MPICH2 programs that use the Boost.MPI library. When I try to run programs on more than one node, collective operations like communicator::barrier, broadcast, or even the environment destructor (because of MPI_Finalize, which is collective) cause the program to crash. Maybe it's a problem with Boost and the communication system; I use ch3:nemesis.
> My errors look like this:
>
> [1]terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
>
> [1] what(): MPI_Barrier: Other MPI error, error stack:
> [1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
> [1]MPIR_Barrier_impl(255)............:
> [1]MPIR_Barrier_intra(79)............:
> [1]MPIC_Sendrecv(186)................:
> [1]MPIC_Wait(534)....................:
> [1]MPIDI_CH3I_Progress(184)..........:
> [1]MPID_nem_mpich2_blocking_recv(895):
> [1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
>
> I also tested this with the simple broadcast example from the Boost.MPI tutorial: same errors.
> But when I use the plain MPI equivalents without the Boost.MPI library, such as MPI_Barrier, the program runs fine. I am using MPICH2 on Ubuntu 10.04 x86 platforms.
> Has anyone had problems like this, or does anyone know a fix?
>
> Regards,
>
> Stephan
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss