[mpich-discuss] MPICH2 + Boost.MPI Collective Problems

Stephan Hackstedt stephan.hackstedt at googlemail.com
Fri Aug 20 09:53:41 CDT 2010


Hi there,

I have a big problem running MPICH2 programs that use the Boost.MPI
library. When I try to run programs on more than one node, collective
operations like
communicator::barrier<http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/communicator.html#id918378-bb>,
broadcast<http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/broadcast.html>, or
even the
environment<http://boost.org/doc/libs/1_44_0/doc/html/boost/mpi/environment.html> destructor
(because of MPI_Finalize, which is collective) cause the program to
crash. Maybe it is a problem between Boost and the communication system; I use
ch3:nemesis.
My errors look like this:

[1]terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'

[1]  what():  MPI_Barrier: Other MPI error, error stack:
[1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed
[1]MPIR_Barrier_impl(255)............:
[1]MPIR_Barrier_intra(79)............:
[1]MPIC_Sendrecv(186)................:
[1]MPIC_Wait(534)....................:
[1]MPIDI_CH3I_Progress(184)..........:
[1]MPID_nem_mpich2_blocking_recv(895):
[1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:

I also tested this with the simple broadcast example from the Boost.MPI
tutorial and got the same errors.
But when I use the plain MPI equivalent without the Boost.MPI library,
such as MPI_Barrier<http://www.mpi-forum.org/docs/mpi-11-html/node66.html#Node66>,
the program runs fine. I am using MPICH2 on Ubuntu 10.04 x86 platforms.
Has anyone had problems like this, or does anyone know a fix?

Regards,

Stephan