Hi,<br><br>I used MPICH2-1.2.1p1 and MPICH2-1.3b1, both built from source.<br><br>Stephan<br><br><br><div class="gmail_quote">2010/8/20 Dave Goodell <span dir="ltr"><<a href="mailto:goodell@mcs.anl.gov">goodell@mcs.anl.gov</a>></span><br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Which exact version of MPICH2 are you using? The binary package for Ubuntu?<br>
<br>
If plain C/C++ MPI programs are executing OK on your MPICH2 installation, then it sounds like a problem with your Boost.MPI. You should contact the Boost.MPI folks for support.<br>
<br>
FWIW, the sort of error that you are seeing is usually a symptom of some other problem. For example, another process in the job might be segfaulting, but the communication error in a surviving process propagates up to the process manager faster than the segfault error on the failing process. Turning on core file creation ("ulimit -c unlimited", plus Google around if that doesn't work) sometimes helps for debugging this sort of thing.<br>
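For example, a minimal sketch of that debugging workflow (assuming a bash shell and that gdb is installed; the program and core file names are illustrative, and the core file's name and location depend on your kernel's core_pattern setting):<br>

```shell
# Raise the soft core-file size limit for the current shell session
# so a crashing process leaves a core dump behind.
ulimit -c unlimited

# Verify the new limit; this should print "unlimited".
ulimit -c

# After reproducing the crash under this shell, load the core file
# into gdb to see where the failing process died, e.g.:
#   gdb ./my_mpi_program core
#   (gdb) bt
```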
<br>
-Dave<br>
<div><div></div><div class="h5"><br>
On Aug 20, 2010, at 9:53 AM CDT, Stephan Hackstedt wrote:<br>
<br>
> Hi there,<br>
><br>
> I have a big problem running MPICH2 programs that use the Boost.MPI library. When I try to run programs on more than one node, collective operations such as communicator::barrier and broadcast, or even the environment destructor (because of MPI_Finalize, which is collective), cause the program to crash. Maybe it is a problem between Boost and the communication system; I use ch3:nemesis.<br>
> My errors look like this:<br>
><br>
> [1]terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'<br>
><br>
> [1] what(): MPI_Barrier: Other MPI error, error stack:<br>
> [1]PMPI_Barrier(362).................: MPI_Barrier(MPI_COMM_WORLD) failed<br>
> [1]MPIR_Barrier_impl(255)............:<br>
> [1]MPIR_Barrier_intra(79)............:<br>
> [1]MPIC_Sendrecv(186)................:<br>
> [1]MPIC_Wait(534)....................:<br>
> [1]MPIDI_CH3I_Progress(184)..........:<br>
> [1]MPID_nem_mpich2_blocking_recv(895):<br>
> [1]MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:<br>
><br>
> I also tested this with the simple broadcast example from the Boost.MPI tutorial, with the same errors.<br>
> But when I use the plain MPI equivalents, such as MPI_Barrier, without the Boost.MPI library, the program runs fine. I am using MPICH2 on Ubuntu 10.04 x86 platforms.<br>
> Has anyone had problems like this, or does anyone know of a fix?<br>
><br>
> Regards,<br>
><br>
</div></div>> Stephan<br>
> _______________________________________________<br>
<div><div></div><div class="h5">> mpich-discuss mailing list<br>
> <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
> <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br>
</div></div></blockquote></div><br>