[mpich-discuss] Collective comm not working

Nicolas Rosner nrosner at gmail.com
Wed Dec 7 06:10:55 CST 2011


Hi Bibrak,

> The application starts on all the machines but cannot
> successfully complete the Barrier().
> What could be the problem, any guess?

Do nontrivial programs that use only point-to-point communication run
fine, without such problems? Your subject line and example seem to
suggest so, but please confirm.

Otherwise, have you already tried the suggestions in this FAQ entry?

http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_My_MPI_program_aborts_with_an_error_saying_it_cannot_communicate_with_other_processes
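
One thing worth ruling out for a "Communication error with rank 0" is
name resolution between the nodes. This is only a guess at the cause,
not a diagnosis. A rough sketch, assuming Python is available on each
machine: it reports whether each hostname resolves at all and whether
it resolves to a loopback address, which other nodes could not connect
to. The function names are made up for illustration and are not part
of MPICH.

```python
import ipaddress
import socket
import sys


def resolve(host):
    """Return the sorted IP addresses `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def is_loopback(addr):
    """True for 127.0.0.0/8 and ::1 (an IPv6 zone id, if any, is stripped)."""
    return ipaddress.ip_address(addr.split("%")[0]).is_loopback


def check_hosts(hosts):
    """Map each hostname to 'unresolvable', 'loopback', or 'ok'."""
    report = {}
    for host in hosts:
        addrs = resolve(host)
        if not addrs:
            report[host] = "unresolvable"
        elif any(is_loopback(a) for a in addrs):
            report[host] = "loopback"
        else:
            report[host] = "ok"
    return report


if __name__ == "__main__":
    # With no arguments, check this machine's own hostname.
    hosts = sys.argv[1:] or [socket.gethostname()]
    for host, status in check_hosts(hosts).items():
        print(host, status, resolve(host))
```

Run it on every node with all three hostnames from your output, e.g.
"python check_hosts.py ccitsuseamd1 ccitsuse05 ccitsuse07" (the script
name is arbitrary). Anything other than "ok" for a node's own name or
for a peer would be worth looking at before suspecting MPICH itself.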

Hth, N.



On Wed, Dec 7, 2011 at 5:12 AM, Bibrak Qamar <bibrakc at gmail.com> wrote:
> Hello all,
>
>
> I am using mpich2-1.4.1p1 on a cluster of machines. I use mpiexec to run
> the application. The application starts on all the machines but cannot
> successfully complete the Barrier().
>
> What could be the problem, any guess?
>
> Hello world from process 0 of 3 | Hostname = ccitsuseamd1
> Hello world from process 2 of 3 | Hostname = ccitsuse05
> Hello world from process 1 of 3 | Hostname = ccitsuse07
>
>
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425)...............: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier_impl(331)..........: Failure during collective
> MPIR_Barrier_impl(313)..........:
> MPIR_Barrier_intra(83)..........:
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 0
> Hello world After Barrier from process 1 of 3 | Hostname = ccitsuse07
>
>
>
> Thanks
> Bibrak
>

