[mpich-discuss] fail to run hello world program with MPICH2-1.3a2 on multiple nodes

Pavan Balaji balaji at mcs.anl.gov
Fri Jul 9 08:41:16 CDT 2010


This looks like a connection problem between the two nodes. Is there a
firewall on either of the nodes? If so, can you disable it and try again?
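
One quick way to test the firewall hypothesis is to check, from node6-b, whether a plain TCP connection to node7-b gets through at all. The helper below is a minimal sketch, not part of MPICH or Hydra. The function name `can_connect` and the port number are hypothetical, and it assumes bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command. First start some TCP listener on an unused port on node7-b. A firewall that silently drops packets shows up as a 3-second timeout, and a REJECT rule or a closed port fails immediately. Either way, a failure against a port you know is listening points at filtering between the nodes. On each node, `iptables -L -n` lists the active firewall rules.

```shell
#!/usr/bin/env bash
# can_connect HOST PORT  (hypothetical helper)
# Exits 0 if a TCP connection to HOST:PORT opens within 3 seconds,
# non-zero otherwise (refused, unresolvable host, or timed out).
# Uses bash's /dev/tcp redirection; `timeout` bounds connects that
# hang because a firewall drops the packets instead of rejecting them.
can_connect() {
  local host="$1" port="$2"
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (port 5000 is hypothetical; a listener must be running on node7-b):
# if can_connect node7-b 5000; then echo "reachable"; else echo "blocked or closed"; fi
```

Run the check in both directions (node6-b to node7-b and back), because the MPI processes on each node open connections to the other.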

  -- Pavan

On 07/09/2010 06:26 AM, Manhui Wang wrote:
> Hello,
> 
> I have a problem running MPI jobs on multiple nodes with the newly
> released MPICH2-1.3a2, in which Hydra is the default process manager.
> 
> I just tested the simplest hello world program. It works fine on any
> single node, but fails across multiple nodes.
> 
> node6-b:~/testprogram> cat hosts
> node6-b
> node6-b
> node7-b
> node7-b
> 
> node6-b:~/testprogram> mpiexec -f hosts -n 4 ./hello
> node6-b: hello world,length=7,my rank=0
> node6-b: hello world,length=7,my rank=1
> node7-b: hello world,length=7,my rank=3
> node7-b: hello world,length=7,my rank=2
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(476).................: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier(82)..................:
> MPIC_Sendrecv(161)................:
> MPIC_Wait(519)....................:
> MPIDI_CH3I_Progress(165)..........:
> MPID_nem_mpich2_blocking_recv(880):
> MPID_nem_tcp_connpoll(1714).......: Communication error
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(476).................: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier(82)..................:
> MPIC_Sendrecv(161)................:
> MPIC_Wait(519)....................:
> MPIDI_CH3I_Progress(165)..........:
> MPID_nem_mpich2_blocking_recv(895):
> MPID_nem_tcp_connpoll(1714).......: Communication error
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
> 
> 
> I built the MPICH2-1.3a2 library with the Intel 11.1/069 compilers on a
> 64-bit AMD machine:
> 
> nice -n +18 ./configure --with-device=ch3:nemesis \
>     --prefix=/mympich2-install FC=ifort --enable-f90 F90=ifort --enable-f77 \
>     F77=ifort --enable-cc CC=icc --enable-cxx CXX=icc 2>&1 | tee configure.log
> 
> nice -n +18 make 2>&1 | tee make.log
> 
> nice -n +18 make install 2>&1 | tee install.log
> 
> 
> Could you please point out what the problem is? I have attached the
> source code.
> 
> Thanks
> Manhui
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

