[mpich-discuss] FW: MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused

Miguel Angel Fernández mafga74 at hotmail.com
Fri Nov 4 03:57:23 CDT 2011


Hi Pavan,

I thought I had answered your email; anyway, I'm doing too many things
at the same time ;-)

Yes, all machines can communicate with each other.
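
For anyone who wants to repeat that check, here is a minimal sketch of a per-node TCP test. It is not something I ran for this thread. It assumes Python is available on each node, and the port (22, the usual ssh port) is an assumption. A successful connection on port 22 does not prove that the other ports MPI picks at run time are open, so a firewall could still be the cause.

```python
import socket
import sys


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_peers(hosts, port):
    """Map each host name to whether this machine can reach it on `port`."""
    return {host: can_connect(host, port) for host in hosts}


if __name__ == "__main__":
    # Host names are given on the command line, e.g. the five from the
    # machinefile; port 22 is an assumption, not taken from the thread.
    for host, ok in check_peers(sys.argv[1:], 22).items():
        print(f"{host}: {'ok' if ok else 'FAILED'}")
```

To cover every pair of machines, the script would have to be run on each node in turn with the full list of host names, not only on mpi0.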

Attached is the output of the "configure", "make" and "make install"
commands for the users "mpi" and "root".

I sent a document describing the configuration I am using to your
personal email, "balaji at mcs.anl.gov". Maybe you can find what I am
doing wrong.
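
One configuration detail that can be checked locally on each node is whether the machine's own host name resolves to a loopback address (some Debian/Ubuntu-style installs put a 127.0.1.1 entry for the host name in /etc/hosts). Whether that applies to this cluster is not established anywhere in this thread; the sketch below only shows how the check could be done, and the function names are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if `address` lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def resolved_addresses(name):
    """Return the sorted IPv4 addresses `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = socket.gethostname()
    for addr in resolved_addresses(name):
        note = "loopback" if is_loopback(addr) else "ok"
        print(f"{name} -> {addr} ({note})")
```

If a node reports its own name as a loopback address, the /etc/hosts file on that node would be the first place to look.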

Thanks for your time

Miguel Angel

> Date: Thu, 3 Nov 2011 18:40:24 -0500
> From: balaji at mcs.anl.gov
> To: mpich-discuss at mcs.anl.gov
> CC: mafga74 at hotmail.com
> Subject: Re: [mpich-discuss] MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 
> Hi,
> 
> You never responded to the previous email I sent:
> 
> Is every machine able to connect to every other machine (not just mpi0
> to every other machine)?
> 
> Btw, Hydra doesn't need any additional password like mpd, but you need 
> to make sure that you can ssh to every machine and that every machine 
> can connect to every other machine.
> 
>   -- Pavan
> 
> On 11/03/2011 02:53 AM, Miguel Angel Fernández wrote:
> > Hello again
> >
> > I'm still trying to fix the problem I told you about days ago.
> > I configured all cluster machines to execute mpiexec as root in all
> > cases and the problem is the same.
> > Is it necessary to configure a password for hydra as I had to do for mpd?
> >
> > Thank you in advance
> > Miguel Ángel
> >
> >  > From: thakur at mcs.anl.gov
> >  > Date: Sat, 22 Oct 2011 15:30:26 -0500
> >  > To: mpich-discuss at mcs.anl.gov
> >  > Subject: Re: [mpich-discuss] MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> >  >
> >  > Make sure the 5 machines can communicate with each other, i.e., there
> >  > is no firewall preventing connections.
> >  >
> >  > Rajeev
> >  >
> >  > On Oct 22, 2011, at 12:36 PM, Miguel Angel Fernández wrote:
> >  >
> >  > > Hello everybody
> >  > >
> >  > > I'm trying to fix a problem that appears when I execute one of the
> >  > > mpich2 example programs.
> >  > > As you can see, if I execute a normal command there are no
> >  > > problems. The cluster works properly.
> >  > >
> >  > > mpi@mpi0:~$ mpiexec -f ./mpich2-install/machinefile -n 5 hostname
> >  > > mpi0
> >  > > mpi2
> >  > > mpi3
> >  > > mpi1
> >  > > mpi4
> >  > > mpi@mpi0:~$
> >  > >
> >  > > But when I try to execute the program, the result is something
> >  > > like this:
> >  > >
> >  > > mpi@mpi0:~$ mpiexec -f ./mpich2-install/machinefile -n 5 /home/mpi/mpich2-install/workspace/Prueba/Debug/Prueba
> >  > > Hello MPI World the original.
> >  > > Hello MPI World the original.
> >  > > Hello MPI World the original.
> >  > > Hello MPI World the original.
> >  > > Hello MPI World the original.
> >  > > From process 0: Num processes: 5
> >  > > Fatal error in MPI_Send: Other MPI error, error stack:
> >  > > MPI_Send(173)..............: MPI_Send(buf=0xbfcbe268, count=26, MPI_CHAR, dest=0, tag=0, MPI_COMM_WORLD) failed
> >  > > MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> >  > > Fatal error in MPI_Send: Other MPI error, error stack:
> >  > > MPI_Send(173)..............: MPI_Send(buf=0xbfb32ca8, count=26, MPI_CHAR, dest=0, tag=0, MPI_COMM_WORLD) failed
> >  > > MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> >  > > Fatal error in MPI_Send: Other MPI error, error stack:
> >  > > MPI_Send(173)..............: MPI_Send(buf=0xbfa49e98, count=26, MPI_CHAR, dest=0, tag=0, MPI_COMM_WORLD) failed
> >  > > MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> >  > > Fatal error in MPI_Send: Other MPI error, error stack:
> >  > > MPI_Send(173)..............: MPI_Send(buf=0xbfa57538, count=26, MPI_CHAR, dest=0, tag=0, MPI_COMM_WORLD) failed
> >  > > MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> >  > >
> >  > > Do you have any idea what can be the problem?
> >  > >
> >  > > Thank you in advance
> >  > > Miguel Angel
> >  > >
> >  > > _______________________________________________
> >  > > mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> >  > > To manage subscription options or unsubscribe:
> >  > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >  >
> >
> >
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
 		 	   		   		 	   		  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.out
Type: application/octet-stream
Size: 115008 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure_root.out
Type: application/octet-stream
Size: 115792 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.out
Type: application/octet-stream
Size: 115274 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_install.out
Type: application/octet-stream
Size: 89159 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_install_root.out
Type: application/octet-stream
Size: 103421 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0010.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_root.out
Type: application/octet-stream
Size: 45525 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111104/6909f143/attachment-0011.obj>

