[MPICH] MPICH1.2.4 _Condor Problem

Natarajan, Senthil senthil at pitt.edu
Tue Mar 13 11:23:33 CDT 2007


Hi,

I followed the link and set the ephemeral port range.

But it still uses other port numbers to connect to other machines.

Here are a few of the (source, destination) port pairs it used:

SPT=50912 DPT=0
SPT=50927 DPT=544
SPT=50928 DPT=544
SPT=1023 DPT=514

How can I control the source and destination port ranges that MPICH 1.2.4 uses to connect to other machines?
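One quick way to sort the logged pairs is to check each source port (SPT) against the ephemeral range that was actually configured. The sketch below assumes a range of 50001-59999, borrowed from the MPICH_PORT_RANGE value in the original message further down; substitute the real one. The function name classify is hypothetical. Only source ports are compared, because a destination port (DPT) is the port the remote service listens on and is not drawn from the local ephemeral range. A source port below 1024, such as 1023, is a privileged port that a program usually requests explicitly rather than receiving it from the ephemeral range.

```python
from typing import Iterable, List, Tuple

# Assumed ephemeral range; replace with the configured one
# (on Linux: /proc/sys/net/ipv4/ip_local_port_range).
EPHEMERAL_RANGE = (50001, 59999)

# (source, destination) pairs copied from the log above.
OBSERVED = [(50912, 0), (50927, 544), (50928, 544), (1023, 514)]


def classify(pairs: Iterable[Tuple[int, int]],
             port_range: Tuple[int, int]) -> List[str]:
    """Describe where each source port falls relative to port_range."""
    low, high = port_range
    notes = []
    for spt, dpt in pairs:
        if low <= spt <= high:
            where = "inside the ephemeral range"
        elif spt < 1024:
            where = "privileged (below 1024), likely bound explicitly"
        else:
            where = "outside the configured range"
        notes.append(f"SPT={spt} DPT={dpt}: source port {where}")
    return notes


for line in classify(OBSERVED, EPHEMERAL_RANGE):
    print(line)
```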

 

Thanks,

Senthil

________________________________

From: Rajeev Thakur [mailto:thakur at mcs.anl.gov] 
Sent: Wednesday, March 07, 2007 4:37 PM
To: Natarajan, Senthil; 'Condor-Users Mail List';
mpich-discuss at mcs.anl.gov
Cc: 'Greg Thain'
Subject: RE: [MPICH] MPICH1.2.4 _Condor Problem

MPICH doesn't use a fixed port. It uses ports assigned by the machine's
IP stack, which are in the ephemeral port range. See
http://www.ncftpd.com/ncftpd/doc/misc/ephemeral_ports.html
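Rajeev's point can be seen directly: binding a socket to port 0 asks the IP stack to pick a free ephemeral port. The sketch below does that and, on Linux only, also reads the configured range from /proc/sys/net/ipv4/ip_local_port_range (other operating systems expose the range differently, as the linked page describes). The function names are illustrative and are not part of MPICH.

```python
import socket
from typing import Optional, Tuple


def ephemeral_port_from_kernel() -> int:
    """Bind a TCP socket to port 0 and return the port the OS picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "let the IP stack choose"
        return s.getsockname()[1]


def linux_ephemeral_range(
        path: str = "/proc/sys/net/ipv4/ip_local_port_range",
) -> Optional[Tuple[int, int]]:
    """Return (low, high) on Linux, or None if the file is unavailable."""
    try:
        with open(path) as f:
            low, high = map(int, f.read().split())
        return low, high
    except OSError:
        return None


if __name__ == "__main__":
    port = ephemeral_port_from_kernel()
    rng = linux_ephemeral_range()
    print("kernel-assigned port:", port)
    if rng is not None:
        print("ephemeral range:", rng, "contains port:", rng[0] <= port <= rng[1])
```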

 

Rajeev

________________________________


	From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Natarajan, Senthil
	Sent: Wednesday, March 07, 2007 2:13 PM
	To: Condor-Users Mail List; mpich-discuss at mcs.anl.gov
	Cc: Greg Thain
	Subject: [MPICH] MPICH1.2.4 _Condor Problem

	Hi,

	I am submitting MPI jobs built with MPICH 1.2.4 through Condor.

	I have set up a separate user (something like "condor-user") to run Condor jobs on all the dedicated nodes. I created the certificates and copied them to all the nodes.

	So the user ("condor-user") can ssh without a password between all the nodes, and to its own node.

	But the job fails, complaining that the connection to the same machine was refused; i.e., the job runs on Machine A but cannot connect to Machine A.

	Here is the error from one of the nodes.

	connect to address xxx.xx.xxx.xx: Connection refused

	connect to address xxx.xx.xxx.xx: Connection refused

	trying normal rsh (/usr/bin/rsh)

	MachineA: Connection refused

	p0_20339:  p4_error: Timeout in making connection to remote process on MachineA: 0

	p0_20339: (301.989178) net_send: could not write to fd=4, errno = 32
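To tell "nothing is listening" apart from "packets are being dropped", a small TCP probe can help: a refused connection means the host answered (no listener on that port, or a firewall rule that rejects the connection), while a timeout usually means packets are being silently dropped. The probe helper below is a hypothetical sketch. It also decodes the errno 32 from the net_send line, which on Linux is EPIPE (broken pipe): the write went to a connection the other side had already closed.

```python
import errno
import os
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection; return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered: no listener, or a firewall rejecting the connection.
        return "refused"
    except socket.timeout:
        # No answer at all; often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"


# Decode the "errno = 32" from the net_send line (EPIPE on Linux).
print(errno.errorcode.get(32), "-", os.strerror(32))
```

For example, probe("MachineA", 514) run from Machine A itself would show whether the rsh service there accepts connections.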

	 

	Which port do MPI jobs (MPICH 1.2.4) use by default, so that I can set up firewall rules? I even tried setting the port range like this:

	MPICH_PORT_RANGE=50001:59999

	I also allowed the above ports in the firewall rules, but I still get the "connection refused" error.
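For illustration only, this is one way a program can honor a LOW:HIGH setting like the MPICH_PORT_RANGE value above: parse the range, then try to bind a listening socket to each port in turn until one is free. It is a hypothetical sketch, not MPICH 1.2.4's code, and it does not confirm that MPICH 1.2.4 reads this variable. It also shows the limit of such a setting: it covers only the listening ports that program opens itself, not the ports used by other programs the job starts, such as the rsh fallback in the log above.

```python
import os
import socket
from typing import Optional, Tuple


def parse_port_range(spec: str) -> Tuple[int, int]:
    """Parse 'LOW:HIGH' (e.g. '50001:59999') into a validated tuple."""
    low_s, high_s = spec.split(":")
    low, high = int(low_s), int(high_s)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"bad port range: {spec!r}")
    return low, high


def listen_in_range(low: int, high: int,
                    host: str = "0.0.0.0") -> Optional[socket.socket]:
    """Return a listening socket on the first free port in [low, high], or None."""
    for port in range(low, high + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError:
            s.close()  # port in use (or not permitted); try the next one
            continue
        s.listen(5)
        return s
    return None


if __name__ == "__main__":
    # Hypothetical: read the range from the environment, with a fallback.
    low, high = parse_port_range(os.environ.get("MPICH_PORT_RANGE", "50001:59999"))
    sock = listen_in_range(low, high)
    if sock is not None:
        print("listening on port", sock.getsockname()[1])
        sock.close()
```

A firewall rule then only needs to admit LOW through HIGH for these listeners, which is the idea behind setting a port range in the first place.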

	 

	Could you please let me know what might be the problem?

	Thanks,

	Senthil