[Swift-devel] Re: Worker connection

wilde at mcs.anl.gov wilde at mcs.anl.gov
Fri Aug 13 12:23:15 CDT 2010


If we set internal hostname to $(hostname -f) and then the worker just connects to that, resolving the IP address via DNS, won't that typically connect? At least as the default?

Users can then manually override the setting on any cluster that is not configured sanely enough for that to work, and we provide instructions for determining whether a given cluster is in that situation and what to set if it is.
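
As a rough sketch of that default-plus-override idea (written in Python purely for illustration; the function name and the override argument are hypothetical and are not part of Swift or the worker), the selection could look something like this. socket.getfqdn() is used here as an approximate stand-in for "hostname -f".

```python
import socket


def pick_internal_hostname(override=None):
    """Return the hostname that workers should connect back to.

    `override` stands in for a manually configured value, such as the
    internalHostname profile entry. It is a hypothetical parameter.
    """
    if override:
        # A user-supplied setting always wins, for clusters where the
        # node's own hostname or DNS cannot be relied on.
        return override
    # Default: the fully qualified name, similar to `hostname -f`.
    return socket.getfqdn()


if __name__ == "__main__":
    print(pick_internal_hostname())
```

The worker would then resolve whatever name comes back through DNS, so this only helps where compute nodes can resolve and route to the submit host.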

See the transcript below.

- Mike

login2$ hostname -f
login2.pads.ci.uchicago.edu
login2$ qsub -I
qsub: waiting for job 444923.svc.pads.ci.uchicago.edu to start
qsub: job 444923.svc.pads.ci.uchicago.edu ready

----------------------------------------
Begin PBS Prologue Fri Aug 13 12:21:44 CDT 2010
Job ID:		444923.svc.pads.ci.uchicago.edu
Username:	wilde
Group:		ci-users
Nodes:		c40.pads.ci.uchicago.edu
End PBS Prologue Fri Aug 13 12:21:44 CDT 2010
----------------------------------------
c40$ ping login2.pads.ci.uchicago.edu
PING login2.pads.ci.uchicago.edu (192.5.86.6) 56(84) bytes of data.
64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=1 ttl=64 time=0.099 ms
64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=2 ttl=64 time=0.185 ms
64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=3 ttl=64 time=0.221 ms
64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=4 ttl=64 time=0.164 ms

--- login2.pads.ci.uchicago.edu ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.099/0.167/0.221/0.045 ms
c40$ exit
logout

qsub: job 444923.svc.pads.ci.uchicago.edu completed
login2$ 
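
The manual check above could in principle be automated on the worker side. The sketch below (Python, for illustration only; the function name, the candidate list and the 2-second timeout are assumptions, not what the worker actually does) tries each candidate address with a short connect timeout, so that a wrong address costs a couple of seconds per attempt instead of a multi-minute TCP timeout.

```python
import socket


def first_reachable(candidates, port, timeout=2.0):
    """Return the first host in `candidates` that accepts a TCP
    connection on `port`, or None if none of them do."""
    for host in candidates:
        try:
            # create_connection resolves the name and applies the
            # timeout to each connect attempt.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            continue
    return None
```

Note that the timeout bounds the connect attempts only; a slow DNS lookup is not covered by it, so this is not by itself a guarantee of quick startup.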

----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:

> On Fri, 2010-08-13 at 12:59 -0400, Glen Hocky wrote:
> > The OOPS project uses sed to set that parameter and create the sites
> > file on the fly. It's very effective.
> 
> How was the IP picked?
> 
> > 
> > On Fri, Aug 13, 2010 at 12:54 PM, Mihael Hategan <hategan at mcs.anl.gov>
> > wrote:
> >         On Fri, 2010-08-13 at 11:43 -0500, Jonathan Monette wrote:
> >         > Right now I am using "internalHostname".  I was just
> >         > wondering: should this be changed, since I am always changing
> >         > this entry depending on whether I am on login1 or login2?
> >         
> >         
> >         It should, but the question is to what.
> >         
> >         I offer $20 to the first person to find a reliable (that works
> >         on all TG sites + PADS + Intrepid), quick (that does not, by
> >         itself, delay worker startup or the overall workflow by more
> >         than a few seconds) and automated way of figuring out that IP.
> >         I reserve the right to refuse a solution if it does not meet
> >         certain propriety criteria that I did not necessarily specify
> >         here.
> >         
> >         (btw you could make a wrapper around swift that detects
> >         whether you are
> >         on login1 or login2 and picks one of two sites files and
> >         passes that to
> >         swift).
> >         
> >         Mihael
> >         
> >         >
> >         > On 8/13/10 11:35 AM, Mihael Hategan wrote:
> >         > > On Fri, 2010-08-13 at 11:30 -0500, Jonathan Monette wrote:
> >         > >
> >         > >> Hello,
> >         > >>       How does the worker decide which address to connect
> >         > >> to?  Right now what I think it does is run ifconfig, grep
> >         > >> the inet addresses, and then test each of these connections.
> >         > >> Is this correct?  When I am running on PADS it seems that
> >         > >> the worker always chooses the wrong connection to the
> >         > >> service.  It seems to choose the usb0 connection, whereas
> >         > >> the correct connection is the ib0 connection.  Is there a
> >         > >> way the worker can be fixed to choose a better connection,
> >         > >> or the correct connection?  This seems to be happening only
> >         > >> on PADS.
> >         > >>
> >         > >>
> >         > > That was temporary. Initially it would use the same address
> >         > > as the url in sites.xml. Then I added the "try all
> >         > > interfaces" thing, but in some cases the connect on certain
> >         > > wrong addresses does not fail quickly enough and has to time
> >         > > out instead, which usually takes a few minutes. So that got
> >         > > disabled and only the first address is used now (unless
> >         > > overridden - see below).
> >         > >
> >         > > You can say <profile namespace="globus"
> >         > > key="internalHostname">x.y.z.w</profile> in sites.xml
> >         > >
> >         > > Mihael
> >         > >
> >         > >
> >         > >
> >         > >
> >         >
> >         
> >         
> >         
> >         
> >         _______________________________________________
> >         Swift-devel mailing list
> >         Swift-devel at ci.uchicago.edu
> >         http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >         
> > 
> 
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



