[Swift-devel] Re: Worker connection

Mihael Hategan hategan at mcs.anl.gov
Fri Aug 13 12:52:33 CDT 2010


On Fri, 2010-08-13 at 11:23 -0600, wilde at mcs.anl.gov wrote:
> If we set internal hostname to $(hostname -f) and then the worker just
> connects to that, resolving the IP address via DNS, won't that
> typically connect? At least as the default?

Yes, it would.  Though no prize for "typically".
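
For the less typical cases, one way to check from a worker node whether the name is usable would be a TCP connect with a hard time limit, so that a bad address fails in seconds rather than hanging. This is only a rough sketch and not what the worker does today: can_reach is a made-up helper, it assumes bash (for /dev/tcp) and the coreutils timeout command, and the port is whatever the service happens to be listening on.

```shell
# Hypothetical helper, not part of Swift.
# Usage: can_reach HOST PORT [SECONDS]
# Returns 0 only if a TCP connection to HOST:PORT opens within SECONDS
# (default 3). Assumes bash and coreutils "timeout" are installed.
can_reach() {
    host=$1
    port=$2
    secs=${3:-3}
    # bash treats /dev/tcp/HOST/PORT in a redirection as "open a TCP socket";
    # timeout kills the attempt if it neither succeeds nor fails in time.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

On a compute node this could be run as `can_reach login2.pads.ci.uchicago.edu "$PORT" || echo "set internalHostname by hand"`, with PORT standing in for the service port.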

> 
> Then users manually override for any clusters that are not
> sufficiently sanely configured to make that possible, and we provide a
> set of manual instructions for users to determine whether the default
> works and what to set if it does not.

Right. I think this is very similar to what used to be the default.
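
As a sketch of that default plus the manual override, something along these lines could generate the sites file on the fly. The @INTERNAL_HOSTNAME@ placeholder, the SWIFT_INTERNAL_HOSTNAME variable and the render_sites function are all made up for illustration; none of them exist in Swift.

```shell
# Hypothetical helper, not part of Swift.
# Usage: render_sites TEMPLATE > sites.xml
# Replaces every @INTERNAL_HOSTNAME@ in TEMPLATE with the value of
# SWIFT_INTERNAL_HOSTNAME if it is set, else with the output of "hostname -f".
render_sites() {
    template=$1
    host="${SWIFT_INTERNAL_HOSTNAME:-$(hostname -f)}"
    # "|" is used as the sed delimiter; this assumes the host value
    # contains no "|" or "&" characters.
    sed "s|@INTERNAL_HOSTNAME@|$host|g" "$template"
}
```

The template would carry a line such as `<profile namespace="globus" key="internalHostname">@INTERNAL_HOSTNAME@</profile>`, and a user on a badly configured cluster would set SWIFT_INTERNAL_HOSTNAME to the right address before running it.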

Mihael

> 
> see below.
> 
> - Mike
> 
> login2$ hostname -f
> login2.pads.ci.uchicago.edu
> login2$ qsub -I
> qsub: waiting for job 444923.svc.pads.ci.uchicago.edu to start
> qsub: job 444923.svc.pads.ci.uchicago.edu ready
> 
> ----------------------------------------
> Begin PBS Prologue Fri Aug 13 12:21:44 CDT 2010
> Job ID:		444923.svc.pads.ci.uchicago.edu
> Username:	wilde
> Group:		ci-users
> Nodes:		c40.pads.ci.uchicago.edu
> End PBS Prologue Fri Aug 13 12:21:44 CDT 2010
> ----------------------------------------
> c40$ ping login2.pads.ci.uchicago.edu
> PING login2.pads.ci.uchicago.edu (192.5.86.6) 56(84) bytes of data.
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=1 ttl=64 time=0.099 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=2 ttl=64 time=0.185 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=3 ttl=64 time=0.221 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=4 ttl=64 time=0.164 ms
> 
> --- login2.pads.ci.uchicago.edu ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3000ms
> rtt min/avg/max/mdev = 0.099/0.167/0.221/0.045 ms
> c40$ exit
> logout
> 
> qsub: job 444923.svc.pads.ci.uchicago.edu completed
> login2$ 
> 
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> 
> > On Fri, 2010-08-13 at 12:59 -0400, Glen Hocky wrote:
> > > The OOPS project uses sed to set that parameter and create the sites
> > > file on the fly. It's very effective.
> > 
> > How was the IP picked?
> > 
> > > 
> > > On Fri, Aug 13, 2010 at 12:54 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > >         On Fri, 2010-08-13 at 11:43 -0500, Jonathan Monette wrote:
> > >         > Right now I am using "internalHostname".  I was just
> > >         > wondering whether this should be changed, since I am always
> > >         > changing this entry depending on whether I am on login1 or
> > >         > login2.
> > >         
> > >         
> > >         It should, but the question is to what.
> > >         
> > >         I offer $20 to the first person to find a reliable (that works
> > >         on all TG sites + PADS + Intrepid), quick (that does not, by
> > >         itself, delay worker startup or the overall workflow by more
> > >         than a few seconds) and automated way of figuring out that IP.
> > >         I reserve the right to refuse a solution if it does not meet
> > >         certain propriety criteria that I did not necessarily specify
> > >         here.
> > >         
> > >         (btw you could make a wrapper around swift that detects whether
> > >         you are on login1 or login2 and picks one of two sites files
> > >         and passes that to swift).
> > >         
> > >         Mihael
> > >         
> > >         >
> > >         > On 8/13/10 11:35 AM, Mihael Hategan wrote:
> > >         > > On Fri, 2010-08-13 at 11:30 -0500, Jonathan Monette wrote:
> > >         > >
> > >         > >> Hello,
> > >         > >>       How does the worker decide what connection to connect
> > >         > >> to?  Right now what I think it does is run ifconfig, grep
> > >         > >> the inet addresses, and then test each of these connections.
> > >         > >> Is this correct?  When I am running on PADS it seems that
> > >         > >> the worker always chooses the wrong connection to the
> > >         > >> service.  It seems to choose the usb0 connection, whereas
> > >         > >> the correct connection is the ib0 connection.  Is there a
> > >         > >> way that the worker can be fixed to choose a better
> > >         > >> connection or the correct connection?  This seems to be
> > >         > >> happening only on PADS.
> > >         > >>
> > >         > >>
> > >         > > That was temporary. Initially it would use the same address
> > >         > > as the url in sites.xml. Then I added the "try all
> > >         > > interfaces" thing, but in some cases the connect on certain
> > >         > > wrong addresses does not fail quickly enough and has to
> > >         > > time out instead, which usually takes a few minutes. So that
> > >         > > got disabled and only the first address is used now (unless
> > >         > > overridden - see below).
> > >         > >
> > >         > > You can say <profile namespace="globus"
> > >         > > key="internalHostname">x.y.z.w</profile> in sites.xml
> > >         > >
> > >         > > Mihael
> > >         > >
> > >         > >
> > >         > >
> > >         > >
> > >         >
> > >         
> > >         
> > >         
> > >         
> > >         _______________________________________________
> > >         Swift-devel mailing list
> > >         Swift-devel at ci.uchicago.edu
> > >         http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >         
> > > 
> > 
> > 
> 




