[Swift-devel] Re: Worker connection
Mihael Hategan
hategan at mcs.anl.gov
Fri Aug 13 12:52:33 CDT 2010
On Fri, 2010-08-13 at 11:23 -0600, wilde at mcs.anl.gov wrote:
> If we set internal hostname to $(hostname -f) and then the worker just
> connects to that, resolving the IP address via DNS, won't that
> typically connect? At least as the default?
Yes, it would. Though no prize for "typically".
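
A quick way to check the "typically" part from a compute node is to attempt a TCP connection to the candidate hostname with a short timeout, so that a wrong address fails in seconds rather than minutes. The sketch below is only illustrative: the helper name `can_connect` and the 3-second default are assumptions, not part of the Swift worker.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    DNS failures, refused connections, and timeouts all count as "cannot connect".
    """
    try:
        # create_connection resolves the name and bounds the connect by `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError.
        return False
```

For example, running `can_connect("login2.pads.ci.uchicago.edu", 50000)` on c40 would report whether the name resolves and the port is reachable; 50000 here is a placeholder for whatever port the service actually listens on.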
>
> Then users manually override for any clusters that are not
> sufficiently sanely configured to make that possible, and we provide a
> set of manual instructions for users to determine the right settings
> if this is the case and what to set if not.
Right. I think this is very similar to what used to be the default.
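
As a rough sketch of that default-plus-override scheme: use the fully qualified hostname unless the user supplies an address, then emit the profile entry for sites.xml. The function names are hypothetical, and `socket.getfqdn()` is only an approximation of `hostname -f`, so the two may disagree on some systems.

```python
import socket
from xml.sax.saxutils import escape


def internal_hostname(override=None):
    """Return the user-supplied address if given, else this host's FQDN."""
    if override:
        return override
    # Approximates `hostname -f`; an assumption, not what Swift itself does.
    return socket.getfqdn()


def profile_entry(host):
    """Format the internalHostname profile line for sites.xml."""
    return ('<profile namespace="globus" key="internalHostname">'
            + escape(host) + '</profile>')


if __name__ == "__main__":
    print(profile_entry(internal_hostname()))
```

On a badly configured cluster the user would call `internal_hostname("x.y.z.w")` with the address found by hand.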
Mihael
>
> see below.
>
> - Mike
>
> login2$ hostname -f
> login2.pads.ci.uchicago.edu
> login2$ qsub -I
> qsub: waiting for job 444923.svc.pads.ci.uchicago.edu to start
> qsub: job 444923.svc.pads.ci.uchicago.edu ready
>
> ----------------------------------------
> Begin PBS Prologue Fri Aug 13 12:21:44 CDT 2010
> Job ID: 444923.svc.pads.ci.uchicago.edu
> Username: wilde
> Group: ci-users
> Nodes: c40.pads.ci.uchicago.edu
> End PBS Prologue Fri Aug 13 12:21:44 CDT 2010
> ----------------------------------------
> c40$ ping login2.pads.ci.uchicago.edu
> PING login2.pads.ci.uchicago.edu (192.5.86.6) 56(84) bytes of data.
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=1 ttl=64 time=0.099 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=2 ttl=64 time=0.185 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=3 ttl=64 time=0.221 ms
> 64 bytes from login2.pads.ci.uchicago.edu (192.5.86.6): icmp_seq=4 ttl=64 time=0.164 ms
>
> --- login2.pads.ci.uchicago.edu ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3000ms
> rtt min/avg/max/mdev = 0.099/0.167/0.221/0.045 ms
> c40$ exit
> logout
>
> qsub: job 444923.svc.pads.ci.uchicago.edu completed
> login2$
>
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
>
> > On Fri, 2010-08-13 at 12:59 -0400, Glen Hocky wrote:
> > > The OOPS project uses sed to set that parameter and create the sites
> > > file on the fly. It's very effective.
> >
> > How was the IP picked?
> >
> > >
> > > On Fri, Aug 13, 2010 at 12:54 PM, Mihael Hategan
> > > <hategan at mcs.anl.gov> wrote:
> > > On Fri, 2010-08-13 at 11:43 -0500, Jonathan Monette wrote:
> > > > > Right now I am using "internalHostname". I was just wondering
> > > > > whether this should be changed, since I am always changing this
> > > > > entry depending on whether I am on login1 or login2.
> > >
> > >
> > > It should, but the question is to what.
> > >
> > > > I offer $20 to the first person to find a reliable (that works on
> > > > all TG sites + PADS + Intrepid), quick (that does not, by itself,
> > > > delay worker startup or the overall workflow by more than a few
> > > > seconds) and automated way of figuring out that IP. I reserve the
> > > > right to refuse a solution if it does not meet certain propriety
> > > > criteria that I did not necessarily specify here.
> > > >
> > > > (btw you could make a wrapper around swift that detects whether you
> > > > are on login1 or login2 and picks one of two sites files and passes
> > > > that to swift).
> > >
> > > Mihael
> > >
> > > >
> > > > On 8/13/10 11:35 AM, Mihael Hategan wrote:
> > > > > > On Fri, 2010-08-13 at 11:30 -0500, Jonathan Monette wrote:
> > > > >
> > > > > >> Hello,
> > > > > >> How does the worker decide which connection to connect to?
> > > > > >> Right now what I think it does is run ifconfig, grep the inet
> > > > > >> addresses, and then test each of these connections. Is this
> > > > > >> correct? When I am running on PADS it seems that the worker
> > > > > >> always chooses the wrong connection to the service. It seems to
> > > > > >> choose the USB0 connection where the correct connection is the
> > > > > >> ib0 connection. Is there a way that maybe the worker can be
> > > > > >> fixed to choose a better connection or the correct connection?
> > > > > >> This seems to be only happening on PADS.
> > > > >>
> > > > >>
> > > > > > That was temporary. Initially it would use the same address as
> > > > > > the url in sites.xml. Then I added the "try all interfaces"
> > > > > > thing, but in some cases the connect on certain wrong addresses
> > > > > > does not fail quickly enough and has to time out instead, which
> > > > > > usually takes a few minutes. So that got disabled and only the
> > > > > > first address is used now (unless overridden - see below).
> > > > > >
> > > > > > You can say <profile namespace="globus"
> > > > > > key="internalHostname">x.y.z.w</profile> in sites.xml
> > > > >
> > > > > Mihael
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >
> > >
> >
> >
>