[AG-DEV] Parallelize some of the functions of the Start-Up for the Venue Client

Thomas D. Uram turam at mcs.anl.gov
Mon Oct 2 13:25:48 CDT 2006


This concerns me for a couple of reasons:

- Which bridges does a new user on a new venue server get?  Perhaps none.
(Or maybe new venue servers default to using the Argonne bridge, but I
really dislike hard-coding service URLs into the installers.)

- If the WestGrid bridges are down, or multicast to them is problematic,
is the user just stuck?  They should certainly have the option of using
some other bridges, too.

- This arrangement assumes that the users are "near" to the venue server,
and would therefore like to use bridges near the venue server.  Every user
of the Argonne venue server would use the Argonne bridges, even though they
might prefer to use their local bridges (especially if they are very distant
from Argonne).

These concerns could be alleviated by:

- allowing venue servers to specify a set of bridge networks to query
- having the p2p bridge network do some localization relative to the
venue clients, so they get bridges reasonably close to them (in network
terms)
- allowing clients to customize their bridge selection (with the downside
that this requires manual intervention by new users).
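
To make the three options above a little more concrete, here is a minimal
sketch of how a client might combine them. It is not AccessGrid code: the
function names, the precedence order (user override, then venue server,
then a configured default), and the measure_rtt callback are all
assumptions made for illustration. The limit of 10 bridges is the one
VenueClients already request from the registry.

```python
from typing import Callable, Iterable, List, Optional


def choose_registries(client_override: Optional[List[str]],
                      server_advertised: Optional[List[str]],
                      fallback: List[str]) -> List[str]:
    """Pick which bridge registries to query (hypothetical helper)."""
    # Assumed precedence: an explicit user choice wins, otherwise use
    # what the venue server advertises, otherwise a configured default.
    if client_override:
        return list(client_override)
    if server_advertised:
        return list(server_advertised)
    return list(fallback)


def nearest_bridges(bridges: Iterable[str],
                    measure_rtt: Callable[[str], Optional[float]],
                    max_to_return: int = 10) -> List[str]:
    """Order bridges by measured round-trip time, nearest first."""
    timed = []
    for bridge in bridges:
        rtt = measure_rtt(bridge)  # None means "no answer in time"
        if rtt is not None:
            timed.append((rtt, bridge))
    timed.sort()
    return [bridge for _, bridge in timed[:max_to_return]]
```

With this arrangement a new user needs no manual setup (the venue server's
list is used), while a distant user can still point the client at local
registries, and in either case the bridges are ranked relative to the
client rather than the venue server.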

This conversation is very good and helpful; let's keep it up.

Tom



On 9/8/06 5:54 PM, Brian Corrie wrote:
> Hi all,
>
> I like this idea as well - associating a registry with a venue server 
> seems to make the most sense to me. Then when one goes to the Argonne 
> venue server, they use the Argonne associated bridges, but when they 
> use the WestGrid server they get our registry. This makes life much 
> easier for the end users as they don't have to change their settings, 
> the venue server knows where the registry is...
>
> Gets my vote!
>
> Brian
>
>
> Andrew A Rowley wrote:
>> Hi,
>>
>> One thought I had would be to have the server give the client the 
>> registry url.  This would allow people who usually use one venue 
>> server to use a different venue server, including the associated 
>> bridges, without having to change their settings. 
>> Currently, if I decide to put our bridge on our own separate 
>> registry, and someone that usually uses the vv3 server wants to use 
>> our server, they would have to manually go in and specify the new 
>> bridge registry so that they could use our bridge (either through a 
>> new UI interface, or through editing the file).  This adds an extra 
>> complication to the process.
>>
>> Andrew :)
>>
>> ============================================
>> Access Grid Support Centre,
>> RSS Group,
>> Manchester Computing,
>> Kilburn Building,
>> University of Manchester,
>> Oxford Road,
>> Manchester, M13 9PL, UK
>> Tel: +44(0)161-275 0685
>> Email: Andrew.Rowley at manchester.ac.uk
>>> -----Original Message-----
>>> From: owner-ag-dev at mcs.anl.gov [mailto:owner-ag-dev at mcs.anl.gov]
>>> On Behalf Of Thomas D. Uram
>>> Sent: 07 September 2006 23:35
>>> To: Jason Bell
>>> Cc: ag-dev at mcs.anl.gov
>>> Subject: Re: [AG-DEV] Parallelize some of the functions of the Start-Up
>>> for the Venue Client
>>>
>>>
>>> I'll clarify and suggest some things.  Feel free to comment or suggest
>>> alternatives.
>>>
>>> The base problem is well understood:  The single universal bridge network
>>> can be joined by anyone, including people whose bridge machines are behind
>>> a firewall.  Before 3.0.2, VenueClients pinged bridge machines and had a
>>> set timeout of one second, so even if this problem had occurred before, it
>>> was not noticeable (no one reported it, anyway).  As of 3.0.2, 'ping' has
>>> been swapped out for an RPC call to the bridge, and the timeout was
>>> inadvertently not carried over.  I've put together a modified
>>> RegistryClient.py that includes a one-second timeout; interested people
>>> can look here:
>>>
>>> http://www.mcs.anl.gov/~turam/ag3/registry/RegistryClient.py
>>>
>>> This fix should potentially be rolled out right away to overcome the
>>> problem with bridges behind firewalls.  I'd be interested to know if
>>> people verify this fix, so we can move ahead with it.
>>>
>>> We shouldn't have to worry about the list of bridges growing to an
>>> unscalable number:  VenueClients request a max of 10 bridges
>>> from the registry.  This imposes a number of limitations, several of
>>> which we'll address in the coming development.
>>>
>>> Having clients time out is essential.  It has always been our plan to have
>>> the bridge network supported in more of a p2p style than the current
>>> registry.  This would remove the single point of failure that is now the
>>> registry.  With this p2p model, there would be no central registry to
>>> determine whether a bridge is suitable (i.e., not behind a firewall), so
>>> allowing clients to quickly measure connectivity to a bridge and be able
>>> to time out is required.
>>> There's a fair amount of interesting work related to p2p here; if someone
>>> is knowledgeable and interested, let us know.
>>>
>>> It has also been our plan to let people run alternate bridge networks,
>>> and configure venue clients to use that alternate bridge network.
>>> Eventually it might make sense for them to also be able to use multiple
>>> bridge networks (e.g., the overarching AG bridge network, plus the
>>> private bridge network established within their institution).
>>>
>>> I'll try to get our plans written up and sent to the list in the near
>>> future, so they're open for comment before and during implementation.
>>>
>>> Tom
>>>
>>>
>>> On 9/5/06 9:07 AM, Jason Bell wrote:
>>>> G'day all
>>>>
>>>> I think I should mention that while testing my own AG 3 Bridge, I was
>>>> one of those "baddies" as well.
>>>>
>>>> What this highlighted is how easy it would be for someone to "simply"
>>>> create a "baddy" bridge without realising it.
>>>>
>>>> The purpose for my testing was to add additional documentation on my
>>>> install guide on how to configure a unicast bridge and Venue Server for
>>>> AG 3.  I am very reluctant to release any documentation that shows how
>>>> to configure a Bridge as it may inadvertently create more "baddies",
>>>> thus causing the AG3 VenueClient to be almost unusable due to the long
>>>> start-up time.
>>>>
>>>> Anyway, some constructive comments in my opinion on ways we could
>>>> possibly improve this would be:
>>>>
>>>> *    Having the Load_Bridge() run/execute as a separate
>>>> process/thread, which would operate independently of the starting up of
>>>> the VenueClient itself.  The benefits of this could be:
>>>>     -    Running as a separate process shouldn't affect the
>>>> performance of starting the Venue Client, etc.
>>>>     -    Also, you could re-run this function separately, which
>>>> would continually update the list of bridges.  I have found
>>>> recently that the list of bridges is only as accurate as when the
>>>> VenueClient was first started.
>>>>
>>>> *    Another idea, based upon Rhys's suggestion, is that if something
>>>> doesn't respond after a short delay (rather than a time out), then don't
>>>> list the bridge.  This would then only list bridges that would be
>>>> acceptable to use.  The downside of this is that the list of bridges
>>>> "WILL" grow and still cause some delay, though not as long.
>>>>
>>>> *    Based upon Andrew's suggestion, having the ability for a Bridge
>>>> to Register to different registries would allow (I think) the bridges
>>>> to be assigned to various regions.  That way, a registry could list
>>>> "Good" unicast bridges for the various regions, cutting down the number
>>>> of bridges tested and loaded.
>>>>
>>>> I honestly think that the possible solution to this problem is most
>>>> likely a combination of some of the suggestions above, and possibly some
>>>> other ideas.
>>>>
>>>> Anyway, I think this is a very important issue and hopefully we may be
>>>> able to come up with some real "fixes" to the problem.
>>>>
>>>> Cheers,
>>>> Jason.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Christoph Willing [mailto:willing at vislab.uq.edu.au]
>>>> Sent: Tuesday, 5 September 2006 4:54 PM
>>>> To: Rhys Hawkins
>>>> Cc: Jason Bell; ag-dev at mcs.anl.gov
>>>> Subject: Re: [AG-DEV] Parallelize some of the functions of the Start-Up
>>>> for the Venue Client
>>>>
>>>>
>>>> On 04/09/2006, at 3:45 PM, Rhys Hawkins wrote:
>>>>
>>>> [snip]
>>>>
>>>>> I've commented out line 125 in
>>>>> AccessGrid/Registry/RegistryClient.py, i.e.:
>>>>>     #self.bridges = self._sortBridges(maxToReturn)
>>>>> I don't know whether this will help your colleague or not. It certainly
>>>>> makes things quicker for me. If you're just testing a local bridge then
>>>>> you could just stick your local bridge description in the beginning of
>>>>> the list to fake the sort. Although this is just hacking for testing
>>>>> purposes and doesn't solve the actual problem.
>>>>>
>>>>
>>>> I was doing something similar (VenueClient.py at line 199, comment
>>>> out self.__LoadBridges()), which is fine if you don't need bridges.
>>>> Today I needed to connect to a site whose only visibility was through
>>>> a bridge they were running, so I had to reinstate them, but block the
>>>> baddies. I now have a list of offending bridges inserted at the
>>>> beginning of the PingBridgeService() definition (line 63) in
>>>> RegistryClient.py. These sites are skipped as follows:
>>>>
>>>> def PingBridgeService(self, bridgeProxy):
>>>>      # Skip known-bad bridges before attempting the RPC call
>>>>      banned = ['some.site', 'another.site']
>>>>      for b in banned:
>>>>          if bridgeProxy._ServerProxy__host.startswith(b):
>>>>              return -1
>>>>      self.log.info('PingBridgeService: trying %s' %
>>>>                    bridgeProxy._ServerProxy__host)
>>>>      # etc. as before
>>>>
>>>> The extra logging line shows progress through the bridges a bit
>>>> better and identifies new baddies.
>>>>
>>>>
>>>> Of course it's cumbersome to edit RegistryClient.py every time a new
>>>> baddy is detected (there have been a few recently), but I generally
>>>> have a fast start up now, as well as having access to "good" bridges.
>>>> Maybe a separate configuration file containing the baddies would be
>>>> better; the VenueClient could consult it at startup before processing
>>>> the bridge list.
>>>>
>>>>
>>>> chris
>>>>
>>>>
>>>> Christoph Willing                       +61 7 3365 8350
>>>> QCiF/QPSF Access Grid Manager
>>>> University of Queensland
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
>
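
The ideas in the quoted thread (a one-second timeout, probing bridges
in parallel rather than one after another, and a ban list kept outside
RegistryClient.py) can be sketched together as below. This is only an
illustration in current Python, not the AccessGrid implementation: the
function names are made up, and a plain TCP connect stands in for the RPC
call to the bridge.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Tuple


def is_banned(host: str, banned: Iterable[str]) -> bool:
    """True if host starts with any banned prefix."""
    return any(host.startswith(prefix) for prefix in banned)


def probe(host: str, port: int, timeout: float = 1.0) -> Optional[float]:
    """Seconds taken to open a TCP connection, or None on failure.

    Stand-in for the bridge RPC call.  The timeout bounds the connect
    attempt; name resolution may take longer.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, timed out, or lookup failed
        return None


def usable_bridges(bridges: List[Tuple[str, int]],
                   banned: Iterable[str] = (),
                   timeout: float = 1.0,
                   probe_fn=probe) -> List[Tuple[str, int]]:
    """Probe all non-banned bridges concurrently; return the reachable
    ones, fastest first."""
    banned = list(banned)
    candidates = [b for b in bridges if not is_banned(b[0], banned)]
    if not candidates:
        return []
    # One worker per bridge, so the connect phases overlap instead of
    # adding up across unreachable bridges.
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        rtts = list(pool.map(lambda b: probe_fn(b[0], b[1], timeout),
                             candidates))
    reachable = [(rtt, b) for rtt, b in zip(rtts, candidates)
                 if rtt is not None]
    reachable.sort()
    return [b for _, b in reachable]
```

The banned prefixes could be read from a small text file at start-up, and
usable_bridges() could itself be run from a background thread (and re-run
later) so that the venue client's start-up never waits on the bridge list.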



