[AG-DEV] Parallelize some of the functions of the Start-Up for the Venue Client

Andrew Rowley Andrew.Rowley at manchester.ac.uk
Tue Oct 3 16:06:35 CDT 2006


Hi,

There is one assumption being made with this implementation, where each user
connects to the unicast bridge closest to them: that all of the bridges can
speak to each other perfectly in multicast.  If this is not the case and two
people connect to two bridges that don't speak to each other, they will not
be able to communicate.  If this issue can be resolved, there should not be a
problem with having a single registry (or multiple registries connected
peer-to-peer).

One way of doing this would be to have each bridge connect to the other
bridges in a peer-to-peer fashion, with the registries included in the bridges
themselves.  The bridges could then monitor the multicast traffic between
themselves on a specially assigned multicast channel, using the peer-to-peer
network to check that traffic is getting through.  If two people connect to
two bridges that can't speak in multicast, the bridges could tunnel to each
other in unicast (e.g. using UMTP), creating a fully connected network (I
think this is like VRVS reflectors).  Note that this could also be useful for
temporarily poor multicast: packet transfer can be monitored using the
peer-to-peer network, and if the loss becomes large, the bridges can switch
to UMTP transport until the loss drops again.  Another nice property is that
if all clients connected to their local bridge and unicast was used between
the bridges, the link between any two bridges would carry no more traffic
than if multicast were being used (I believe).
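
To make the fallback idea concrete, here is a rough sketch of the monitoring
loop a bridge might run for each peer (all of the names here, such as
measure_loss, start_tunnel and stop_tunnel, are hypothetical placeholders,
not existing bridge code):

import time

LOSS_THRESHOLD = 0.05   # hypothetical: tunnel above 5% loss
CHECK_INTERVAL = 10     # seconds between probes of the monitor channel

def monitor_peer(peer, measure_loss, start_tunnel, stop_tunnel):
    # Watch multicast loss to one peer bridge; fall back to a unicast
    # tunnel (e.g. UMTP) while loss is high, tear it down on recovery.
    tunnelled = False
    while True:
        loss = measure_loss(peer)      # fraction 0.0-1.0, from probes
        if loss > LOSS_THRESHOLD and not tunnelled:
            start_tunnel(peer)         # multicast failing: go unicast
            tunnelled = True
        elif loss <= LOSS_THRESHOLD and tunnelled:
            stop_tunnel(peer)          # multicast healthy again
            tunnelled = False
        time.sleep(CHECK_INTERVAL)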

Without this method, it is actually better for each venue to have only one
bridge, so that all users join the same bridge and connectivity is
guaranteed, although this will increase the load on that bridge.  This is the
reason I was suggesting that the venue server should give out the registry
URL.

Only once we can guarantee connectivity between unicast bridges can we talk
about having users pick their closest bridge from a large list.  Otherwise a
lot of new users will just select "Use Unicast" and find that it doesn't
work.

Andrew :)

Quoting "Thomas D. Uram" <turam at mcs.anl.gov>:

> This concerns me for a couple of reasons:
>
> - Which bridges does a new user on a new venue server get?  Perhaps none.
> (Maybe new venue servers default to using the Argonne bridge, but I
> really dislike hard-coding service URLs into the installers.)
>
> - If the WestGrid bridges are down, or multicast to them is problematic,
> is the user just stuck?  They should certainly have the option of using
> some other bridges, too.
>
> - This arrangement assumes that the users are "near" to the venue server, and
> would therefore like to use bridges near the venue server.  Every user of the
> Argonne venue server would use the Argonne bridges, even though they
> might prefer to use their local bridges (especially if they are very distant
> from Argonne).
>
> These concerns could be alleviated by:
>
> - allowing venue servers to specify a set of bridge networks to query
> - having the p2p bridge network do some localization relative to the
> venue clients, so they get bridges reasonably close to them (in network
> terms)
> - allowing clients to customize their bridge selection (with the
> downside that this requires manual intervention by new users).
>
> This conversation is very good and helpful; let's keep it up.
>
> Tom
>
>
>
> On 9/8/06 5:54 PM, Brian Corrie wrote:
>> Hi all,
>>
>> I like this idea as well - associating a registry with a venue
>> server seems to make the most sense to me.  Then when one goes to the
>> Argonne venue server, they use the Argonne-associated bridges, but
>> when they use the WestGrid server they get our registry.  This makes
>> life much easier for the end users: they don't have to change their
>> settings, because the venue server knows where the registry is...
>>
>> Gets my vote!
>>
>> Brian
>>
>>
>> Andrew A Rowley wrote:
>>> Hi,
>>>
>>> One thought I had would be to have the server give the client the
>>> registry URL.  This would allow people who usually use one venue
>>> server to use a different venue server, including the associated
>>> bridges, without having to change their settings.  Currently, if I
>>> decide to put our bridge on our own separate registry, and someone
>>> who usually uses the vv3 server wants to use our server, they
>>> would have to go in and specify the new bridge registry manually so
>>> that they could use our bridge (either through a new UI, or by
>>> editing the configuration file).  This adds an extra complication to
>>> the process.
>>>
>>> Andrew :)
>>>
>>> ============================================
>>> Access Grid Support Centre,
>>> RSS Group,
>>> Manchester Computing,
>>> Kilburn Building,
>>> University of Manchester,
>>> Oxford Road,
>>> Manchester, M13 9PL, UK
>>> Tel: +44(0)161-275 0685
>>> Email: Andrew.Rowley at manchester.ac.uk
>>>> -----Original Message-----
>>>> From: owner-ag-dev at mcs.anl.gov [mailto:owner-ag-dev at mcs.anl.gov] On Behalf
>>>> Of Thomas D. Uram
>>>> Sent: 07 September 2006 23:35
>>>> To: Jason Bell
>>>> Cc: ag-dev at mcs.anl.gov
>>>> Subject: Re: [AG-DEV] Parallelize some of the functions of the Start-Up
>>>> for the Venue Client
>>>>
>>>>
>>>> I'll clarify and suggest some things.  Feel free to comment or suggest
>>>> alternatives.
>>>>
>>>> The base problem is well understood:  the single universal bridge
>>>> network can be joined by anyone, including people whose bridge
>>>> machines are behind a firewall.  Before 3.0.2, VenueClients pinged
>>>> bridge machines with a set timeout of one second, so even if this
>>>> problem had occurred before, it was not noticeable (no one reported
>>>> it, anyway).  As of 3.0.2, 'ping' has been swapped out for an RPC
>>>> call to the bridge, and the timeout was inadvertently not carried
>>>> over.  I've put together a modified RegistryClient.py that includes
>>>> a one-second timeout; interested people can look here:
>>>>
>>>> http://www.mcs.anl.gov/~turam/ag3/registry/RegistryClient.py
>>>>
>>>> This fix should potentially be rolled out right away to overcome the
>>>> problem with bridges behind firewalls.  I'd be interested to know if
>>>> people can verify this fix, so we can move ahead with it.
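>>>>
>>>> For anyone curious about the mechanics, here is a minimal sketch of
>>>> one way to impose such a timeout using only the standard library
>>>> (this is an illustration, not the actual RegistryClient.py change,
>>>> and the Ping method name is an assumption):
>>>>
>>>> import socket
>>>> import xmlrpclib
>>>>
>>>> def PingWithTimeout(url, seconds=1.0):
>>>>     # Set a process-wide default so the RPC connection below gives
>>>>     # up after `seconds` instead of hanging on a firewalled bridge.
>>>>     old = socket.getdefaulttimeout()
>>>>     socket.setdefaulttimeout(seconds)
>>>>     try:
>>>>         try:
>>>>             xmlrpclib.ServerProxy(url).Ping()
>>>>             return 1       # bridge answered in time
>>>>         except (socket.timeout, socket.error):
>>>>             return -1      # unreachable bridge: skip it
>>>>     finally:
>>>>         socket.setdefaulttimeout(old)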
>>>>
>>>> We shouldn't have to worry about the list of bridges growing to an
>>>> unscalable number:  VenueClients request a max of 10 bridges
>>>> from the registry.  This imposes a number of limitations, several of
>>>> which we'll address in the coming development.
>>>>
>>>> Having clients time out is essential.  It has always been our plan to
>>>> have the bridge network supported in more of a p2p style than the
>>>> current registry, which would remove the single point of failure that
>>>> the registry now represents.  With a p2p model there would be no
>>>> central registry to determine whether a bridge is suitable (i.e., not
>>>> behind a firewall), so clients must be able to quickly measure their
>>>> connectivity to a bridge and time out if it is unreachable.
>>>> There's a fair amount of interesting work related to p2p here; if
>>>> someone is knowledgeable and interested, let us know.
>>>>
>>>> It has also been our plan to let people run alternate bridge networks,
>>>> and to configure venue clients to use an alternate bridge network.
>>>> Eventually it might make sense for clients to be able to use multiple
>>>> bridge networks at once (e.g., the overarching AG bridge network plus
>>>> a private bridge network established within their institution).
>>>>
>>>> I'll try to get our plans written up and sent to the list in the near
>>>> future, so they're open for comment before and during implementation.
>>>>
>>>> Tom
>>>>
>>>>
>>>> On 9/5/06 9:07 AM, Jason Bell wrote:
>>>>> G'day all
>>>>>
>>>>> I think I should mention that while testing my own AG 3 Bridge, I was
>>>>> one of those "baddies" as well.
>>>>>
>>>>> What this highlighted is how easy it would be for someone to "simply"
>>>>> create a "baddy" bridge without realising it.
>>>>>
>>>>> The purpose of my testing was to add documentation to my install
>>>>> guide on how to configure a unicast bridge and Venue Server for AG 3.
>>>>> I am very reluctant to release any documentation that shows how to
>>>>> configure a bridge, as it may inadvertently create more "baddies",
>>>>> making the AG3 VenueClient almost unusable due to the long start-up
>>>>> time.
>>>>>
>>>>> Anyway, some constructive comments in my opinion on ways we could
>>>>> possibly improve this would be:
>>>>>
>>>>> *    Having Load_Bridge() run/execute as a separate process/thread
>>>>> which would operate independently of the start-up of the VenueClient
>>>>> itself (a rough sketch follows this list).  The benefits of this
>>>>> could be:
>>>>>     -    Running as a separate process shouldn't affect the
>>>>> performance of starting the Venue Client, etc.
>>>>>     -    Also, you could re-run this function separately, which
>>>>> would continually update the list of bridges.  I have found recently
>>>>> that the list of bridges is only as accurate as when the VenueClient
>>>>> was first started.
>>>>>
>>>>> *    Another idea, based upon Rhys's suggestion, is that if a bridge
>>>>> doesn't respond after a short delay (rather than a full timeout),
>>>>> then don't list it.  This would mean only listing bridges that are
>>>>> acceptable to use.  The downside is that the list of bridges "WILL"
>>>>> grow and still cause some delay, though not as long.
>>>>>
>>>>> *    Based upon Andrew's suggestion, having the ability for a bridge
>>>>> to register with different registries would allow (I think) the
>>>>> bridges to be assigned to various regions.  That way, a registry
>>>>> could list "good" unicast bridges for the various regions, cutting
>>>>> down the number of bridges tested and loaded.
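>>>>>
>>>>> As a rough illustration of the first suggestion (the function names
>>>>> here are made up, not the actual VenueClient code), the bridge
>>>>> lookup could run on a daemon thread and refresh itself periodically:
>>>>>
>>>>> import threading
>>>>> import time
>>>>>
>>>>> def StartBridgeUpdater(loadBridges, interval=300.0):
>>>>>     # Run loadBridges() in the background, repeating every
>>>>>     # `interval` seconds so the bridge list stays current, while
>>>>>     # the client UI starts immediately instead of waiting on it.
>>>>>     def worker():
>>>>>         while True:
>>>>>             loadBridges()      # e.g. the client's bridge lookup
>>>>>             time.sleep(interval)
>>>>>     t = threading.Thread(target=worker)
>>>>>     t.setDaemon(True)          # don't block VenueClient shutdown
>>>>>     t.start()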
>>>>>
>>>>> I honestly think that the solution to this problem is most likely a
>>>>> combination of some of the suggestions above, and possibly some
>>>>> other ideas.
>>>>>
>>>>> Anyway, I think this is a very important issue and hopefully we may be
>>>>> able to come up with some real "fixes" to the problem.
>>>>>
>>>>> Cheers,
>>>>> Jason.
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Christoph Willing [mailto:willing at vislab.uq.edu.au]
>>>>> Sent: Tuesday, 5 September 2006 4:54 PM
>>>>> To: Rhys Hawkins
>>>>> Cc: Jason Bell; ag-dev at mcs.anl.gov
>>>>> Subject: Re: [AG-DEV] Parallelize some of the functions of the Start-Up
>>>>> for the Venue Client
>>>>>
>>>>>
>>>>> On 04/09/2006, at 3:45 PM, Rhys Hawkins wrote:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> I've commented out line 125 in AccessGrid/Registry/RegistryClient.py,
>>>>>> i.e.:
>>>>>>     #self.bridges = self._sortBridges(maxToReturn)
>>>>>> I don't know whether this will help your colleague or not.  It
>>>>>> certainly makes things quicker for me.  If you're just testing a
>>>>>> local bridge then you could just stick your local bridge description
>>>>>> at the beginning of the list to fake the sort.  Although this is
>>>>>> just hacking for testing purposes and doesn't solve the actual
>>>>>> problem.
>>>>>>
>>>>>
>>>>> I was doing something similar (commenting out self.__LoadBridges()
>>>>> at line 199 of VenueClient.py), which is fine if you don't need
>>>>> bridges.  Today I needed to connect to a site whose only visibility
>>>>> was through a bridge they were running, so I had to reinstate the
>>>>> bridges, but block the baddies.  I now have a list of offending
>>>>> bridges inserted at the beginning of the PingBridgeService()
>>>>> definition (line 63) in RegistryClient.py.  These sites are skipped
>>>>> as follows:
>>>>>
>>>>> def PingBridgeService(self, bridgeProxy):
>>>>>     # Skip known-bad bridges before attempting the ping.
>>>>>     banned = ['some.site', 'another.site']
>>>>>     for b in banned:
>>>>>         if bridgeProxy._ServerProxy__host.startswith(b):
>>>>>             return -1
>>>>>     self.log.info('PingBridgeService: trying %s' %
>>>>>                   bridgeProxy._ServerProxy__host)
>>>>>     # ... rest of the method as before
>>>>>
>>>>> The extra logging line shows progress through the bridges a bit
>>>>> better and identifies new baddies.
>>>>>
>>>>>
>>>>> Of course it's cumbersome to edit RegistryClient.py every time a new
>>>>> baddy is detected (there have been a few recently), but I generally
>>>>> have a fast start-up now, as well as access to "good" bridges.
>>>>> Maybe a separate configuration file containing the baddies would be
>>>>> better; the VenueClient could consult it at startup before processing
>>>>> the bridge list.
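>>>>>
>>>>> Something along these lines, say (the filename and format are just
>>>>> a suggestion, not an existing AG convention):
>>>>>
>>>>> import os
>>>>>
>>>>> def LoadBannedBridges(path=os.path.expanduser('~/.banned_bridges')):
>>>>>     # One banned host per line; blank lines and '#' comments are
>>>>>     # ignored.  A missing file just means nothing is banned.
>>>>>     banned = []
>>>>>     try:
>>>>>         f = open(path)
>>>>>     except IOError:
>>>>>         return banned
>>>>>     for line in f:
>>>>>         line = line.strip()
>>>>>         if line and not line.startswith('#'):
>>>>>             banned.append(line)
>>>>>     f.close()
>>>>>     return banned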
>>>>>
>>>>>
>>>>> chris
>>>>>
>>>>>
>>>>> Christoph Willing                       +61 7 3365 8350
>>>>> QCiF/QPSF Access Grid Manager
>>>>> University of Queensland
>>>>>