[AG-DEV] Parallelize some of the functions of the Start-Up for the Venue Client

Thomas D. Uram turam at mcs.anl.gov
Thu Sep 7 17:35:05 CDT 2006


I'll clarify and suggest some things.  Feel free to comment or suggest
alternatives.

The base problem is well understood:  The single universal bridge network
can be joined by anyone, including people whose bridge machines are behind
a firewall.  Before 3.0.2, VenueClients pinged bridge machines and had a
set timeout of one second, so even if this problem had occurred before, it
was not noticeable (no one reported it, anyway).  As of 3.0.2, 'ping' has
been swapped out for an RPC call to the bridge, and the timeout was
inadvertently not carried over.  I've put together a modified
RegistryClient.py that includes a one-second timeout; interested people
can look here:

http://www.mcs.anl.gov/~turam/ag3/registry/RegistryClient.py

This fix could be rolled out right away to overcome the problem with
bridges behind firewalls.  I'd be interested to hear from anyone who can
verify the fix, so we can move ahead with it.
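
For anyone who wants to see the general shape of a timeout on an RPC
"ping", here is a minimal sketch using Python 3's standard xmlrpc.client.
It is not the contents of the modified RegistryClient.py linked above.
The TimeoutTransport class, the ping_bridge function, and the remote
method name Ping are assumptions made for illustration only.

```python
import http.client
import time
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connection gives up after `timeout` seconds."""

    def __init__(self, timeout=1.0):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # HTTPConnection applies this value to both connect and read.
        conn.timeout = self.timeout
        return conn


def ping_bridge(url, timeout=1.0):
    """Return the round-trip time in seconds, or -1 if the bridge did not answer."""
    proxy = xmlrpc.client.ServerProxy(url, transport=TimeoutTransport(timeout))
    start = time.monotonic()
    try:
        proxy.Ping()  # hypothetical method name; substitute the bridge's real RPC
    except (OSError, http.client.HTTPException, xmlrpc.client.Error):
        # Covers timeouts, refused connections, and malformed or fault replies.
        return -1
    return time.monotonic() - start
```

A bridge behind a firewall that silently drops packets would then cost the
client at most roughly `timeout` seconds instead of stalling start-up.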

We shouldn't have to worry about the list of bridges growing to an
unscalable number:  VenueClients request a max of 10 bridges
from the registry.  That cap imposes a number of limitations, several of
which we'll address in the coming development.

Having clients time out is essential.  It has always been our plan to have
the bridge network supported in more of a p2p style than the current
registry.  This would remove the single point of failure that is now the
registry.  With this p2p model, there would be no central registry to
determine whether a bridge is suitable (i.e., not behind a firewall), so
clients must be able to measure connectivity to a bridge quickly and to
time out.
There's a fair amount of interesting work related to p2p here; if someone
is knowledgeable and interested, let us know.
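
As a rough sketch of what "measure quickly and time out" could look like
on the client side, the following probes all candidate bridges
concurrently, drops the ones that do not answer in time, and keeps the
fastest few.  It uses a plain TCP connect as a stand-in for the real RPC
call, and the names probe, rank_bridges, and max_to_return are
assumptions, not part of the AccessGrid code.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def probe(host, port, timeout=1.0):
    """Return the TCP connect time in seconds, or None if unreachable in time."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def rank_bridges(bridges, timeout=1.0, max_to_return=10, probe_fn=probe):
    """Probe (host, port) pairs in parallel; return reachable ones, fastest first."""
    bridges = list(bridges)
    if not bridges:
        return []
    # One worker per bridge (capped), so the total wait is about one timeout.
    with ThreadPoolExecutor(max_workers=min(32, len(bridges))) as pool:
        delays = list(pool.map(lambda b: probe_fn(b[0], b[1], timeout), bridges))
    reachable = [(d, b) for d, b in zip(delays, bridges) if d is not None]
    reachable.sort(key=lambda pair: pair[0])
    return [b for _, b in reachable[:max_to_return]]
```

Because the probes run side by side, one unreachable bridge no longer adds
its full timeout to the start-up time of every client.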

It has also been our plan to let people run alternate bridge networks,
and configure venue clients to use that alternate bridge network.
Eventually it might make sense for them to also be able to use multiple
bridge networks (e.g., the overarching AG bridge network, plus the
private bridge network established within their institution).
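
To make the multiple-network idea a little more concrete, here is a small
sketch of how a client might combine bridge lists obtained from several
registries.  Nothing here reflects an existing design: the function name
merge_bridge_lists and the dictionary keys "host" and "port" are purely
hypothetical.

```python
def merge_bridge_lists(*bridge_lists):
    """Merge bridge lists from several registries.

    Lists given earlier take precedence: if the same (host, port) appears
    more than once, only the first entry is kept.
    """
    seen = set()
    merged = []
    for bridges in bridge_lists:
        for bridge in bridges:
            key = (bridge["host"].lower(), bridge["port"])
            if key not in seen:
                seen.add(key)
                merged.append(bridge)
    return merged
```

Passing the institutional list first would let a private bridge win over a
duplicate entry from the overarching network.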

I'll try to get our plans written up and sent to the list in the near
future, so they're open for comment before and during implementation.

Tom


On 9/5/06 9:07 AM, Jason Bell wrote:
> G'day all
>
> I think I should mention that while testing my own AG 3 Bridge, I was
> one of those "baddies" as well.
>
> What this highlighted is how easy it would be for someone to "simply"
> create a "baddy" bridge without realising it.
>
> The purpose for my testing was to add additional documentation on my
> install guide on how to configure a unicast bridge and Venue Server for
> AG 3.  I am very reluctant to release any documentation that shows how
> to configure a Bridge as it may inadvertently create more "baddies",
> thus causing the AG3 VenueClient to be almost unusable due to the long
> start-up time.
>
> Anyway, some constructive comments in my opinion on ways we could
> possibly improve this would be:
>
> *	Having Load_Bridge() run as a separate process/thread that
> operates independently of the start-up of the VenueClient itself.  The
> benefits of this could be:
> 	-	Running as a separate process shouldn't affect the
> performance of starting the Venue Client, etc.
> 	-	Also, you could re-run this function separately, which
> would continually update the list of bridges.  I have found recently
> that the list of bridges is only as accurate as it was when the
> VenueClient was first started.
>
> *	Another idea, based upon Rhys's suggestion is that if something
> doesn't respond after a short delay (rather than a time out), then don't
> list the bridge.  This would then only list bridges that would be
> acceptable to use.  The downside of this is that the list of bridges
> "WILL" grow and still cause some delay, though not as long.
>
> *	Based upon Andrew's suggestion, having the ability for a Bridge
> to Register to different registries, would allow (I think) the bridges
> to be assigned to various regions.  That way, a registry could list
> "Good" unicast bridges for the various regions, cutting down the number
> of bridges tested and loaded.
>
>  I honestly think that the possible solution to this problem is most
> likely a combination of some of the suggestions above, and possibly some
> other ideas. 
>
> Anyway, I think this is a very important issue and hopefully we may be
> able to come up with some real "fixes" to the problem.
>
> Cheers,
> Jason.
>
>
> -----Original Message-----
> From: Christoph Willing [mailto:willing at vislab.uq.edu.au] 
> Sent: Tuesday, 5 September 2006 4:54 PM
> To: Rhys Hawkins
> Cc: Jason Bell; ag-dev at mcs.anl.gov
> Subject: Re: [AG-DEV] Parallelize some of the functions of the Start-Up
> for the Venue Client
>
>
> On 04/09/2006, at 3:45 PM, Rhys Hawkins wrote:
>
> [snip]
>   
>> I've commented out line 125 in AccessGrid/Registry/RegistryClient.py, ie:
>>     #self.bridges = self._sortBridges(maxToReturn)
>> I don't know whether this will help your colleague or not. It certainly
>> makes things quicker for me. If you're just testing a local bridge then
>> you could just stick your local bridge description in the beginning of
>> the list to fake the sort. Although this is just hacking for testing
>> purposes and doesn't solve the actual problem.
>>     
>
>
> I was doing something similar (VenueClient.py at line 199, comment  
> out self.__LoadBridges()), which is fine if you don't need bridges.  
> Today I needed to connect to a site whose only visibility was through  
> a bridge they were running, so I had to reinstate them, but block the  
> baddies. I now have a list of offending bridges inserted at the  
> beginning of the PingBridgeService() definition (line 63) in  
> RegistryClient.py. These sites are skipped as follows:
>
> def PingBridgeService(self, bridgeProxy):
>      # Bridges known to hang; skip them before attempting the RPC call.
>      banned = ['some.site', 'another.site']
>      for b in banned:
>          if bridgeProxy._ServerProxy__host.startswith(b):
>              return -1
>      self.log.info('PingBridgeService: trying %s' %
>                    bridgeProxy._ServerProxy__host)
>      # ... rest of the method as before
>
> The extra logging line shows progress through the bridges a bit  
> better and identifies new baddies.
>
>
> Of course it's cumbersome to edit RegistryClient.py every time a new  
> baddy is detected (there have been a few recently), but I generally  
> have a fast start up now, as well as having access to "good" bridges.  
> Maybe a separate configuration file containing the baddies would be  
> better; the VenueClient could consult it at startup before processing  
> the bridge list.
>
>
> chris
>
>
> Christoph Willing                       +61 7 3365 8350
> QCiF/QPSF Access Grid Manager
> University of Queensland
>
>
>
>
>   



