[AG-TECH] Bridge Traffic/Router Issues

Mike Weaver weaver at ascr.doe.gov
Mon Jul 20 04:13:30 CDT 2009


I have some logging info showing the dropping of MSDP peers and some PIM
problems, but no historic NetFlow data or instantaneous connection,
bandwidth, or CPU performance data.  Our management system uses 5-minute
averaging, which smooths out the CPU spikes so they aren't evident.  I'll
send the logs in a separate message so as not to burden the list.
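As a rough illustration of why 5-minute averages can hide the spikes, here is a small Python sketch using hypothetical numbers.  It assumes a 10% baseline CPU load and a single 30-second interrupt spike at 100% inside one 5-minute window.  Neither figure comes from our router data; they are only chosen to resemble the 20-30 second hangs described below.

```python
# Hypothetical illustration only: BASELINE_PCT and the spike shape are
# assumed values, not measurements from our router.
BASELINE_PCT = 10      # assumed typical router CPU load (%)
SPIKE_PCT = 100        # peak load during an interrupt spike (%)
SPIKE_SECONDS = 30     # assumed spike duration (s)
WINDOW_SECONDS = 300   # 5-minute averaging interval (s)


def window_samples():
    """One CPU sample per second: a single spike, then baseline load."""
    return [SPIKE_PCT] * SPIKE_SECONDS + [BASELINE_PCT] * (
        WINDOW_SECONDS - SPIKE_SECONDS
    )


def five_minute_average(samples):
    """Mean over the window, as a 5-minute-averaging poller would report it."""
    return sum(samples) / len(samples)


samples = window_samples()
print(f"peak: {max(samples)}%  5-min average: {five_minute_average(samples):.0f}%")
```

With these assumed numbers the poller would report about 19% for the window, even though the CPU was pegged at 100% for half a minute.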

Mike

> -----Original Message-----
> From: Thomas Uram [mailto:turam at mcs.anl.gov]
> Sent: Sunday, July 19, 2009 21:21
> To: weaver at ascr.doe.gov
> Cc: ag-tech
> Subject: Re: [AG-TECH] Bridge Traffic/Router Issues
> 
> I am curious to look at the logfiles. The client does periodically
> connect to bridges and momentarily receive traffic from them. Do you
> have a record of the events on the router for comparison with the
> venue client logs?
> 
> On Jul 18, 2009, at 6:23 AM, "Mike Weaver" <weaver at ascr.doe.gov> wrote:
> 
> > OK, here's a weird one.  AG 3.2b1 on Fedora 11.  I started the venue
> > client with default settings (in particular, unicast mode) and went to
> > the ANL venue server lobby.  I had some issues with RAT & D-Bus, did
> > some investigation, and then got side-tracked with other tasks.
> > Sometime later I got bumped from the venue server, but the client was
> > still running.  Sometime later still (sorry about the uncertainty of
> > the timing), we started noticing timeout issues on our LAN.  Web
> > browsing would hang at various stages (DNS, connecting...,
> > transferring...) for 20-30 seconds, every 10-15 minutes, sometimes
> > long enough to time out the connection.  Clicking refresh would bring
> > the page up fine.  One other possibly relevant detail: I was running
> > the venue client as root.
> >
> > Users also started complaining about getting disconnected when
> > remoted in (RDP & Citrix).  Long story short, we tracked it down to
> > CPU utilization spikes on our router (Cisco 7206, FastEthernet PIC).
> > The CPU utilization (sometimes as high as 100%) was almost entirely
> > from interrupts; there were no process or memory issues.  While
> > investigating NetFlow data, I noticed a large number of flows to the
> > Auckland University bridge server in New Zealand.  I found & exited
> > the running venue client and the network problems stopped.  There's
> > no guarantee that the venue client was to blame, but the problem
> > persisted steadily for almost a week and hasn't recurred in the more
> > than 2 days since I exited the venue client.  Pretty strong
> > circumstantial evidence IMHO.  Looking at the venue client logs, the
> > start of the network issues corresponded almost exactly with entering
> > the lobby.  There are a number of errors related to contacting
> > bridges, but most are due to DNS issues, as we block a lot of central
> > & southeast Asian networks (.ru, .kr, .cn, .tw, .hk, etc.).  If
> > someone is interested (hint hint Tom), I can provide the logs, but
> > they're pretty big due to the length of time the client was running,
> > so I didn't want to post them to the list.
> >
> > Any idea what kind of traffic might be flowing between the venue
> > client and a bridge in these circumstances?  Anything periodic, on
> > the order of every 10-15 minutes?  We're trying to characterize the
> > data and determine why the router was having issues with it.
> >
> > I'd appreciate any thoughts or ideas that anyone might have on this.
> >
> > Mike
> >
> > --
> > Mike Weaver
> > US Department of Energy
> > ASCR/SC-21.1
> > Germantown Building
> > Voice: 301-903-0072
> > Fax: 301-903-7774
> > Email: weaver at ascr.doe.gov
> >
