Document Process Update

Ivan R. Judson judson at mcs.anl.gov
Tue Nov 25 17:03:41 CST 2003


Bob, any status on this? I'd like to get it done soon and get it off our
collective plates.

I've been mulling over your event service concerns, and it just occurred to
me that we did discuss high-bandwidth data back when you, Tom and I
originally designed it in the library. At that point we decided it was for
low-bandwidth communication and that applications needing more than that
should set up their own communication for higher performance. If that became
an issue we'd revisit it, but I think currently only rasmol has this issue,
so I'd prefer to wait to revisit it. The event service (and text service)
have stress tests built in and have been passing them for at least a
month.

At any rate, the security design document is a top priority; can you give an
estimate of when it can be done?

--Ivan

> -----Original Message-----
> From: owner-ag-dev at mcs.anl.gov 
> [mailto:owner-ag-dev at mcs.anl.gov] On Behalf Of Ivan R. Judson
> Sent: Monday, November 03, 2003 11:36 AM
> To: 'Robert Olson'; ag-dev at mcs.anl.gov
> Subject: RE: Document Process Update
> 
> 
> 
> > >I'm anxious to have them done.
> > 
> > No, it won't be done then.
> 
> Darn. Can I get an estimate of when you think it will be done?
>  
> > I think that if we're declaring we can't make any more progress until
> > these are done, the ensuing spare time can go toward making what's
> > there now solid.
> 
> That's what we're doing at some level; as well as sc prep. 
> The issue is that we (historically) have pretty bad judgement 
> about what improvements will drastically alter the behavior 
> of the system. That's why I'd like even seemingly innocuous 
> changes to be cast against the design documents ahead of time 
> so we have some analytical effort going into identifying 
> ramifications of modifications. This *could* (but might not) 
> help us get better at estimating the effect of our changes.
>  
> > I'm worried about the stability of the base system still; we're seeing a
> > lot of TVS restarts being required, and folks still seem to be generally
> > having problems using the AG2 software. I know I have a pile of things I
> > want to do with the security/cert mgmt side of things before progressing on
> > to anything drastically new. And these aren't deep design-related issues,
> > they are detail-oriented engineering issues that I need to make right to
> > make things work well for the users.
> 
> Agreed, we have one confirmed performance improvement; we're 
> going to look at applying that to the servers in place now so 
> we can differentiate between inordinately long soap 
> conversions and real hangs.
> 
> The current hangs are not evoking any tracebacks or 
> exceptions. They are silent. That is a big problem. We'll be 
> poking at that (and have been) to find the problem.
> 
> > I also have concerns about the use of the event channel. Since everything
> > depends on it, it really needs to be rock solid, and it apparently is not
> > (text client hangs, etc). If it's being affected by SOAP.py-related
> > slowdowns, perhaps we need to investigate moving the event service to its
> > own process, and ensure that the code is dead simple and dead solid. Or
> > perhaps we need to move away from relying on the event service for basic
> > operation, using some notion of soft-state registration on the clients
> > instead of the existence of active TCP connections via the event service.
> 
> Yeah, the asynchronous event service has some issues (as did 
> the synchronous one). We need to look at this. But what you 
> are proposing is a major design change. Therefore it needs to 
> be written up and we need to chat about it as a group so we 
> can ensure the modification moves us in the direction of improvement. 
> 
> One point that I think is valuable, though, is that the 
> current system has been working well the past few weeks. 
> There are a few server hangs (among all the servers), text 
> blocked (but didn't die) last week once. But other than that 
> it's been much more stable than previously. We're moving 
> uphill; sometimes it's slower than I like.
> 
> --Ivan
> 



