Document Process Update

Robert Olson olson at mcs.anl.gov
Mon Dec 1 16:51:07 CST 2003


Committed: doc/CertMgmtDesign.doc.

--bob

At 05:03 PM 11/25/2003, Ivan R. Judson wrote:

>Bob, any status on this? I'd like to get it done soon and get it off our
>collective plates.
>
>I've been mulling over your event service concerns, and it just occurred to
>me that we did discuss high-bandwidth data back when you, Tom, and I
>originally designed it in the library. At that point we decided it was for
>low-bandwidth communication and that applications needing more than that
>should set up their own communication for higher performance. If that became
>an issue we'd revisit it, but I think currently only rasmol has this issue,
>so I'd prefer to wait to revisit it. The event service (and text service)
>have stress tests built in and have been passing them for at least a
>month.
>
>At any rate, the security design document is a top priority. Can you give an
>estimate of when it can be done?
>
>--Ivan
>
> > -----Original Message-----
> > From: owner-ag-dev at mcs.anl.gov
> > [mailto:owner-ag-dev at mcs.anl.gov] On Behalf Of Ivan R. Judson
> > Sent: Monday, November 03, 2003 11:36 AM
> > To: 'Robert Olson'; ag-dev at mcs.anl.gov
> > Subject: RE: Document Process Update
> >
> >
> >
> > > >I'm anxious to have them done.
> > >
> > > No, it won't be done then.
> >
> > Darn. Can I get an estimate of when you think it will be done?
> >
> > > I think that if we're declaring we can't make any more progress
> > > until these are done, the ensuing spare time can go toward making
> > > what's there now solid.
> >
> > That's what we're doing at some level, along with SC prep.
> > The issue is that we (historically) have pretty bad judgement
> > about what improvements will drastically alter the behavior
> > of the system. That's why I'd like even seemingly innocuous
> > changes to be cast against the design documents ahead of time
> > so we have some analytical effort going into identifying
> > ramifications of modifications. This *could* (but might not)
> > help us get better at estimating the effect of our changes.
> >
> > > I'm still worried about the stability of the base system; we're
> > > seeing a lot of TVS restarts being required, and folks still seem to
> > > be generally having problems using the AG2 software. I know I have a
> > > pile of things I want to do with the security/cert mgmt side of things
> > > before progressing on to anything drastically new. And these aren't
> > > deep design-related issues; they are detail-oriented engineering
> > > issues that I need to get right to make things work well for the
> > > users.
> >
> > Agreed. We have one confirmed performance improvement; we're
> > going to look at applying it to the servers in place now so
> > we can differentiate between inordinately long SOAP
> > conversions and real hangs.
> >
> > The current hangs are not evoking any tracebacks or
> > exceptions. They are silent. That is a big problem. We'll be
> > poking at that (and have been) to find the problem.
> >
> > > I also have concerns about the use of the event channel. Since
> > > everything depends on it, it really needs to be rock solid, and it
> > > apparently is not (text client hangs, etc.). If it's being affected by
> > > SOAP.py-related slowdowns, perhaps we need to investigate moving the
> > > event service to its own process, and ensure that the code is dead
> > > simple and dead solid. Or perhaps we need to move away from relying on
> > > the event service for basic operation, using some notion of soft-state
> > > registration on the clients instead of the existence of active TCP
> > > connections via the event service.
> >
> > Yeah, the asynchronous event service has some issues (as did
> > the synchronous one). We need to look at this. But what you
> > are proposing is a major design change. Therefore it needs to
> > be written up and we need to chat about it as a group so we
> > can ensure the modification moves us in the direction of improvement.
> >
> > One point that I think is valuable, though, is that the
> > current system has been working well the past few weeks.
> > There have been a few server hangs (among all the servers), and text
> > blocked (but didn't die) once last week. But other than that
> > it's been much more stable than previously. We're moving
> > uphill; sometimes it's slower than I'd like.
> >
> > --Ivan
> >
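Bob's alternative to tracking presence through live TCP connections on the event service, "soft-state registration on the clients", can be illustrated with a minimal sketch. This is not AG2 toolkit code: the class name `SoftStateRegistry`, its methods, the client IDs, and the 30-second TTL are all hypothetical. The assumption is that each client periodically calls `refresh()` (a heartbeat) and the server treats any client whose lease has lapsed as gone, so a dead or hung client ages out without the server needing to notice a broken socket.

```python
import time
from typing import Callable, Dict, List


class SoftStateRegistry:
    """Hypothetical soft-state registry (illustrative only, not AG2 code).

    A client stays registered only while it keeps refreshing before its
    lease (TTL) runs out. No open connection is needed; a client that
    goes silent simply expires.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._expires: Dict[str, float] = {}

    def refresh(self, client_id: str) -> None:
        # Register or renew: push this client's expiry one TTL into the future.
        self._expires[client_id] = self.clock() + self.ttl

    def unregister(self, client_id: str) -> None:
        # Optional explicit departure; soft state works even if it never happens.
        self._expires.pop(client_id, None)

    def expire(self) -> List[str]:
        # Drop every client whose lease has lapsed and return who was dropped.
        now = self.clock()
        stale = [cid for cid, deadline in self._expires.items() if deadline <= now]
        for cid in stale:
            del self._expires[cid]
        return stale

    def live_clients(self) -> List[str]:
        # Purge lapsed leases first, then report the clients still registered.
        self.expire()
        return sorted(self._expires)


if __name__ == "__main__":
    now = [0.0]  # simulated clock, in seconds
    reg = SoftStateRegistry(ttl_seconds=30.0, clock=lambda: now[0])
    reg.refresh("client-a")
    reg.refresh("client-b")
    now[0] = 20.0
    reg.refresh("client-a")   # client-a heartbeats; client-b stays silent
    now[0] = 35.0
    print(reg.live_clients())  # ['client-a']
```

The trade-off in this sketch is that failure detection is bounded by the TTL rather than being immediate, in exchange for not depending on the event service's connections staying healthy; a client would typically refresh several times per TTL so that a single delayed heartbeat does not drop it.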
