Observations

Mark Hereld hereld at mcs.anl.gov
Mon Jul 19 18:45:48 CDT 1999


After a slew of nCruises with all of you, and a few demos of various levels of import, we have a few observations to make about what we need to work on.  Call these "Lessons Learned" -- though most follow directly from the material presented at the Tutorial.

1 ----------------------------------------------------------- On MANNING an AG NODE:

It is crucial for the success of the AG and any important demonstration or use of it that each node be sufficiently staffed to cover running the systems and responding to the inevitable "changes of state".  One is too few.  Less than one (i.e. a part-timer) is obviously worse.

Two is a minimum.  One hundred is a little too many.

2 ----------------------------------------------------------- On BACKCHANNEL PRESENCE:

It is very important to pay attention to the mud.  Try to set yourselves up so that the people MANNING the AG NODE can see a mud window, and so that the window keeps continuous scrollback from previous sessions.  In general, we are not talking enough on the mud (some of us more than others).  And when the task is debugging (rather than SHOWTIME), we're not talking enough over the audio channel.  [If you don't know what 'scrollback' is and its purpose, then get in touch with us -- it's VERY important for taking full advantage of the mud.]

We all need feedback.  Pretend that we are all in the same room, poring over the same problem.  If we're debugging a problem, we need to see what you see, hear what you hear, and stay up to date on whatever you are trying.  That probably means configuring your room so that we can tell a little bit about what's going on.  It is particularly difficult when you disappear off camera and go silent.

We are trying to spin you up as painlessly as possible.

3 ----------------------------------------------------------- On FLOOR CONTROL:

You must pay particular attention to whichever node has FLOOR CONTROL.  This means deferring to that node when settings must be adjusted.  It means paying attention to it on the mud when problems are being debugged (see "On BACKCHANNEL PRESENCE").  More often than not, problems will then be solved more quickly.

Although I hesitate to say this, Argonne will be the default FLOOR CONTROL until that burden is passed to another site for a given demonstration or part of a demonstration.

4 ----------------------------------------------------------- On FAULT TOLERANCE:

For various reasons, we've been having problems with audio every now and again.  It hurts the most when it happens during an important demonstration, as in today's.  Many of our woes could be mitigated by one of the following.

-- A first line of defense for a site to protect itself from a node which is having audio problems is to MUTE the audio from the offending site.

-- This doesn't, however, prevent feedback and echoes from creeping in through nodes that haven't muted the offending stream.  The node generating the bad audio may be asked to STOP TRANSMITTING.

-- If the node generating the bad audio isn't paying attention (see "On BACKCHANNEL PRESENCE"), a way to clear the channel is for EVERYBODY to MUTE the offending stream.

-- And, finally, the node generating the bad audio can be asked to KILL THE RAT.

These last three (and possibly the first) require that somebody controlling the FLOOR issue a COMMAND on the BACKCHANNEL such as "mute SO-N-SO" and that it be HEARD and OBEYED.


-- Mark and Lisa
