[AG-TECH]Question About Node Backup/Fault Tolerance

John I Quebedeaux Jr johnq at lsu.edu
Mon Oct 31 11:20:52 CST 2005


Our full node (4 machines) was designed with some fault tolerance in
mind in terms of equipment, but mostly we rely on a portable node that
can take up the slack if hardware failures occur, plus a "5th" machine
we built in "just in case".

Our two capture machines can each do both video and audio capture, so
if one fails, the other can take up the slack. We've actually had to do
this before, when one of the capture machines had an OS failure. It
took about 15 minutes to realize we couldn't recover it in time and to
switch things over (because I had had the student practice this about
3 weeks prior).

If the echo canceler itself fails, the 4-channel echo canceler on our
pig node can temporarily take over for it. The pig node can also stand
in for the display node should it fail. We have a 5th machine with a
couple of spare capture cards that currently serves as our venue
server, but it is ready at a moment's notice to act as a video/audio
capture machine as well.
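As a rough illustration of the fallback arrangement above, here is a
minimal Python sketch. It is not part of the AG toolkit; the role names
and machine labels (capture1, capture2, pig_node, venue_server, and so
on) are hypothetical. Each role lists its primary machine first and its
stand-ins after it, and the first machine that hasn't failed gets the
role.

```python
# Hypothetical failover table: role names and machine labels are
# illustrative only, not AG toolkit identifiers. Each list is the
# primary machine followed by stand-ins in order of preference.
FAILOVER = {
    "video_capture": ["capture1", "capture2", "venue_server"],
    "audio_capture": ["capture2", "capture1", "venue_server"],
    "echo_canceler": ["echo_canceler", "pig_node"],
    "display": ["display", "pig_node"],
}


def assign_roles(failed):
    """Map each role to the first candidate machine not in `failed`.

    Raises RuntimeError if every candidate for some role has failed.
    """
    assignment = {}
    for role, candidates in FAILOVER.items():
        for machine in candidates:
            if machine not in failed:
                assignment[role] = machine
                break
        else:
            # No break: every candidate for this role is down.
            raise RuntimeError(f"no healthy machine left for {role}")
    return assignment


if __name__ == "__main__":
    # One capture machine lost to an OS failure: the other one
    # picks up both video and audio capture.
    print(assign_roles({"capture1"}))
```

Keeping the table as plain data makes it easy to rehearse a switchover
on paper before it is needed for real.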

If we bring everything into play, we actually have (from our full node,
supplemented by our portable) 14 video captures, 4 audio captures, two
echo cancelers, 4 speakers, 4 projectors, two 61" plasmas, and 8
microphones (2 wireless) to bring to bear. That is mostly because our
portable node comes with a full complement of audio and video gear.
Granted, we can overload its bus pretty easily if we really want to,
but it serves both as redundancy for the full node and as a node in its
own right for auditoriums and small conference rooms, with the ability
to run 2 projectors in addition to its own two operator displays (or
1 display and 3 projectors).

I've found that with the AG 2.x toolkit it is extremely easy to be
flexible with regard to which resources you bring in. I have a two-room
node run by 6 machines, and the rooms are near each other. The xap800
runs BOTH ROOMS. Why? Because running many rooms is what it's really
designed to do. I split up the displays and the captures, and I even
run two service managers (one hard-coded to a separate port) on one of
the machines, where I run video for one room and audio for the other
(splitting its load between the two rooms). Essentially it's two
3-machine nodes thrown together in the same rack. As a result, we've
used one of the rooms as a local overflow room when we had too many
people in the audience to fit into one room. Quite flexible indeed.
I've even run ALL the nodes on campus as one huge node, just to try it.
We have 3 full nodes on campus (soon to double) and one portable.

-John Q.

John I. Quebedeaux, Jr.; Louisiana State University
Computer Manager LBRN; 131 Life Sciences Bldg.
e-mail: johnq at lsu.edu; web: http://lbrn.lsu.edu
phone: 225-578-0062 / fax: 225-578-2597

On Oct 31, 2005, at 9:22 AM, John Langkals wrote:

> Hello AGTech,
> How do you support fault tolerance within your Access Grid node?   
> If you were to experience catastrophic failure of your node hardware,
> what kind of backup have you designed into your production nodes to  
> maintain service?
> Thank you,
> John
> John Langkals
> Systems Manager
> M2021 Physics Research Building
> 191 West Woodruff Avenue
> Columbus, Ohio 43210
> 614.292.6957 Office
> 614.327.3732 Cell
> 614.292.7557 FAX
> www.octs.osu.edu

