[AG-TECH]Question About Node Backup/Fault Tolerance
John I Quebedeaux Jr
johnq at lsu.edu
Mon Oct 31 11:20:52 CST 2005
Our full node (4 machines) is designed with some fault tolerance in
mind in terms of equipment, but mostly we rely on a portable node
that can take up the slack if hardware failures occur, and we built
in a "5th" machine "just in case".
Our two capture machines have the ability to each do video and audio
capture so that if one fails, the other can take up the slack. We've
actually had to do this before when we had a failure (OS) on one of
the capture machines. It took about 15 minutes to realize we couldn't
recover it in time and to switch things over (because I had had the
student practice this about 3 weeks prior).
If the echo canceler itself fails, the 4-channel echo canceler on our
pig node can temporarily stand in for it.
Our pig node can function in place of the display node should it fail.
We have a 5th machine with a couple of spare capture cards that
currently serves as our venue server, but it is ready at a moment's
notice to be a video capture/audio machine as well.
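The stand-in arrangement described above can be written down as a
simple lookup table. This is only a rough sketch in Python: the role
names and machine labels are made up for illustration and are not
part of the AG toolkit or of our actual configuration.

```python
# Hypothetical role and machine labels; none of this is AG toolkit API.
# Each role lists its stand-ins in the order we would try them.
BACKUPS = {
    "capture-1": ["capture-2", "venue-server"],
    "capture-2": ["capture-1", "venue-server"],
    "echo-canceler": ["pig-echo-canceler"],
    "display": ["pig-node"],
}

def pick_standin(failed, down=frozenset()):
    """Return the first backup for `failed` that is not itself down, else None."""
    for candidate in BACKUPS.get(failed, []):
        if candidate not in down:
            return candidate
    return None

# One capture machine fails: the other one takes over.
print(pick_standin("capture-1"))
# Both capture machines fail: fall back to the 5th machine.
print(pick_standin("capture-1", down={"capture-2"}))
```

Keeping the table written down somewhere (even on paper) is what makes
a practiced 15 minute switch-over possible.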
If we bring everything into play - we actually have (from our full
node with our portable as supplement) 14 video captures, 4 audio
captures, two echo cancelers, 4 speakers, 4 projectors, two 61"
plasmas, and 8 microphones (2 wireless) to bring to bear, mostly
because our portable node has a full complement of audio and video to
go with it. Granted, we can overload its bus pretty easily if we
really want to, but it serves as redundancy to the full node and is
its own node in its own right for auditoriums and small conference
rooms, with the ability to run 2 projectors in addition to its own
two operator displays (or 1 display and 3 projectors).
I've found that with the AG 2.x toolkit it is extremely easy to be
flexible with regard to what resources you want to bring in. I have
a two room node run by 6 machines - and the rooms are nearby. The
xap800 runs BOTH ROOMS. Why? Because it's really designed to handle
many rooms. I split up the displays and the captures - and even run
two service managers (one hard coded to a separate port) on one of
the machines, where I run video for one room and audio for the other
(splitting up its load between the two rooms). Essentially it's two
3 machine nodes thrown together in the same rack. As a result, we've
used one of the rooms as a local overflow room when we had too many
in the audience to fit into one room. Quite flexible indeed. I've
run ALL the nodes on campus as one huge node... just to try it. We
have 3 full nodes on campus (going to double very soon) and one
portable.
John I. Quebedeaux, Jr.; Louisiana State University
Computer Manager LBRN; 131 Life Sciences Bldg.
e-mail: johnq at lsu.edu; web: http://lbrn.lsu.edu
phone: 225-578-0062 / fax: 225-578-2597
On Oct 31, 2005, at 9:22 AM, John Langkals wrote:
> Hello AGTech,
> How do you support fault tolerance within your Access Grid node?
> If you would experience catastrophic failure of your node hardware,
> what kind of backup have you designed into your production nodes to
> maintain service?
> Thank you,
> John Langkals
> Systems Manager
> M2021 Physics Research Building
> 191 West Woodruff Avenue
> Columbus, Ohio 43210
> 614.292.6957 Office
> 614.327.3732 Cell
> 614.292.7557 FAX