[Swift-devel] [dsl-seminar] This week seminar on Fault Tolerance Computing
Allan Espinosa
aespinosa at cs.uchicago.edu
Sat May 16 18:04:28 CDT 2009
Hi all,
For this week's seminar, we invited Rinku Gupta from ANL to share their group's
work on fault tolerant systems. Below is the information for the talk:
Title: Moving towards a Coordinated Infrastructure for Fault Tolerant Systems
Speaker: Rinku Gupta, Argonne MCS
Date: Thursday May 21, 2009
Time: 4:30pm
Venue: RI 405
Abstract:
The need for leadership class fault-tolerance has steadily increased
and continues to increase as emerging high performance systems move
towards offering petascale level performance. While most high-end
systems do provide mechanisms for detection, notification and perhaps
handling of hardware and software related faults, the individual
components present in the system perform these actions separately.
Knowledge about occurring faults is seldom shared between different
programs and almost never on a system-wide basis. A typical system
contains numerous programs that could benefit from such knowledge,
including applications, middleware libraries, job schedulers, file
systems, math libraries, monitoring software, operating systems,
and checkpointing software.
The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
initiative provides the foundation necessary to enable systems to
adapt to faults in a holistic manner. CIFTS achieves this through
the Fault Tolerance Backplane (FTB), providing a unified management
and communication framework, which can be used by any program to
publish fault-related information. In this talk, I will present
some of the work done by the CIFTS group towards the development
of FTB and FTB-enabled components.
You can also check out their group's page at
http://www.mcs.anl.gov/research/cifts/
See you guys on Thursday,
-Allan
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>