[hpc-announce] [CFP] FTXS 2019 @ SC19

Levy, Scott Larson sllevy at sandia.gov
Mon Jun 3 14:51:23 CDT 2019

9th Workshop on Fault-Tolerance for HPC at eXtreme Scale (FTXS 2019)

In conjunction with The International Conference for
High Performance Computing, Networking, Storage, and Analysis (SC19)
Denver, Colorado, USA November 17 - 22, 2019

Important Dates
* Submissions open: July 1, 2019
* Submission of papers: August 27, 2019
* Author notification: September 27, 2019
* Camera-ready papers: TBA

Authors are invited to submit original papers on the research and practice of
fault-tolerance in extreme-scale distributed systems (primarily HPC systems,
but including grid and cloud systems).  Resilience and fault-tolerance remain
major concerns for supercomputing, and advances in this area are needed so that
applications can compute accurate answers (or answers within an acceptable error
tolerance) in a timely and efficient manner despite degradations or failures of
platform components (both hardware and software).

Topics include, but are not limited to:
* Failure data analysis and field studies
* Power, performance, resilience (PPR) assessments / tradeoffs
* Novel fault-tolerance techniques and implementations
* Emerging hardware and software technology for resilience
* Silent data corruption (SDC) detection / correction techniques
* Advances in reliability monitoring, analysis, and control of highly
   complex systems
* Failure prediction, error preemption, and recovery techniques
* Fault-tolerant programming models
* Models for software and hardware reliability
* Metrics and standards for measuring, improving, and enforcing
  effective fault-tolerance
* Scalable Byzantine fault-tolerance and security from single-fault and
   fail-silent violations
* Atmospheric evaluations relevant to HPC systems (terrestrial
   neutrons, temperature, voltage, etc.)
* Near-threshold-voltage implications and evaluations for reliability
* Benchmarks and experimental environments including fault injection
* Frameworks and APIs for fault-tolerance and fault management

Submissions are solicited in the following categories:
* Regular papers presenting innovative ideas improving the state of the art or
   discussing the issues seen on existing extreme-scale systems, including some
   form of analysis and evaluation.
* Extended abstracts proposing disruptive ideas and challenging assumptions in
   the field, including some form of preliminary results.

Extended abstracts will be evaluated separately and given shorter oral presentations.

Submissions shall be sent electronically and must conform to the SC19 proceedings style.
Regular papers should not exceed ten (10) pages including all text, appendices, figures, and
references.  Extended abstracts should not exceed six (6) pages.  Papers should be
submitted at: https://submissions.supercomputing.org.

Scott Levy - Sandia National Laboratories
Nathan DeBardeleben - Los Alamos National Laboratory

Keita Teranishi - Sandia National Laboratories
John Daly - Laboratory for Physical Sciences

Leonardo Bautista-Gomez - Barcelona Supercomputing Center
Aurelien Bouteiller - University of Tennessee
Chris Cantwell - Imperial College, London
Florina M. Ciorba - University of Basel
James Elliott - Sandia National Laboratories
Christian Engelmann - Oak Ridge National Laboratory
Kurt B. Ferreira - Sandia National Laboratories
Wilfried Gansterer - University of Vienna
Qiang Guan - Kent State University
Sudhanva Gurumurthi - Advanced Micro Devices (AMD) Inc
Zhiling Lan - Illinois Institute of Technology
Naoya Maruyama - Lawrence Livermore National Laboratory
Jackson Mayo - Sandia National Laboratories
Bogdan Nicolae - Argonne National Laboratory
Yves Robert - ENS Lyon, University of Tennessee
Abhinav Vishnu - Advanced Micro Devices (AMD) Inc
Panruo Wu - University of Houston

Questions? Contact Scott Levy (sllevy at sandia.gov).
