[hpc-announce] FTXS 2023 @ SC23 Call for Participation

Levy, Scott Larson Nicoll sllevy at sandia.gov
Wed Nov 1 09:00:00 CDT 2023


CALL FOR PARTICIPATION
13th Workshop on Fault-Tolerance for HPC at eXtreme Scale (FTXS 2023)
Sunday, November 12, 2023 2:00pm-5:30pm (MST)

In conjunction with The International Conference for
High Performance Computing, Networking, Storage, and Analysis (SC23)
Denver, Colorado, USA November 12 - 17, 2023
https://sites.google.com/view/ftxs2023

We encourage you to join us on Sunday, November 12, 2023 for the 13th edition of the FTXS workshop.

WORKSHOP PROGRAM
==================
[2:00-2:01pm] Opening remarks
[2:01-3:00pm] Featured Speaker
Quantum Computing Reliability: Problems, Tools, and Potential Solutions
Professor Paolo Rech (Università di Trento)

Abstract: Quantum computing is a new computational paradigm, expected to revolutionize the computing field in the next few years. Qubits, the atomic units of a quantum circuit, exploit quantum-physical properties to increase the parallelism and speed of computation. Unfortunately, qubits are both intrinsically noisy and highly susceptible to external sources of faults, such as ionizing radiation. The reported qubit error rate is so high that researchers are questioning the large-scale adoption of quantum computers, and it forces impractical mitigation solutions such as installing the quantum computer in underground caves. Innovative solutions to improve the reliability of quantum applications are therefore highly necessary.

In the talk, after providing the background needed to understand quantum computing basics and an overview of the vulnerabilities of the available quantum technologies, we will present the available hardening solutions and the open challenges that need to be addressed. We will consider both the intrinsic noise, which has a predictable and incremental effect, and radiation-induced transient faults, which are stochastic and modify the qubit in an unpredictable way. Based on the latest studies and radiation experiments performed on real quantum machines, we will show how to model transient faults in a qubit and how to inject these faults into a quantum circuit to track their propagation. We will discuss the vulnerability of qubits and of circuits, identifying the most critical parts and the main causes of output corruption. Finally, we will provide an overview of the open (reliability) challenges in quantum computing to stimulate further studies and solutions.

[3:00-3:30pm] SC23 Afternoon Break

[3:30-3:55pm] Regular Paper 1
Optimizing Write Performance for Checkpointing to Parallel File Systems Using LSM-Trees
Bulut, Wright

[3:55-4:20pm] Regular Paper 2
Recovery from Silent Data Corruption via Spatial Data Prediction
Guernsey, Placke, Poulos, Calhoun

[4:20-4:40pm] Short Paper Lightning Talks
* Disk Failure Trends in Alpine Storage System 
  (George, Hanley, Oral)
* Using Benford's Law to Identify Unusual Failure Regions 
  (Ferreira, Levy)
* Dynamic Selective Protection of Sparse Iterative Solvers via ML Prediction of Soft Error Impacts
  (Chen, Verrecchia, Sun, Booth, Raghavan)

[4:40-5:05pm] Regular Paper 3
Evaluating the Resiliency of Posits for Scientific Computing
Schlueter, Calhoun, Poulos

[5:05-5:29pm] Regular Paper 4
When to checkpoint at the end of a fixed-length reservation?
Barbut, Benoit, Herault, Robert, Vivien

[5:29-5:30pm] Closing remarks

QUESTIONS?  Contact Scott Levy (sllevy at sandia.gov)

