[hpc-announce] CFP: SUSCOM Special Issue on Resilience and/or Energy-aware techniques for High-Performance Computing (RE-HPC)

Hongyang Sun hongyang.sun at vanderbilt.edu
Fri Mar 31 09:43:13 CDT 2017

Special Issue of Sustainable Computing: Informatics and Systems (SUSCOM) on
Resilience and/or Energy-aware techniques for High-Performance Computing (RE-HPC)



Resilience and energy consumption have become two important concerns for
high-performance computing (HPC) systems. With increasing core counts and
continued technology miniaturization, today's large computing platforms
(datacenters, clusters, supercomputers, etc.) are increasingly prone to
failures, and faults are becoming the norm rather than the exception.
Besides classical fail-stop errors (such as hardware failures), soft errors
(such as silent data corruptions, or SDCs) constitute another threat that
the HPC community can no longer ignore.

Energy is the other major concern. Large computing centers are currently
among the largest consumers of energy, so measures must be taken to reduce
their consumption. Energy is needed not only to power the individual cores
but also to cool the system; in today's datacenters, a large proportion of
energy is spent on cooling and other thermal-related activities. It is
anticipated that the power dissipated by communications and I/O transfers
will also make up a much larger share of overall power consumption, and the
relative cost of communication is expected to increase dramatically, in
terms of both latency/overhead and consumed energy.

Redesigning algorithms for HPC systems to ensure resilience and to reduce
energy consumption will be crucial to achieving sustained performance. The
interplay between resilience and energy must also be addressed carefully.
Better resilience often requires redundancy (replication and/or
checkpointing, rollback, and recovery), which consumes extra energy. Hot
cores may degrade resilience or increase the probability of individual
failures. Conversely, reducing energy consumption through voltage/frequency
scaling increases application running time, and hence the expected number
of failures during execution.

This Special Issue will encompass a broad range of topics related to
resilience and energy efficiency for HPC. Its objective is to facilitate
the exchange of valuable information and ideas among researchers and
practitioners. Topics of interest include (but are not limited to):

●      Fault-tolerant algorithms, tools, and protocols

●      Checkpointing, replication, and recovery techniques

●      Detection and prediction of soft errors and SDCs

●      System reliability, testing, and verification

●      Resilience models, algorithms, and simulations

●      Energy-efficient scheduling and resource management

●      Power-aware runtime systems

●      Energy-efficient I/O, storage, and networking

●      Thermal behavior modeling, control, and management

●      Cooling-aware optimizations and evaluations

●      Tradeoffs between performance, reliability, energy, and temperature


General information for submitting papers to SUSCOM can be found at
http://www.journals.elsevier.com/sustainable-computing (please note the
“Guide for Authors” link). Submissions to this Special Issue (SI) should
be made using Elsevier's editorial system at the journal website (under the
“submit your paper” link). Please make sure to select the “SI: RE-HPC”
option as the article type during the submission process. All submissions
must be original and may not be under review elsewhere. A submission based
on one or more papers that appeared elsewhere must include major
value-added extensions over what appeared previously (at least 30% new
conceptual material). Authors are requested to attach these earlier
articles to the submitted paper, together with a summary document
explaining the enhancements made in the journal version. All submitted
papers will be peer-reviewed according to the normal standards of SUSCOM.


●   Manuscript due date: May 1, 2017

●   First decision notification: August 1, 2017

●   Tentative publication schedule: December 2017


Guest Editors:

Anne Benoit, ENS-Lyon, France

Jean-Marc Pierson, University of Toulouse, France

Hongyang Sun, Vanderbilt University, USA

Questions may be sent to hongyang.sun at vanderbilt.edu