[hpc-announce] CFP - DEADLINE EXTENDED (17 Nov 2023): 6th AccML Workshop at HiPEAC 2024

Jose Cano Reyes Jose.CanoReyes at glasgow.ac.uk
Tue Nov 7 03:34:10 CST 2023


==================================================================
6th Workshop on Accelerated Machine Learning (AccML)

Co-located with the HiPEAC 2024 Conference
(https://www.hipeac.net/2024/munich/)

January 17, 2024
Munich, Germany
==================================================================

-------------------------------------------------------------------------
CALL FOR CONTRIBUTIONS
-------------------------------------------------------------------------
The remarkable performance achieved in a variety of application areas 
(natural language processing, computer vision, games, etc.) has led to 
the emergence of heterogeneous architectures to accelerate machine 
learning workloads. In parallel, production deployment, model complexity 
and diversity have pushed for higher-productivity systems, more powerful 
programming abstractions, software and system architectures, dedicated 
runtime systems and numerical libraries, and deployment and analysis 
tools. Deep learning models are generally memory- and compute-intensive, 
for both training and inference. Accelerating these operations has 
obvious advantages, first by reducing energy consumption (e.g. in data 
centers), and second by making these models usable on smaller devices 
at the edge of the Internet. In addition, while convolutional neural 
networks have motivated much of this effort, numerous applications and 
models involve a wider variety of operations, network architectures, 
and data processing. These applications and models continually 
challenge computer architecture, the system stack, and programming 
abstractions. The high level of interest in these areas calls for a 
dedicated forum to discuss emerging acceleration techniques and 
computation paradigms for machine learning algorithms, as well as the 
applications of machine learning to the construction of such systems.

-------------------------------------------------------------------------
LINKS TO THE WORKSHOP PAGES
-------------------------------------------------------------------------
Organizers: https://accml.dcs.gla.ac.uk/

HiPEAC: https://www.hipeac.net/2024/munich/#/program/sessions/8090/

-------------------------------------------------------------------------
TOPICS
-------------------------------------------------------------------------
Topics of interest include (but are not limited to):

- Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
- Software ML acceleration: languages, primitives, libraries, compilers 
and frameworks;
- Novel ML hardware accelerators and associated software;
- Emerging semiconductor technologies with applications to ML hardware 
acceleration;
- ML for the construction and tuning of systems;
- Cloud and edge ML computing: hardware and software to accelerate 
training and inference;
- Computing systems research addressing the privacy and security of 
ML-dominated systems;
- ML techniques for more efficient model training and inference (e.g. 
sparsity, pruning);
- Generative AI and its impact on computational resources.

-------------------------------------------------------------------------
INVITED SPEAKERS
-------------------------------------------------------------------------

- *Giuseppe Desoli (STMicroelectronics)*: Revolutionizing Edge AI: 
Enabling Ultra-low-power and High-performance Inference with In-memory 
Computing Embedded NPUs

_Abstract_: The increasing demand for Edge AI has led to the development 
of complex cognitive applications on edge devices, where energy 
efficiency and compute density are crucial. While HW Neural Processing 
Units (NPUs) have already shown considerable benefits, the growing need 
for more complex algorithms demands significant improvements. To address 
the limitations of traditional Von Neumann architectures, novel designs 
based on computational memories are being developed by industry and 
academia. In this talk, we present STMicroelectronics' future directions 
in designing NPUs that integrate digital and analog In-Memory Computing 
(IMC) technology with high-efficiency dataflow inference engines capable 
of accelerating a wide range of Deep Neural Networks (DNNs). Our 
approach combines SRAM computational memory and phase change resistive 
memories, and we discuss the architectural considerations and 
purpose-designed compiler mapping algorithms required for practical 
industrial applications and some challenges we foresee in harnessing the 
potential of In-memory Computing going forward.


- *John Kim (KAIST)*: Domain-Specific Networks for Accelerated Computing

_Abstract_: Domain-specific architectures are hardware computing 
engines specialized for a particular application domain. As 
domain-specific architectures become widely used, the interconnection 
network can become the bottleneck of the system as it scales.
In this talk, I will present the role of domain-specific interconnection 
networks to enable scalable domain-specific architectures. In 
particular, I will present the impact of the physical/logical topology 
of the interconnection network on communication such as AllReduce in 
domain-specific systems. I will also discuss the opportunity of 
domain-specific interconnection networks and how they can be leveraged 
to optimize overall system performance and efficiency. As a case study, 
I will present the unique design of the Groq software-managed scale-out 
system and how it adopts architectures from high-performance computing 
to enable a domain-specific interconnection network.
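To illustrate the kind of topology-sensitive collective the abstract 
refers to, below is a minimal pure-Python sketch of a ring AllReduce (a 
generic textbook algorithm, not Groq's or any vendor's design): with n 
nodes arranged in a ring, each node exchanges data only with its 
neighbour, and per-node traffic stays roughly constant as n grows.

```python
# Pure-Python sketch of a ring AllReduce (reduce-scatter + all-gather).
# Each "node" starts with one vector; afterwards every node holds the
# element-wise sum of all vectors. Per-node traffic is ~2*(n-1)/n of
# the vector size, independent of node count -- the property that makes
# ring topologies attractive for collectives such as AllReduce.

def ring_allreduce(vectors):
    n = len(vectors)
    assert all(len(v) == len(vectors[0]) for v in vectors)
    assert len(vectors[0]) % n == 0, "vector length must divide evenly"
    chunk = len(vectors[0]) // n
    data = [list(v) for v in vectors]  # per-node working buffers

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, node i sends chunk (i - s) to
    # its right neighbour, which accumulates it. After n-1 steps, node i
    # holds the fully reduced chunk (i + 1) mod n.
    for s in range(n - 1):
        for i in range(n):
            dst, c = (i + 1) % n, (i - s) % n
            for j in span(c):
                data[dst][j] += data[i][j]

    # Phase 2: all-gather. In step s, node i forwards the reduced chunk
    # (i + 1 - s) mod n; the receiver overwrites its stale copy.
    for s in range(n - 1):
        for i in range(n):
            dst, c = (i + 1) % n, (i + 1 - s) % n
            for j in span(c):
                data[dst][j] = data[i][j]

    return data

# Example: 3 nodes, each holding a 6-element vector.
nodes = [[1, 2, 3, 4, 5, 6],
         [10, 20, 30, 40, 50, 60],
         [100, 200, 300, 400, 500, 600]]
result = ring_allreduce(nodes)
# Every node now holds [111, 222, 333, 444, 555, 666].
```

The physical topology matters because each logical "send to right 
neighbour" ideally maps onto a dedicated link; on a mismatched topology 
the same logical ring can serialize traffic through shared links.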


- *Adam Paszke (Google)*: A Multi-Platform High-Productivity Language 
for Accelerator Kernels

_Abstract_: Compute accelerators are the workhorses of modern scientific 
computing and machine learning workloads. But their ever-increasing 
performance also comes at the cost of increasing micro-architectural 
complexity, which evolves at a speed that makes it hard for both 
compilers and low-level kernel authors to keep up. At the same time, the 
increased complexity makes it even harder for a wider audience to author 
high-performance software, leaving them almost entirely reliant on 
high-level libraries and compilers. In this talk I plan to introduce 
Pallas: a domain-specific language embedded in Python and built on top 
of JAX. Pallas is highly inspired by the recent development and success 
of the Triton language and compiler, and aims to present users with a 
high-productivity programming environment that is a minimal extension 
over native JAX. For example, kernels can be implemented using the 
familiar JAX-NumPy language, while a single line of code can be 
sufficient to interface the kernel with a larger JAX program. Uniquely, 
Pallas kernels support a subset of JAX program transformations, making 
it possible to derive a number of interesting operators from a single 
implementation. Finally, based on our experiments, Pallas can be 
leveraged for high-performance code generation not only for GPUs, but 
also for other accelerator architectures such as Google’s TPUs.
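As a rough sketch of the programming model described above (illustrative 
only; the exact Pallas API surface may differ across JAX versions), a 
kernel body is written in JAX-NumPy style against memory references, and 
a single `pallas_call` is enough to invoke it from a JAX program:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

# Kernel body: reads and writes Refs using familiar JAX-NumPy syntax.
def add_kernel(x_ref, y_ref, o_ref):
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    # One pallas_call interfaces the kernel with the larger JAX program.
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        interpret=True,  # interpreter mode so the sketch also runs on CPU
    )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
y = jnp.ones(8, dtype=jnp.float32)
out = add(x, y)  # element-wise x + y, computed by the Pallas kernel
```

Because the kernel is just a traced JAX function, the same definition 
can target GPU (via a Triton-based backend) or TPU lowering paths.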


- *Ayse Coskun (Boston University)*: ML-Powered Diagnosis of Performance 
Anomalies in Computer Systems

_Abstract_: Today’s large-scale computer systems serving 
high-performance computing and cloud workloads face challenges in 
delivering predictable performance while maintaining efficiency, 
resilience, and security. Much of computer system management has 
traditionally relied on (manual) expert analysis and on policies built 
from heuristics derived from such analysis. This talk will discuss a 
new path for designing ML-powered “automated analytics” methods for 
large-scale computer systems, and how to make strides towards a 
longer-term vision in which computing systems are able to self-manage 
and improve. Specifically, the 
talk will first cover how to systematically diagnose root causes of 
performance “anomalies”, which cause substantial efficiency losses and 
higher cost. Second, it will discuss how to identify applications 
running on computing systems and discuss how such discoveries can help 
reduce vulnerabilities and avoid unwanted applications. The talk will 
also highlight how to apply ML in a practical and scalable way to help 
understand complex systems, demonstrate methods to help standardize 
study of performance anomalies, discuss explainability of applied ML 
methods in the context of computer systems, and point out future 
directions in automating computer system management.

-------------------------------------------------------------------------
SUBMISSION
-------------------------------------------------------------------------
Papers will be reviewed by the workshop's technical program committee 
according to criteria regarding the submission's quality, relevance to 
the workshop's topics, and, foremost, its potential to spark discussions 
about directions, insights, and solutions in the context of accelerating 
machine learning. Research papers, case studies, and position papers are 
all welcome.

In particular, we encourage authors to submit work-in-progress papers: 
To facilitate sharing of thought-provoking ideas and high-potential 
though preliminary research, authors are welcome to make submissions 
describing early-stage, in-progress, and/or exploratory work in order to 
elicit feedback, discover collaboration opportunities, and spark 
productive discussions.

The workshop does not have formal proceedings.

-------------------------------------------------------------------------
IMPORTANT DATES
-------------------------------------------------------------------------
Submission deadline: November 17, 2023
Notification of decision: December 8, 2023

-------------------------------------------------------------------------
ORGANIZERS
-------------------------------------------------------------------------
José Cano (University of Glasgow)
Valentin Radu (University of Sheffield)
José L. Abellán (University of Murcia)
Marco Cornero (DeepMind)
Ulysse Beaugnon (Google)
Juliana Franco (DeepMind)

