[hpc-announce] Call For Papers: ScaDL 2021: Third IPDPS Workshop on Scalable Deep Learning over Parallel and Distributed Infrastructure

Fri Nov 13 23:06:17 CST 2020

*ScaDL 2021: Third IPDPS Workshop on Scalable Deep Learning over*
*Parallel and Distributed Infrastructure*
https://2021.scadl.org

**CALL FOR PAPERS**
============================================================
*Scope of the Workshop*
Recently, Deep Learning (DL) has received tremendous attention in the
research community because of the impressive results obtained for a
large number of machine learning problems. The success of state-of-the-art
deep learning systems relies on training deep neural networks over a massive
amount of training data, which typically requires a large-scale distributed
computing infrastructure to run. Running these jobs in a scalable and
efficient manner, on cloud infrastructure or dedicated HPC systems, has
given rise to several interesting research topics that are specific to DL.
The sheer size and complexity of deep learning models, when trained over a
large amount of data, make it hard for them to converge in a reasonable
amount of time. This demands advances along multiple research directions,
such as model/data parallelism, model/data compression, distributed
optimization algorithms for DL convergence, synchronization strategies,
efficient communication, and specific hardware acceleration.

*ScaDL seeks to advance the following research directions:*
* Asynchronous and Communication-Efficient SGD: Stochastic gradient descent
is at the core of large-scale machine learning. Parallelizing SGD gradient
computation across multiple nodes increases the data processed per
iteration, but exposes SGD to communication and synchronization delays and
unpredictable node failures in the system. Thus, there is a critical need to
design robust and scalable distributed SGD methods to achieve fast
error-convergence in spite of such system variability.

* High-performance computing aspects: Deep learning is highly
compute-intensive. Algorithms for kernel computations on commonly used
accelerators (e.g., GPUs) and efficient techniques for communicating
gradients and loading data from storage are critical for training
performance.

* Model and Gradient Compression Techniques: Techniques such as reducing
weights and the size of weight tensors help in reducing the compute
complexity. Using lower-bit representations allows for more efficient use
of memory and communication bandwidth.

This intersection of distributed/parallel computing and deep learning is
becoming critical and demands specific attention to address the above
topics, which some of the broader forums may not be able to provide. The aim
of this workshop is to foster collaboration among researchers from the
distributed/parallel computing and deep learning communities to share the
relevant topics as well as results of the current approaches lying at the
intersection of these areas.

*Areas of Interest*
In this workshop, we solicit research papers focused on distributed deep
learning aiming to achieve efficiency and scalability for deep learning jobs
over distributed and parallel systems. Papers focusing on algorithms as
well as systems are welcome. We invite authors to submit papers on topics
including but not limited to:

- Deep learning on cloud platforms, HPC systems, and edge devices
- Model-parallel and data-parallel techniques
- Asynchronous SGD for Training DNNs
- Communication-Efficient Training of DNNs
- Scalable and distributed graph neural networks, sampling techniques for
  graph neural networks
- Federated deep learning, both horizontal and vertical, and its challenges
- Model/data/gradient compression
- Learning in resource-constrained environments
- Coding Techniques for Straggler Mitigation
- Elasticity for deep learning jobs/spot market enablement
- Hyper-parameter tuning for deep learning jobs
- Hardware Acceleration for Deep Learning
- Scalability of deep learning jobs on large clusters
- Deep learning on heterogeneous infrastructure
- Efficient and Scalable Inference
- Data storage/access in shared networks for deep learning

*Format*
Due to the continuing impact of COVID-19, ScaDL 2021 will also adopt
relevant IPDPS 2021 policies on virtual participation and presentation.
Consequently, the organizers are currently planning a hybrid (in-person and
virtual) event.

*Key Dates*
*Paper Submission: February 1, 2021*
*Acceptance Notification: March 15, 2021*
*Camera-ready due: March 30, 2021*
*Workshop: May 21, 2021*

*Author Instructions*
ScaDL 2021 accepts submissions in three categories:
Regular papers: 8-10 pages
Short papers: 4 pages
Extended abstracts: 1 page
The aforementioned lengths include all technical content, references, and
appendices.
Papers should be formatted using IEEE conference style, including figures,
tables, and references. The IEEE conference style templates for MS Word and
LaTeX provided by IEEE eXpress Conference Publishing are available for
download. See the latest versions at
https://www.ieee.org/conferences/publishing/templates.html

*General Chairs*
Parijat Dube, IBM Research, USA
Stacy Patterson, RPI, USA

*Program Committee Chairs*
Danilo Ardagna, Politecnico di Milano, Italy
Yogish Sabharwal, IBM Research, India

*Logistics & Web Chair*
Jayaram K. R., IBM Research, USA

*Publicity Chairs*
Anirban Das, RPI, USA
Federica Filippini, Politecnico di Milano, Italy

*Program Committee*
See the workshop website https://2021.scadl.org

*Steering Committee*
Vinod Muthusamy, IBM Research, USA
Ashish Verma, IBM Research, USA

============================================================
We welcome submissions to ScaDL 2021 and would be glad to address any
questions you may have.

Sincerely,
Anirban Das and Federica Filippini
Publicity Chairs