[hpc-announce] (Call For Papers) ScaDL 2021: Third IPDPS Workshop on Scalable Deep Learning over Parallel and Distributed Infrastructure

Anirban Das dasa2 at rpi.edu
Sun Feb 14 22:46:01 CST 2021

*ScaDL 2021: Third IPDPS Workshop on Scalable Deep Learning over Parallel
and Distributed Infrastructure*
https://2021.scadl.org


*Scope of the Workshop*
Recently, Deep Learning (DL) has received tremendous
attention in the research community because of the impressive results
obtained for a large number of machine learning problems. The success of
state-of-the-art deep learning systems relies on training deep neural
networks over a massive amount of training data, which typically requires a
large-scale distributed computing infrastructure to run. In order to run
these jobs in a scalable and efficient manner, on cloud infrastructure or
dedicated HPC systems, several interesting research topics have emerged
which are specific to DL. The sheer size and complexity of deep learning
models, when trained over a large amount of data, make it harder for them to
converge in a reasonable amount of time. This demands advances along
multiple research directions, such as model/data parallelism, model/data
compression, distributed optimization algorithms for DL convergence,
synchronization strategies, efficient communication and specific hardware

*ScaDL seeks to advance the following research directions:*
* Asynchronous and Communication-Efficient SGD: Stochastic gradient descent
is at the core of large-scale machine learning. Parallelizing SGD gradient
computation across multiple nodes increases the data processed per
iteration, but exposes SGD to communication and synchronization delays and
unpredictable node failures in the system. Thus, there is a critical need
to design robust and scalable distributed SGD methods that achieve fast
error-convergence in spite of such system variabilities.

* High performance computing aspects: Deep learning is highly compute
intensive. Algorithms for kernel computations on commonly used accelerators
(e.g. GPUs) and efficient techniques for communicating gradients and loading
data from storage are critical for training performance.

* Model and Gradient Compression Techniques: Techniques such as reducing
weights and the size of weight tensors help reduce compute complexity.
Using lower-bit representations allows for more efficient use of
memory and communication bandwidth.
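
As an illustration of the lower-bit idea above, here is a minimal sketch of
uniform 8-bit gradient quantization in a data-parallel setting. It is not a
method prescribed by the workshop; the function names (quantize_8bit,
dequantize, average_gradients) and the single shared scale per gradient are
assumptions made for this example, and real systems typically add error
feedback or per-layer scaling.

```python
from typing import List, Sequence, Tuple


def quantize_8bit(grad: Sequence[float]) -> Tuple[List[int], float]:
    """Map each float to an integer in [-127, 127] using one shared scale.

    Hypothetical helper for illustration: each value then fits in 8 bits
    instead of 32, at the cost of a rounding error of at most scale / 2.
    """
    max_abs = max((abs(g) for g in grad), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(g / scale) for g in grad], scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


def average_gradients(worker_grads: Sequence[Sequence[float]]) -> List[float]:
    """Simulate one data-parallel step.

    Each worker "sends" a quantized gradient; the receiver decodes all of
    them and averages element-wise.
    """
    decoded = [dequantize(*quantize_8bit(g)) for g in worker_grads]
    n = len(decoded)
    return [sum(col) / n for col in zip(*decoded)]


if __name__ == "__main__":
    grads = [[1.0, -2.0], [3.0, 2.0]]
    print(average_gradients(grads))  # close to the exact average [2.0, 0.0]
```

The averaged result differs slightly from the exact average because each
worker's gradient is rounded before it is sent; that rounding error is the
price paid for the reduced communication volume.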

This intersection of distributed/parallel computing and deep learning is
becoming critical and demands specific attention to address the above
topics, which some of the broader forums may not be able to provide. The aim
of this workshop is to foster collaboration among researchers from the
distributed/parallel computing and deep learning communities, who can share
relevant topics as well as results of current approaches lying at the
intersection of these areas.

*Areas of Interest*
In this workshop, we solicit research papers focused on
distributed deep learning aiming to achieve efficiency and scalability for
deep learning jobs over distributed and parallel systems. Papers focusing
on algorithms, systems, or both are welcome. We invite authors to
submit papers on topics including but not limited to:

- Deep learning on cloud platforms, HPC systems, and edge devices
- Model-parallel and data-parallel techniques
- Asynchronous SGD for Training DNNs
- Communication-Efficient Training of DNNs
- Scalable and distributed graph neural networks
- Sampling techniques for graph neural networks
- Federated deep learning, both horizontal and vertical, and its challenges
- Model/data/gradient compression
- Learning in resource-constrained environments
- Coding Techniques for Straggler Mitigation
- Elasticity for deep learning jobs/spot market enablement
- Hyper-parameter tuning for deep learning jobs
- Hardware Acceleration for Deep Learning
- Scalability of deep learning jobs on large clusters
- Deep learning on heterogeneous infrastructure
- Efficient and Scalable Inference
- Data storage/access in shared networks for deep learning

*Format*
Due to the continuing impact of COVID-19, ScaDL 2021 will also
adopt relevant IPDPS 2021 policies on virtual participation and
presentation. Consequently, the organizers are currently planning a hybrid
(in-person and virtual) event.

*Submission Link*
Please log in to Linklings using the submission link (create
an account if necessary). Once you log in, you will find a link to
submissions for the ScaDL workshop.

*Key Dates*
Paper Submission: February 21, 2021
Acceptance Notification: March 22, 2021
Camera-ready due: April 5, 2021
Workshop: May 21, 2021

*Author Instructions*
ScaDL 2021 accepts submissions in three categories:
- Regular papers: 8-10 pages
- Short papers: 4 pages
- Extended abstracts: 1 page
The aforementioned lengths include all technical content, references and
Papers should be formatted using IEEE conference style, including figures,
tables, and references. The IEEE conference style templates for MS Word and
LaTeX provided by IEEE eXpress Conference Publishing are available for
download. See the latest versions at

*General Chairs*
Stacy Patterson, RPI, USA
Parijat Dube, IBM Research, USA

*Program Committee Chairs*
Yogish Sabharwal, IBM Research, India
Danilo Ardagna, Politecnico di Milano, Italy

*Logistics & Web Chair*
Jayaram K. R., IBM Research, USA

*Publicity Chairs*
Federica Filippini, Politecnico di Milano, Italy
Anirban Das, RPI, USA

*Program Committee*
See the workshop website: https://2021.scadl.org

*Steering Committee*
Vijay K. Garg, University of Texas at Austin
Vinod Muthusamy, IBM Research AI
Ashish Verma, IBM Research AI

We welcome submissions to ScaDL 2021 and would be glad to address any
questions you may have.

Anirban Das and Federica Filippini
Publicity Chairs
