[hpc-announce] CfP 21st Workshop on Virtualization, Containers, and Resource Isolation for Supercomputer AI (VHPC '26)

Michael Alexander malexand at scilytics.com
Tue Mar 31 16:30:35 CDT 2026


==========================================================
CALL FOR PAPERS

21st Workshop on Virtualization, Containers, and
Resource Isolation for Supercomputer AI (VHPC '26)

held in conjunction with the European Conference on
Parallel and Distributed Computing, Aug 24-28, 2026, Pisa, Italy.

==========================================================

Paper submission deadline: May 16, 2026 23:59 (AoE)

Date: August 24-25, 2026
Workshop URL: vhpc.org

To submit an abstract or paper, please use the submission link
provided at the end of this message.


Call for Papers

This year, we highlight containers and virtualization as sandbox
enablers for large language models at scale, memory-intensive LLM
training, and increasingly agentic AI systems that dynamically
orchestrate tools, services, and distributed resources across cloud
and HPC infrastructures.

We invite contributions including, but not limited to:

- Containerized and VM-based environments for large-scale training,
  distributed inference, and multi-agent AI systems, including
  orchestration frameworks such as Kubernetes
- Container-based training data ingest, RLHF/DPO load balancing,
  and dynamic resource shifting for iterative alignment loops
- Virtualization substrates for agentic execution graphs, tool
  invocation pipelines, and dynamic resource binding across
  heterogeneous clusters
- Advanced autoscaling, scheduling, and event-driven resource
  management for long-running training jobs and bursty agent-driven
  inference workloads
- GPU and accelerator virtualization techniques enabling safe and
  efficient sharing across concurrent training and agentic tasks,
  including MIG partitioning, MPS, and vGPU
- GPU memory virtualization and oversubscription mechanisms for
  high-memory LLM and foundation model workloads
- Unified CPU-GPU memory architectures, flat page tables, and shared
  virtual address spaces for accelerator-intensive AI pipelines
- Distributed and disaggregated memory virtualization for multi-node
  model training and inference, including CXL-attached and
  fabric-attached memory architectures
- Storage-to-memory-mapped data paths and high-throughput dataset
  access mechanisms for large model pipelines
- Memory compression, parameter reduction, quantization, and
  out-of-core techniques to reduce footprint and improve utilization
- Efficient memory allocation, fragmentation control, and runtime
  memory management for multi-tenant AI platforms
- Container-aware high-performance networking, including RDMA, RoCE
  congestion management, and scalable CNI designs for distributed
  AI training
- Performance isolation and mitigation of virtualization noise for
  latency-sensitive inference and agent coordination
- Inference serving infrastructure including GPU multiplexing, model
  sharding, and KV-cache management within virtualized and
  containerized environments
- Benchmarking, profiling, and observability tools for
  memory-intensive and accelerator-bound AI workloads
- Secure isolation, confidential computing (AMD SEV-SNP, Intel TDX,
  GPU TEEs), and trust models for multi-tenant and
  cross-organizational agentic systems
- Real-world case studies of virtualization-enabled LLM training,
  distributed inference, and agentic AI deployments across cloud
  and HPC environments


The Workshop on Virtualization in High-Performance Cloud Computing
(VHPC) aims to bring together researchers and industrial practitioners
facing the challenges posed by virtualization and containerization
in AI-driven HPC and cloud infrastructures. It seeks to foster
discussion, collaboration, and the mutual exchange of knowledge and
experience, ultimately enabling research toward novel solutions for
the virtualized computing systems of tomorrow.

Virtualization and container technologies constitute the programmable
substrate of modern AI and HPC infrastructures. In the current AI
era -- characterized by large-scale model training, distributed
inference, multimodal pipelines, and increasingly agentic systems --
controlled and efficient execution across heterogeneous resources
is essential. HPC centers and cloud operators alike must manage
infrastructures composed of CPUs, GPUs, NPUs, high-performance
interconnects, and emerging accelerators, while supporting highly
dynamic and resource-intensive AI workloads. Training jobs may span
thousands of GPUs; inference services demand low latency and strict
performance isolation; agentic systems orchestrate distributed tools
and services across trust domains.

Virtualization technologies provide the mechanisms to meet these
demands. Full machine virtualization enables strong isolation and
consolidation across heterogeneous nodes. Container-based OS-level
virtualization offers lightweight and responsive deployment models
suited for latency-sensitive inference and microservice-based AI
pipelines. Lightweight VMs, microVMs, and unikernels reduce
execution overhead and attack surface while enabling controlled
multi-tenant AI platforms.

Beyond resource isolation, controlled execution is becoming a
first-class concern. Deterministic execution models, state
snapshotting, replay mechanisms, and execution tracing are
increasingly relevant for debugging distributed AI systems, ensuring
reproducibility of training runs, and governing agentic behavior
across heterogeneous infrastructure.

I/O and accelerator virtualization enable efficient sharing of GPUs
and high-speed interconnects, while network virtualization supports
dynamic formation of distributed training clusters and AI execution
graphs across supercomputing and hybrid cloud environments. Emerging
unified memory architectures and accelerator-aware virtualization
further blur traditional system boundaries.

Publication

Accepted papers will be published in a Springer LNCS proceedings volume.

Topics of Interest

The VHPC program committee solicits original, high-quality
submissions on virtualization and containerization technologies as
foundational enablers of AI-driven HPC and cloud infrastructures.
We particularly encourage contributions that address large-scale AI
training, distributed inference, agentic workloads, heterogeneous
accelerators, and secure multi-tenant execution.

Each topic includes aspects of design, architecture, management,
performance modeling, measurement, and tooling.


1. Virtualization Architectures for AI and HPC Systems

- Container and OS-level virtualization for AI training and inference
  in HPC and cloud environments
- Lightweight virtual machines and microVMs for secure and
  low-latency AI services
- Hypervisor support for heterogeneous accelerators including GPUs,
  NPUs, TPUs, FPGAs
- GPU memory virtualization for high-memory LLM and foundation model
  training workloads, including hardware partitioning (MIG),
  multi-process sharing (MPS), time-slicing, and vGPU mechanisms
- Unified and flat CPU-GPU virtual memory models and accelerator
  address space integration
- Virtualization support for high-performance interconnects including
  RDMA and accelerator-aware networking
- Secure isolation models for multi-tenant and agentic AI workloads
  across trust domains, including hardware TEEs (AMD SEV-SNP,
  Intel TDX, ARM CCA) and GPU-based confidential computing
- Unikernels and specialized operating systems for minimal
  attack-surface AI deployment
- Lightweight sandboxed execution environments including WebAssembly
  (WASM/WASI) for portable and isolated AI workloads
- Virtualization extensions for emerging architectures including ARM
  and RISC-V in HPC-AI systems
- Energy-efficient and power-aware virtualization for large-scale AI
  infrastructures


2. Resource Management, Orchestration, and Agentic Execution

- VM and container orchestration for distributed AI and HPC workflows
- Scheduling and placement strategies for GPU-intensive and
  memory-bound AI workloads, including Kubernetes Dynamic Resource
  Allocation (DRA) and topology-aware scheduling
- Autoscaling and event-driven resource management for training,
  inference, and FaaS-based AI services
- Virtualization support for serverless and function-based AI
  execution models
- Agentic workload orchestration across cloud, edge, and HPC
  infrastructures
- Secure multi-cluster and hybrid cloud-HPC integration for AI
  pipelines
- Workflow coupling of simulation, data analytics, and in situ AI
  processing in HPC environments
- Resource sharing and isolation for mixed HPC and AI production
  workloads
- Policy-driven control, admission, and governance for multi-tenant
  AI platforms
- Fault tolerance, live migration, and high-availability mechanisms
  for long-running AI training jobs


3. Performance, Memory Systems, and Tooling for Large-Scale AI

- Performance analysis and modeling of virtualized AI workloads in
  supercomputing and cloud systems
- Scalability studies of containers and VMs for large-scale
  distributed AI training
- Distributed and disaggregated memory virtualization, including
  CXL-based memory pooling and fabric-attached memory for multi-node
  model training
- Memory-efficient techniques including compression, reduction, and
  out-of-core training
- Efficient GPU and accelerator memory allocation, fragmentation
  control, and oversubscription
- Storage and filesystem integration with virtual memory mapped
  approaches for AI datasets
- Deterministic and replayable execution models for distributed AI
  systems
- State snapshotting, time-travel debugging, and execution tracing
- Benchmarking and profiling tools for memory-intensive LLM workloads
- Measurement and mitigation of OS and virtualization noise in
  HPC-AI environments
- Optimization of hypervisors and virtual machine monitors for
  AI-centric workloads
- Case studies demonstrating virtualization-enabled AI and agentic
  systems in HPC and cloud infrastructures



The workshop will consist of 20-minute paper presentations, each
followed by a 10-minute discussion, plus lightning talks limited to
5 minutes. Presentations may be accompanied by interactive
demonstrations.

Important Dates

Rolling abstract submission
Paper deadline - May 16, 2026 23:59 (AoE)
Acceptance notification - June 12, 2026
Camera-ready - July 10, 2026
Workshop - August 24-25, 2026

Chairs

Michael Alexander (chair), Austrian Academy of Sciences
Anastassios Nanos (co-chair), Nubificus Ltd., UK

Tentative Technical Program Committee

Stergios Anastasiadis, University of Ioannina, Greece
Gabriele Ara, Scuola Superiore Sant'Anna, Italy
Jakob Blomer, CERN, Switzerland
Eduardo Cesar, Universidad Autonoma de Barcelona, Spain
Taylor Childers, Argonne National Laboratory, USA
Francois Diakhate, CEA DAM, France
Roberto Giorgi, University of Siena, Italy
Kyle Hale, Northwestern University, USA
Giuseppe Lettieri, University of Pisa, Italy
Nikos Parlavantzas, IRISA, France
Amer Qouneh, Western New England University, USA
Carlos Reano, Queen's University Belfast, UK
Riccardo Rocha, CERN, Switzerland
Lutz Schubert, University of Ulm, Germany
Jonathan Sparks, Cray, USA
Kurt Tutschku, Blekinge Institute of Technology, Sweden
John Walters, USC ISI, USA
Yasuhiro Watashiba, Osaka University, Japan
Chao-Tung Yang, Tunghai University, Taiwan


Paper Submission and Publication

Papers submitted to the workshop will be reviewed by at least two
members of the program committee and external reviewers. Submissions
should include an abstract, keywords, and the e-mail address of the
corresponding author, and must not exceed 12 pages including tables
and figures, with a main font size no smaller than 11 points.

Submission of a paper should be regarded as a commitment that, should
the paper be accepted, at least one of the authors will register and
attend the conference to present the work.

Accepted papers will be published in a Springer LNCS volume. Initial
submissions are in PDF; authors of accepted papers will be requested
to provide source files.


Lightning Talks

Lightning Talks form a non-paper track; they are synoptic in nature
and strictly limited to 5 minutes. They can be used to gain early
feedback on ongoing research, to give demonstrations, or to present
research results, early research ideas, and perspectives and positions
of interest to the community. Submit abstracts via the main
submission link.

General Information

The workshop will be held in conjunction with the International European
Conference on Parallel and Distributed Computing on Aug 24-28, 2026,
Pisa, Italy.

Please contact the organizers ahead of time if you wish to present
remotely via video.

Abstract, Paper Submission Link: https://edas.info/newPaper.php?c=35100

LNCS Format Guidelines:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines

Follow VHPC Updates: https://x.com/VHPCworkshop and
https://bsky.app/profile/vhpc.bsky.social

