[hpc-announce] Call for Participation: MCHPC'17: Workshop on Memory Centric Programming for HPC held in conjunction with SC17

Yonghong Yan yanyh15 at gmail.com
Tue Nov 7 12:03:24 CST 2017

*Call for Participation:*
*MCHPC'17: Workshop on Memory Centric Programming for HPC*
*Location: 702, Colorado Convention Center, Denver, CO USA*

*Time/Date: 9:00AM - 12:30PM, November 12, 2017*

*held in conjunction with SC17: The International Conference on High
Performance Computing, Networking, Storage and Analysis,
and in cooperation with ACM SIGHPC*

Program

09:00 - 09:10 Opening Remarks, Yonghong Yan, University of South Carolina

09:10 - 10:00 Session 1: Keynote Talk: Compiler and Runtime Challenges for
Memory Centric Programming, Vivek Sarkar (Georgia Tech)
    Session Chair: Ron Brightwell, Sandia National Laboratories

10:00 - 10:30 Break

10:30 - 11:10 Session 2: Invited Talk: Persistent Memory: The Value to HPC
and the Challenges, Andy Rudoff (Intel)
    Session Chair: Yonghong Yan, University of South Carolina

11:10 - 12:28 Session 3: Paper Presentations, Session Chair: TBD
    11:10 - 11:35 Bit Contiguous Memory Allocation for Processing In Memory,
    John Leidel
    11:35 - 12:00 Beyond 16GB: Out-of-Core Stencil Computations, Istvan Reguly,
    Gihan Mudalige and Mike Giles
    12:00 - 12:15 NUMA Distance for Heterogeneous Memory, Sean Williams,
    Latchesar Ionkov and Michael Lang
    12:15 - 12:28 GPGPU Memory Performance Measurement with the C-AMAT Model,
    Ning Zhang, Chuntao Jiang, and Xian-He Sun

12:28 - 12:30 Closing

Keynote Talk: Compiler and Runtime Challenges for Memory Centric
Programming, Vivek Sarkar (Georgia Institute of Technology)

Abstract:

It is widely recognized that a major disruption is under way in computer
hardware as processors strive to extend, and go beyond, the end-game of
Moore's Law. This disruption will include new forms of processor and memory
hierarchies, including near-memory computation structures. In this talk, we
summarize compiler and runtime challenges for memory centric programming,
based on past experiences with the X10 project at IBM and the Habanero
project at Rice University and Georgia Tech. A key insight in addressing
compiler challenges is to expand the state-of-the-art in analyzing and
transforming explicitly-parallel programs, so as to encourage programmers
to write forward-scalable layout-independent code rather than hardwiring
their programs to specific hardware platforms and specific data layouts. A
key insight in addressing runtime challenges is to focus on asynchrony in
both computation and data movement, while supporting both in a unified and
integrated manner. A cross-cutting opportunity across compilers and
runtimes is to broaden the class of computation and data mappings that can
be considered for future systems. Based on these and other insights, we
will discuss recent trends in compilers and runtime systems that point the
way towards possible directions for addressing the challenges of memory
centric programming.

Speaker: Vivek Sarkar (Georgia Institute of Technology)

Vivek Sarkar has been a Professor in the School of Computer Science and the
Stephen Fleming Chair for Telecommunications in the College of Computing at
Georgia Institute of Technology since August 2017. Prior to joining
Georgia Tech, Sarkar was a Professor of Computer Science at Rice
University, and the E.D. Butcher Chair in Engineering. During 2007 - 2017,
Sarkar built Rice's Habanero Extreme Scale Software Research Group with the
goal of unifying parallelism and concurrency elements of high-end
computing, multicore, and embedded software stacks (http://habanero.rice.edu).
He also served as Chair of the Department of Computer Science at Rice
during 2013 - 2016.

Prior to joining Rice in 2007, Sarkar was Senior Manager of Programming
Technologies at IBM Research. His research projects at IBM included the X10
programming language, the Jikes Research Virtual Machine for the Java
language, the ASTI optimizer used in IBM’s XL Fortran product compilers,
and the PTRAN automatic parallelization system. Sarkar became a member of
the IBM Academy of Technology in 1995, and was inducted as an ACM Fellow in
2008. He has been serving as a member of the US Department of Energy’s
Advanced Scientific Computing Advisory Committee (ASCAC) since 2009, and on
CRA’s Board of Directors since 2015.

Invited Talk: Persistent Memory: The Value to HPC and the Challenges, Andy
Rudoff (Intel)

Abstract:

In this talk, Andy will describe the emerging Persistent Memory technology
and how it can be applied to HPC-related use cases.  Andy will also discuss
some of the challenges of using Persistent Memory, and the ongoing work the
ecosystem is doing to mitigate those challenges.

Speaker: Andy Rudoff (Intel)

Andy Rudoff is a Senior Principal Engineer at Intel Corporation, focusing
on Non-Volatile Memory programming. He is a contributor to the SNIA NVM
Programming Technical Work Group. His more than 30 years of industry
experience includes design and development work in operating systems, file
systems, networking, and fault management at companies large and small,
including Sun Microsystems and VMware. Andy has taught various Operating
Systems classes over the years and is a co-author of the popular UNIX
Network Programming textbook.

Paper Presentations:

1. Bit Contiguous Memory Allocation for Processing In Memory, John Leidel,
Tactical Computing Laboratories

Abstract:

Given the recent resurgence of research into processing in or near memory
systems, we find an ever increasing need to augment traditional system
software tools in order to make efficient use of the PIM hardware
abstractions. One such architecture, the Micron In-Memory Intelligence
(IMI) DRAM, provides a unique processing capability within the sense amp
stride of a traditional 2D DRAM architecture. This accumulator processing
circuit has the ability to compute both horizontally and vertically on
pitch within the array. This unique processing capability requires a memory
allocator that provides physical bit locality in order to ensure numerical

In this work we introduce a new memory allocation methodology that provides
bit contiguous allocation mechanisms for horizontal and vertical memory
allocations for the Micron IMI DRAM device architecture. Our methodology
drastically reduces the complexity of finding new, unallocated memory
blocks by combining a sparse matrix representation of the array with dense
continuity vectors that represent the relative probability of finding
candidate free blocks. We demonstrate our methodology using a set of
pathological and standard benchmark applications in both horizontal and
vertical memory modes.

2. Beyond 16GB: Out-of-Core Stencil Computations, Istvan Z. Reguly, Pazmany
Peter Catholic University; Gihan Mudalige, University of Warwick; Mike
Giles, University of Oxford

Abstract:

Stencil computations are a key class of applications, widely used in the
scientific computing community, and a class that has particularly benefited
from performance improvements on architectures with high memory bandwidth.
Unfortunately, such architectures come with a limited amount of fast
memory, which is limiting the size of the problems that can be efficiently
solved. In this paper, we address this challenge by applying the well-known
cache-blocking tiling technique to large scale stencil codes implemented
using the OPS domain specific language, such as CloverLeaf 2D, CloverLeaf
3D, and OpenSBLI. We introduce a number of techniques and optimisations to
help manage data resident in fast memory, and minimise data movement.
Evaluating our work on Intel's Knights Landing Platform as well as NVIDIA
P100 GPUs, we demonstrate that it is possible to solve problems 3 times
larger than the on-chip memory size with at most 15% loss in efficiency.
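
For readers unfamiliar with cache-blocking tiling, here is a minimal sketch of
the basic idea, assuming a plain 5-point Jacobi stencil on a 2D grid. It tiles
a single sweep spatially, so that each block of points is processed while its
neighbours are still in fast memory. It is not the authors' OPS-based
implementation; the function names and the tile size are hypothetical.

```python
def stencil_sweep(grid):
    """One untiled 5-point Jacobi sweep over the interior of a 2D grid."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def stencil_sweep_tiled(grid, tile=4):
    """The same sweep, visiting the interior in tile x tile blocks.

    The tile size is a hypothetical tuning parameter; in practice it would
    be chosen so that one block and its halo fit in fast memory.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, tile):          # loop over tile origins
        for jj in range(1, m - 1, tile):
            for i in range(ii, min(ii + tile, n - 1)):   # points in one tile
                for j in range(jj, min(jj + tile, m - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Both functions read only from the input grid and write to a separate output,
so the tiled version produces the same result as the untiled one regardless of
the order in which tiles are visited.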

3. NUMA Distance for Heterogeneous Memory, Sean Williams, New Mexico
Consortium; Latchesar Ionkov, Los Alamos National Laboratory; Michael Lang,
Los Alamos National Laboratory

Abstract:

Experience with Intel Xeon Phi suggests that NUMA alone is inadequate for
assignment of pages to devices in heterogeneous memory systems. We argue
that this is because NUMA is based on a single distance metric between all
domains (i.e., number of devices “in between” the domains), while
relationships between heterogeneous domains can and should be characterized
by multiple metrics (e.g., latency, bandwidth, capacity). We therefore
propose elaborating the concept of NUMA distance to give better and more
intuitive control of placement of pages, while retaining most of the
simplicity of the NUMA abstraction. This can be based on a minor modification
of the Linux kernel, with the possibility for further development by
hardware vendors.
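
To illustrate why a single distance value can be limiting, the following is a
rough sketch of placement driven by several metrics at once. It is not the
mechanism proposed in the paper and does not reflect any Linux kernel
interface; the MemoryDomain type, the pick_domain function, and all numbers
are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDomain:
    """Hypothetical description of one memory domain by several metrics."""
    name: str
    latency_ns: float
    bandwidth_gbs: float
    free_gb: float


def pick_domain(domains, size_gb, prefer="bandwidth"):
    """Pick a domain with enough free capacity, ranked by the preferred metric."""
    candidates = [d for d in domains if d.free_gb >= size_gb]
    if not candidates:
        raise MemoryError("no domain has enough free capacity")
    if prefer == "bandwidth":
        return max(candidates, key=lambda d: d.bandwidth_gbs)
    if prefer == "latency":
        return min(candidates, key=lambda d: d.latency_ns)
    raise ValueError(f"unknown preference: {prefer}")


# Invented example values: a small high-bandwidth domain and a large
# lower-bandwidth, lower-latency one.
domains = [
    MemoryDomain("fast", latency_ns=150.0, bandwidth_gbs=400.0, free_gb=16.0),
    MemoryDomain("large", latency_ns=120.0, bandwidth_gbs=90.0, free_gb=96.0),
]
```

With these invented values, a bandwidth-bound 8 GB allocation would go to
"fast", a latency-bound one to "large", and a 32 GB allocation to "large"
because it does not fit elsewhere. A single scalar distance cannot express all
three outcomes.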

4. GPGPU Memory Performance Measurement with the C-AMAT Model, Ning Zhang,
Illinois Institute of Technology, USA; Chuntao Jiang, Fushan University,
China; Xian-He Sun, Illinois Institute of Technology, USA

Abstract:

General Purpose Graphics Processing Units (GPGPUs) have become a popular
platform to accelerate computing. However, while they provide additional
computing power, GPGPUs have put even more pressure on the already
lagging memory systems. Memory performance is a known performance
bottleneck of GPGPUs. Evaluating, understanding, and improving GPGPU data
access delay is an imperative research issue of high-performance computing. In
this study, we utilize the newly proposed C-AMAT (Concurrent Average Memory
Access Time) model to measure the memory performance of GPGPUs. We first
introduce a GPGPU-specialized measurement design of C-AMAT. Then the modern
GPGPU simulator, GPGPU-Sim, is used to carry out the performance study.
Finally, the performance results are analyzed.
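
For readers unfamiliar with C-AMAT, the sketch below contrasts it with
conventional AMAT. It assumes the formulation commonly given in the C-AMAT
literature, C-AMAT = H/C_H + pMR * pAMP/C_M, where H is the hit time, C_H the
hit concurrency, pMR the pure miss rate, pAMP the pure average miss penalty,
and C_M the pure miss concurrency. The formula is not stated in the abstract
above, the function names are hypothetical, and this is not the paper's
GPGPU-specific measurement design.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Conventional average memory access time: H + MR * AMP."""
    return hit_time + miss_rate * miss_penalty


def c_amat(hit_time, hit_concurrency, pure_miss_rate,
           pure_miss_penalty, miss_concurrency):
    """C-AMAT as assumed above: H / C_H + pMR * pAMP / C_M."""
    return (hit_time / hit_concurrency
            + pure_miss_rate * pure_miss_penalty / miss_concurrency)
```

Under this formulation, setting both concurrency terms to 1 and using the
conventional miss rate and penalty reduces C-AMAT to AMAT. Higher hit or miss
concurrency lowers the effective access time.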