[hpc-announce] Call for participation: The 2nd International Workshop on Data Reduction for Big Scientific Data (DRBSD-2) in Conjunction with SC’17

Tue Oct 10 16:09:08 CDT 2017

The 2nd International Workshop on Data Reduction for Big Scientific Data
(DRBSD-2)
in Conjunction with SC’17
Nov 17th, 2017
Denver, CO

https://web.njit.edu/~qliu/drbsd2.html

Link to the SC technical program
<http://sc17.supercomputing.org/presentation/?id=wksp111&sess=sess123>

*Keynote Talk #1*
SKA: The Data Domino Enabled by DALiuGE
Andreas Wicenec, the University of Western Australia

Abstract:
The Square Kilometre Array (SKA) will pose interesting new challenges to
the way scientific computing is carried out. The processing will require
connecting the antenna arrays in South Africa and Australia to dedicated 200PF
scale HPC centres over some 700km WAN connections. Some part of the on-line
calibration and transient detection will be carried out on a (sub) second
cadence on data streams of about 1TB/s. The further processing will first
collect all the data from a single 6-12 hour long observation and then
perform an iterative image reconstruction and ‘cleaning’ on that data set.
With current algorithms the bottleneck seems to be in memory bandwidth, but
in addition the level of data parallelism and inherent concurrency reaches
quite extreme levels with several tens of millions of tasks and data items
to be scheduled and managed during a single image reconstruction run. The
design of the SKA processing system thus includes an execution framework
detailing the baseline concepts of an architecture enabling the processing
at SKA scale. Along with working on the architecture and detailed design of
this execution framework, we have also implemented a prototype to prove the
viability of the proposed design decisions and extract the actual
requirements for the ‘final’, operational execution framework system. This
agile process quite naturally exposed a number of existing potential
candidate frameworks, technologies and concepts, which are well established
in the Big Data and HPC communities. We have carefully analysed these
candidate technologies, but deliberately stayed independent of any of the
complete frameworks in order to arrive at a ‘vendor’ neutral design and
set of requirements. The result of the prototyping work is called DALiuGE,
which stands for ‘Data Activated Flow Graph Engine’. DALiuGE implements
most of the concepts required to perform the various radio astronomy
workflows, while almost completely avoiding any unnecessary features. While
fully driven by radio astronomy, DALiuGE is still completely generic and
can be adapted to any similar kind of workflow problem. This talk will
highlight the key concepts and solutions of DALiuGE and also present the
results of test runs at scale.

Short Bio:
Andreas Wicenec has been a Professor at the University of Western Australia
since 2010, leading the Data Intensive Astronomy Program of the International
Centre for Radio Astronomy Research, designing and implementing data flows
and high performance scientific computing for large scale astronomical
facilities and surveys. During his career he has had the privilege to be
involved in the software development, data management and reduction and
operation of several large scale astronomical facilities, including the ESA
cornerstone HIPPARCOS satellite, the Very Large Telescope (VLT) and the
Atacama Large Millimetre and Submillimetre Array (ALMA) in Chile, the
Murchison Widefield Array (MWA), the Five-hundred-metre Aperture Spherical
Telescope (FAST) and the Square Kilometre Array (SKA). Prof. Wicenec is
also involved in the International Virtual Observatory Alliance (IVOA). His
scientific interests in astronomy include precision global astrometry,
optical background radiation, stellar photometry, dynamics and evolution of
planetary nebulae and observational survey astronomy. In computer science
he is doing research in workflow construction and execution as well as
scheduling and the related computational concepts.

*Keynote Talk #2*
Facing the Big Data Challenge in the Fusion Code XGC
CS Chang, Princeton Plasma Physics Laboratory

Abstract:
The boundary plasma of a magnetic fusion reactor is far from thermodynamic
equilibrium, with the physics dominated by nonlinear multiscale
multiphysics interactions in a complicated geometry, and requires
extreme-scale computing for first-principles based understanding.  The
modern scalable particle-in-cell code XGC has been developed for this
purpose, in partnership with the computer science and applied mathematics
communities over the last decade. The bigger the computer is, the more
complete physics can be contained in XGC.  XGC’s extreme scale capability
has been recognized by being awarded a few hundred million hours of computing
time from all US leadership class computers, and by being selected into all
three pre-exascale or exascale programs: CAAR at OLCF, NESAP at NERSC, and
AURORA ESP at ALCF.  The physics data size produced from a 1-day XGC run of
ITER plasma on the present ~20PF computer is ~100PB, which is far above
the limit imposed by the present technology.  We are losing most of the
valuable physics data in order to keep the data flow within the limits
imposed by the I/O rate and the file system size.  Since the problem size
will increase in proportion to the parallel computer capability, the
challenge will grow at least 100-fold as the exascale computers arrive.
Reduction of the data size by several orders of magnitude is required,
while still preserving the accuracy needed to enable various levels of
scientific discovery.  On-the-fly in-memory data analysis and visualization must
occur at the same time.  These issues, as well as the necessity to
collaborate tightly with the applied mathematics and computer science
communities, will be discussed from the application driver point of view.

Short Bio:
C.S. Chang has extensive experience in successfully leading large-scale,
multi-institutional, multi-disciplinary teams composed of fusion energy
scientists, computer scientists, and applied mathematicians, including
the Proto-Type Fusion Simulation Project for Plasma Edge Simulation,
SciDAC-2 Center for Plasma Edge Simulation (CPES), SciDAC-3 Center for Edge
Plasma Simulation (EPSI), and the new SciDAC-4 Partnership Center for
High-fidelity Boundary Plasma Simulation (XBP).  C.S. Chang is a Fellow of
the American Physical Society, and has been serving in many national and
international leadership roles, including chairing the recent DOE
ASCR/FES Exascale Requirement Review activities. He has given numerous
invited and plenary talks, keynote speeches, and tutorial lectures at major
international conferences, and has supervised more than 20 Ph.D.
dissertations.

*Tentative Workshop Agenda*
8:30 Welcome and opening remarks

8:30 - 9:10 Keynote Talk
SKA: The Data Domino Enabled by DALiuGE, Andreas Wicenec, The University of
Western Australia

9:10 - 10:10 Papers (20 mins each)
Sheng Di, Dingwen Tao and Franck Cappello. An Efficient Approach to Lossy
Compression with Pointwise Relative Error Bound
Benjamin Welton and Barton Miller. Data Reduction and Partitioning in an
Extreme Scale GPU-Based Clustering Algorithm
Mark Ainsworth, Ozan Tugluk and Ben Whitney. MGARD: A Multilevel Technique
for Compression of Floating-Point Data

10:10 - 10:20 Break

10:20 - 11:00 Keynote Talk
Facing the Big Data Challenge in the Fusion Code XGC, CS Chang, Princeton
Plasma Physics Lab

11:00 - 12:00 Papers (20 mins each)
Swati Singhal and Alan Sussman. Adaptive Compression to Improve I/O
Performance for Climate Simulations
Guénolé Harel, Jacques-Bernard Lekien and Philippe Pébaÿ. Lean
Visualization of Large Scale Tree-Based AMR Meshes
Kenny Gruchalla, Nicholas Brunhart-Lupo, Kristin Potter and John Clyne.
Contextual Compression of Large-Scale Wind Turbine Array Simulations

*Organizing Committee*
Scott Klasky, Oak Ridge National Laboratory
Gary Liu, New Jersey Institute of Technology
Mark Ainsworth, Brown University/Oak Ridge National Laboratory
Ian Foster, Argonne National Laboratory/University of Chicago

*Technical Program Committee*
Franck Cappello, Argonne National Laboratory
Peter Lindstrom, Lawrence Livermore National Laboratory
Todd Munson, Argonne National Laboratory
Kerstin Van Dam, Brookhaven National Laboratory
George Ostrouchov, Oak Ridge National Laboratory
Scott Klasky, Oak Ridge National Laboratory
Mark Ainsworth, Brown University/Oak Ridge National Laboratory
John Wu, Lawrence Berkeley National Laboratory
Eric Suchyta, Oak Ridge National Laboratory
Martin Burtscher, Texas State University
-- 
973-596-3526