[hpc-announce] Call for Participation: National Science Data Fabric (NSDF) Webinar: Distinguished Speaker Ryan Abernathey (Columbia University) - April 28 2022

Michela Taufer taufer at udel.edu
Sat Apr 23 13:48:59 CDT 2022


[Apologies if you got multiple copies of this email.]

Please share with colleagues, students, and collaborators.

###############################################################
Call for Participation

National Science Data Fabric (NSDF) Webinar: Distinguished Speaker Ryan Abernathey (Columbia University) 
###############################################################

Title: Pangeo Forge - Crowdsourcing Analysis Ready Data in the Cloud
Speaker: Ryan Abernathey, Columbia University, Department of Earth and Environmental Science

Date: April 28, 2022, at 12:30 pm ET
Join Zoom Meeting: https://utah.zoom.us/j/99659937052  

More about the webinar and NSDF: http://nationalsciencedatafabric.org/

Abstract: Analysis-ready, cloud-optimized (ARCO) scientific data is essential for scalable big data analytics in the cloud. ARCO can massively accelerate statistical analysis, visualization, and machine learning workflows on large-scale scientific datasets. However, most scientific data is distributed in archival formats that are not optimized for large-scale analysis.
Pangeo Forge (https://pangeo-forge.org/) is an open source framework for data Extraction, Transformation, and Loading (ETL) of scientific data. The goal of Pangeo Forge is to make it easy to extract data from traditional data archives and deposit it in cloud object storage in ARCO format.
Pangeo Forge is made of two main components:

(i) Pangeo Forge Recipes: an open source Python package, which allows you to create ETL pipelines (“recipes”) and run them on your own computer.

(ii) Pangeo Forge Cloud: a cloud-based automation framework which executes these recipes in the cloud from code stored in GitHub and deposits the data into cloud object storage.

By storing data recipes in version-controlled GitHub repositories, we can maintain perfect provenance information from archival repository to ARCO copy. Using Pangeo Forge, we are collaboratively populating a petabyte-scale library of open ARCO climate data distributed across multiple cloud storage services, including the Open Storage Network. Pangeo Forge is inspired directly by Conda Forge, a community-led collection of recipes for building conda packages. We hope that Pangeo Forge can eventually play the same role for datasets, encouraging open, interdisciplinary collaboration around data curation.

Bio: Ryan is a computational physical oceanographer who leads the Ocean Transport Group, whose mission is to advance scientific understanding of how stuff moves around the ocean and how this transport influences Earth’s large-scale climate and ecosystems. This research involves working with satellite data, numerical simulations, and observational datasets. Ryan is an enthusiastic advocate for open source scientific software and is an active contributor to the Pangeo Project, a community platform for Big Data geoscience.


_________________________________________________


Michela Taufer
Jack Dongarra Professor in High Performance Computing
The University of Tennessee 
Electrical Engineering and Computer Science Dept.
401 Min H. Kao Bldg 
1520 Middle Drive
Knoxville, TN  37996-2250

Phone: (302) 690 7845
E-Mail: taufer at acm.org
URL: https://globalcomputing.group/

Follow me on Twitter at: https://twitter.com/MichelaTaufer 
Follow my Group on Twitter at: https://twitter.com/TauferLab
__________________________________________________




