[mpich-discuss] [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a2, MVAPICH2-X 1.9a2 and OSU Micro-Benchmarks (OMB) 3.8 (fwd)
Dhabaleswar Panda
panda at cse.ohio-state.edu
Thu Nov 8 23:32:04 CST 2012
These releases might be of interest to some MPICH users, so I am
posting the announcement here.
Thanks,
DK
---------- Forwarded message ----------
Date: Thu, 8 Nov 2012 22:20:50 -0500 (EST)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: mvapich-discuss at cse.ohio-state.edu
Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a2,
MVAPICH2-X 1.9a2 and OSU Micro-Benchmarks (OMB) 3.8
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a2,
MVAPICH2-X 1.9a2 (Hybrid MPI+PGAS (UPC and OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.8.
====================================================================
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a2 (since the
MVAPICH2 1.8.1 release) are listed here.
* New Features and Enhancements (since 1.8.1). (**) indicates enhancement
since 1.9a:
- (**) Based on MPICH2-1.5
- (**) Initial support for MPI-3:
(Available for all interfaces: OFA-IB-CH3, OFA-iWARP-CH3,
OFA-RoCE-CH3, uDAPL-CH3, OFA-IB-Nemesis and PSM-CH3)
- Non-blocking collective functions available as "MPIX_" functions
(e.g., "MPIX_Ibcast"; a short usage sketch follows this feature list)
- Neighborhood collective routines available as "MPIX_" functions
(e.g., "MPIX_Neighbor_allgather")
- MPI_Comm_split_type function available as an "MPIX_" function
- Support for MPIX_Type_create_hindexed_block
- Non-blocking communicator duplication routine MPIX_Comm_idup
(will only work for single-threaded programs)
- MPIX_Comm_create_group support
- Support for matched probe functionality (e.g., MPIX_Mprobe,
MPIX_Improbe, MPIX_Mrecv, and MPIX_Imrecv),
(Not Available for PSM)
- Support for "Const" (disabled by default)
- (**) Efficient vector, hindexed datatype processing on GPU buffers
- (**) Tuned Alltoall, Scatter, and Allreduce collectives
- (**) Support for Mellanox Connect-IB HCA
- (**) Adaptive number of registration cache entries based on job size
- (**) Revamped Build system:
- Uses automake instead of simplemake
- Renamed "maint/updatefiles" to "autogen.sh"
- Allows for parallel builds ("make -j8" and similar)
- Support for InfiniBand hardware UD-multicast
- Scalable UD-multicast-based designs for collectives
(Bcast, Allreduce and Scatter)
- Sample Bcast numbers:
http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
- Enhanced Bcast and Reduce collectives with point-to-point communication
- LiMIC-based design for Gather collective
- Improved performance for shared-memory-aware collectives
- Improved intra-node communication performance with GPU buffers
using pipelined design
- Improved inter-node communication performance with GPU buffers
with non-blocking CUDA copies
- Improved small message communication performance with
GPU buffers using CUDA IPC design
- Improved automatic GPU device selection and CUDA context management
- Optimal communication channel selection for different
GPU communication modes (DD, DH, and HD, i.e., device-to-device,
device-to-host, and host-to-device) in different configurations
(intra-IOH and inter-IOH, i.e., within or across an I/O hub)
- Removed libibumad dependency for building the library
- Tuned thresholds for various architectures
- Set DAPL-2.0 as the default version for the uDAPL interface
- Updated to hwloc v1.5
- Option to use IP address as a fallback if hostname
cannot be resolved
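To make the MPI-3 preview above concrete, here is a minimal sketch of a
non-blocking broadcast using the "MPIX_" prefix exposed by this release.
It is an illustrative example written for this announcement (not code
from the MVAPICH2 distribution), and it assumes the preview call follows
the signature that was later standardized as MPI_Ibcast, completed with
the usual MPI_Wait:

    /* Sketch: non-blocking broadcast via the MPI-3 preview API.
       Build with the MVAPICH2 compiler wrapper, e.g. mpicc. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, data = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            data = 42;                      /* root fills the buffer */

        /* Start the broadcast without blocking ... */
        MPIX_Ibcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);

        /* ... independent work can be overlapped here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the collective */
        printf("rank %d received %d\n", rank, data);

        MPI_Finalize();
        return 0;
    }

The other MPIX_ routines listed above (matched probe, neighborhood
collectives, and so on) are used analogously; please consult the
MVAPICH2 user guide for the exact interfaces.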
* Bug Fixes (since 1.8.1). (**) indicates fix since 1.9a:
- (**) CPU frequency mismatch warning shown under debug
- (**) Fix issue with MPI_IN_PLACE buffers with CUDA
- (**) Fix ptmalloc initialization issue due to compiler optimization
- Thanks to Kyle Sheumaker from ACT for the report
- (**) Adjustable MAX_NUM_PORTS at build time to support
more than two ports
- (**) Fix issue with MPI_Allreduce with MPI_IN_PLACE send buffer
- (**) Fix memleak in MPI_Cancel with PSM interface
- Thanks to Andrew Friedley from LLNL for the report
====================================================================
MVAPICH2-X 1.9a2 software package (released as a technology preview)
provides support for hybrid MPI+PGAS (UPC and OpenSHMEM) programming
models with unified communication runtime for emerging exascale
systems. This software package provides flexibility for users to
write applications using the following programming models with a
unified communication runtime: MPI, MPI+OpenMP, pure UPC, and pure
OpenSHMEM programs as well as hybrid MPI(+OpenMP) + PGAS (UPC and
OpenSHMEM) programs.
Features for MVAPICH2-X 1.9a2 are as follows:
* MPI Features. (**) indicates feature since 1.9a:
- (**) MPI-2.2 standard compliance and initial support for MPI-3
- (**) Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface). MPI programs
can take advantage of all the features enabled by default
in the OFA-IB-CH3 interface of MVAPICH2 1.9a2
- High performance two-sided communication scalable to
multi-thousand nodes
- Optimized collective communication operations
- Shared-memory optimized algorithms for barrier, broadcast,
reduce and allreduce operations
- Optimized two-level designs for scatter and gather operations
- Improved implementation of allgather, alltoall operations
- High-performance and scalable support for one-sided communication
(a passive-target usage sketch follows this feature list)
- Direct RDMA based designs for one-sided communication
- Shared-memory-backed windows for one-sided communication
- Support for truly passive locking for intra-node RMA
in shared-memory-backed windows
- Multi-threading support
- Enhanced support for multi-threaded MPI applications
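As a pointer for the one-sided items above, the sketch below shows the
standard MPI-2.2 passive-target pattern (MPI_Win_create followed by
MPI_Win_lock / MPI_Put / MPI_Win_unlock) that the RDMA-based and
shared-memory-backed window designs accelerate. It is a generic
illustration written for this announcement, not code from the
MVAPICH2-X package, and it needs at least two processes:

    /* Sketch: passive-target one-sided communication (MPI-2.2 API).
       Rank 0 writes into rank 1's window; rank 1 posts no receive. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes one int through the window. */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 0) {
            int value = 7;
            /* Passive target: only the origin (rank 0) makes calls. */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            MPI_Win_unlock(1, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);    /* order the update and the print */
        if (rank == 1)
            printf("rank 1 window now holds %d\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }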
* (**) Unified Parallel C (UPC) Features
- UPC Language Specification v1.2 standard compliance
- Based on Berkeley UPC v2.14.2
- Optimized RDMA-based implementation of UPC data movement routines
- Improved UPC memput design for small/medium size messages
(see the upc_memput sketch after this list)
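To illustrate the memput item above, here is a minimal UPC 1.2 sketch
that uses upc_memput to push a private buffer into the block of a
shared array owned by the next thread. It is a generic example written
for this announcement, not code shipped with MVAPICH2-X; with this
release such a program would typically be built with a
Berkeley-UPC-style upcc driver and launched with upcrun (see the
runtime features below):

    /* Sketch: bulk put into a neighbor's block with upc_memput.
       Each thread owns one contiguous N-byte block of "dst". */
    #include <upc.h>
    #include <stdio.h>
    #include <string.h>

    #define N 1024

    shared [N] char dst[N * THREADS];   /* N contiguous bytes per thread */
    char src[N];                        /* private source buffer */

    int main(void)
    {
        int peer = (MYTHREAD + 1) % THREADS;

        memset(src, MYTHREAD, N);

        /* One-sided bulk copy: private src -> block owned by "peer". */
        upc_memput(&dst[(size_t)peer * N], src, N);

        upc_barrier;                    /* make all transfers visible */
        if (MYTHREAD == 0)
            printf("thread 0: first byte of my block is %d\n", dst[0]);
        return 0;
    }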
* OpenSHMEM Features:
- OpenSHMEM v1.0 standard compliance
- Based on OpenSHMEM reference implementation v1.0c
- Optimized RDMA-based implementation of OpenSHMEM
data movement routines
- Efficient implementation of OpenSHMEM atomics using RDMA atomics
- High performance intra-node communication using
shared memory based schemes
- (**) Optimized OpenSHMEM put routines for small/medium message sizes
(see the OpenSHMEM sketch after this list)
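As an illustration of the put and atomics items above, the following is
a small OpenSHMEM 1.0-style sketch (start_pes, shmalloc, shmem_putmem,
shmem_int_finc). It is a generic example written for this announcement,
not code from the MVAPICH2-X package; with this release it would
normally be launched with oshrun (see the runtime features below):

    /* Sketch: OpenSHMEM 1.0-style bulk put and atomic increment.
       Every PE writes a byte buffer into its right neighbor's symmetric
       heap and bumps a counter on PE 0. Error checks omitted. */
    #include <shmem.h>
    #include <stdio.h>
    #include <string.h>

    #define N 1024

    int counter = 0;                      /* symmetric (global) variable */

    int main(void)
    {
        start_pes(0);
        int me   = _my_pe();
        int npes = _num_pes();
        int peer = (me + 1) % npes;

        char *buf = (char *)shmalloc(N);  /* symmetric heap allocation */
        char  src[N];
        memset(src, me, N);

        shmem_putmem(buf, src, N, peer);  /* one-sided bulk put to peer */
        shmem_int_finc(&counter, 0);      /* atomic increment on PE 0 */

        shmem_barrier_all();              /* complete puts and atomics */
        if (me == 0)
            printf("PE 0: counter = %d, first byte = %d\n", counter, buf[0]);

        shfree(buf);
        return 0;
    }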
* Hybrid Program Features:
- (**) Supports hybrid programming using MPI(+OpenMP),
MPI(+OpenMP)+UPC and MPI(+OpenMP)+OpenSHMEM
- (**) Compliance with the MPI-2.2, UPC v1.2, and OpenSHMEM v1.0
standards, and initial support for MPI-3 features
- Optimized network resource utilization through the
unified communication runtime
- (**) Efficient deadlock-free progress of MPI and UPC/OpenSHMEM calls
* Unified Runtime Features:
- (**) Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface). All the
runtime features enabled by default in the OFA-IB-CH3 interface of
MVAPICH2 1.9a2 are available in MVAPICH2-X 1.9a2. MPI, UPC,
OpenSHMEM, and hybrid programs benefit from the features
listed below:
- Scalable inter-node communication with high performance
and reduced memory usage
- Integrated RC/XRC design for the best performance on
large-scale systems with a reduced/constant memory footprint
- RDMA Fast Path connections for efficient small
message communication
- Shared Receive Queue (SRQ) with flow control to significantly
reduce memory footprint of the library
- AVL tree-based resource-aware registration cache
- Automatic tuning based on network adapter and host architecture
- Optimized intra-node communication support by taking
advantage of shared-memory communication
- Efficient buffer organization for memory scalability of
intra-node communication
- Automatic intra-node communication parameter tuning
based on platform
- Flexible CPU binding capabilities
- Portable Hardware Locality (hwloc v1.5) support for
defining CPU affinity
- Efficient CPU binding policies (bunch and scatter patterns,
socket and numanode granularities) to specify CPU binding
per job for modern multi-core platforms
- Allow user-defined flexible processor affinity
- Two modes of communication progress
- Polling
- Blocking (enables running multiple processes per processor)
- Flexible process manager support
- Support for the mpirun_rsh, hydra, and oshrun process managers
- (**) Support for upcrun process manager
* Bug Fixes (since 1.9a):
- Fixed incorrect compiler selection in oshfort
- Fixed linker errors with Intel oshfort compiler
MVAPICH2-X delivers excellent performance. Sample UPC and OpenSHMEM
performance numbers can be obtained from the following URL:
http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
====================================================================
New Features and Enhancements of OSU Micro-Benchmarks (OMB) 3.8 (since
the OMB 3.7 release) are listed here.
* Features:
- New UPC benchmarks
- osu_upc_memput
- osu_upc_memget
====================================================================
To download MVAPICH2 1.9a2, MVAPICH2-X 1.9a2, OMB 3.8, and the
associated user guides and quick start guide, or to access the SVN
repository, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
We are also happy to report that the number of organizations using
MVAPICH2 and MVAPICH2-X (and registered at the MVAPICH site) has
crossed 2,000 worldwide (in 70 countries). The MVAPICH team extends
its thanks to all these organizations.
Thanks,
The MVAPICH Team
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss