[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
Dhabaleswar Panda
panda at cse.ohio-state.edu
Sun Sep 9 00:10:46 CDT 2012
These releases might be of interest to some MPICH users, so I am
posting the announcement here.
Thanks,
DK
---------- Forwarded message ----------
Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: mvapich-discuss at cse.ohio-state.edu
Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since the
MVAPICH2 1.8GA release) are listed here.
* New Features and Enhancements (since 1.8GA):
- Support for InfiniBand hardware UD-multicast
- Scalable UD-multicast-based designs for collectives
(Bcast, Allreduce and Scatter)
- Sample Bcast numbers:
http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
- Enhanced Bcast and Reduce collectives with pt-to-pt communication
- LiMIC-based design for Gather collective
- Improved performance for shared-memory-aware collectives
- Improved intra-node communication performance with GPU buffers
  using a pipelined design (a usage sketch for GPU buffers follows
  this feature list)
- Improved inter-node communication performance with GPU buffers
with non-blocking CUDA copies
- Improved small message communication performance with
GPU buffers using CUDA IPC design
- Improved automatic GPU device selection and CUDA context management
- Optimal communication channel selection for different
GPU communication modes (DD, DH and HD) in different
configurations (intra-IOH and inter-IOH)
- Removed libibumad dependency for building the library
- Option for selecting a non-default gid-index in a lossless
  fabric setup in RoCE mode
- Option to disable signal handler setup
- Tuned thresholds for various architectures
- Set DAPL-2.0 as the default version for the uDAPL interface
- Updated to hwloc v1.5
- Option to use IP address as a fallback if hostname
cannot be resolved
- Improved error reporting
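Several of the GPU-buffer items above refer to MVAPICH2's CUDA-aware
communication path, where a device pointer can be passed directly to MPI
calls. The following minimal sketch (not part of the release) illustrates
that usage pattern; it assumes the library was built with CUDA support and
that GPU support is enabled at run time (for example via MV2_USE_CUDA=1).
The buffer size and tag are arbitrary choices.

    /* Illustrative sketch: pass a GPU device buffer directly to MPI.
     * Assumes an MVAPICH2 build with CUDA support and GPU support
     * enabled at run time (e.g. MV2_USE_CUDA=1). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, n = 1 << 20;              /* ~1M floats, arbitrary size */
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            /* Device pointer handed straight to MPI; staging and
             * pipelining happen inside the library. */
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats into a GPU buffer\n", n);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }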
* Bug-Fixes (since 1.8GA):
- Fix issue in intra-node knomial bcast
- Handle gethostbyname return values gracefully
- Fix corner case issue in two-level gather code path
- Fix bug in CUDA events/streams pool management
- Fix ptmalloc initialization issue when MALLOC_CHECK_ is
defined in the environment
- Thanks to Mehmet Belgin from Georgia Institute of
Technology for the report
- Fix memory corruption and handle heterogeneous architectures
in gather collective
- Fix issue in detecting the correct HCA type
- Fix issue in ring start-up to select correct HCA when
MV2_IBA_HCA is specified
- Fix SEGFAULT in MPI_Finalize when IB loop-back is used
- Fix memory corruption on nodes with 64 cores
- Thanks to M Xie for the report
- Fix hang in MPI_Finalize with Nemesis interface when
ptmalloc initialization fails
- Thanks to Carson Holt from OICR for the report
- Fix memory corruption in shared memory communication
- Thanks to Craig Tierney from NOAA for the report
and testing the patch
- Fix issue in IB ring start-up selection with mpiexec.hydra
- Fix issue in selecting CUDA run-time variables when running
  on a single node in SMP-only mode
- Fix a few memory leaks and warnings
The MVAPICH2-X 1.9a software package (released as a technology preview)
provides support for hybrid MPI+PGAS (OpenSHMEM) programming models
with a unified communication runtime for emerging exascale systems.
This software package gives users the flexibility to write applications
with the following programming models on top of a single unified
communication runtime: MPI, MPI+OpenMP, and PGAS (OpenSHMEM) programs, as
well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.
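As a concrete, purely illustrative example of the hybrid model, the sketch
below mixes MPI and OpenSHMEM calls in one program, which is the kind of
usage the unified runtime is designed to support. The allocation, the
atomic update, and the MPI reduction are arbitrary choices, and the
initialization order shown is just one common pattern; please consult the
MVAPICH2-X user guide for the recommended way to write hybrid programs.

    /* Minimal illustrative hybrid MPI + OpenSHMEM program. */
    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);      /* MPI side of the program      */
        start_pes(0);                /* OpenSHMEM 1.0 initialization */

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Symmetric heap allocation, remotely accessible by all PEs */
        long *counter = (long *) shmalloc(sizeof(long));
        *counter = 0;
        shmem_barrier_all();

        /* OpenSHMEM one-sided atomic increment on PE 0 */
        shmem_long_add(counter, 1, 0);
        shmem_barrier_all();

        /* MPI collective in the same executable */
        long mine = (rank == 0) ? *counter : 0, total = 0;
        MPI_Allreduce(&mine, &total, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("counter = %ld, total = %ld (nprocs = %d)\n",
                   *counter, total, nprocs);

        shfree(counter);
        MPI_Finalize();
        return 0;
    }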
Features for MVAPICH2-X 1.9a are as follows:
* MPI Features:
- MPI-2.2 standard compliance
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
  take advantage of all the features enabled by default
  in the OFA-IB-CH3 interface of MVAPICH2 1.9a
- High performance two-sided communication scalable to
multi-thousand nodes
- Optimized collective communication operations
- Shared-memory optimized algorithms for barrier, broadcast,
reduce and allreduce operations
- Optimized two-level designs for scatter and gather operations
- Improved implementation of allgather, alltoall operations
- High-performance and scalable support for one-sided communication
- Direct RDMA based designs for one-sided communication
- Shared-memory-backed windows for one-sided communication
- Support for truly passive locking for intra-node RMA
  in shared-memory-backed windows
- Multi-threading support
- Enhanced support for multi-threaded MPI applications
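The one-sided items above are driven through the standard MPI-2.2 RMA
interface; whether a window ends up shared-memory backed is an internal
library decision. A minimal passive-target sketch follows (window size,
target rank, and values are arbitrary choices for illustration):

    /* Illustrative passive-target RMA using the MPI-2.2 interface. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, value = 42, target_buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Every rank exposes one int through the window */
        MPI_Win_create(&target_buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 1 && nprocs > 1) {
            /* Passive-target epoch: rank 0 makes no matching MPI call
             * to complete this transfer. */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            MPI_Put(&value, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_unlock(0, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("window value after passive-target put: %d\n",
                   target_buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }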
* OpenSHMEM Features:
- OpenSHMEM v1.0 standard compliance
- Based on OpenSHMEM reference implementation v1.0c
- Optimized RDMA-based implementation of OpenSHMEM
data movement routines
- Efficient implementation of OpenSHMEM atomics using RDMA atomics
- High performance intra-node communication using
shared memory based schemes
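The data-movement and atomic routines listed above are the standard
OpenSHMEM 1.0 calls. A minimal put example is shown below; the array size
and values are arbitrary, and the program would be launched with one of
the process managers named later in this announcement (e.g. oshrun):

    /* Minimal illustrative OpenSHMEM 1.0 put into a symmetric array. */
    #include <shmem.h>
    #include <stdio.h>

    #define N 8

    int dest[N];   /* global, therefore symmetric on every PE */

    int main(void)
    {
        int src[N];

        start_pes(0);
        int me    = _my_pe();
        int npes  = _num_pes();
        int right = (me + 1) % npes;

        for (int i = 0; i < N; i++) {
            src[i]  = me;   /* payload identifies the sender */
            dest[i] = -1;
        }
        shmem_barrier_all();

        /* One-sided write of src into dest on the right-hand neighbor */
        shmem_int_put(dest, src, N, right);
        shmem_barrier_all();

        printf("PE %d received data from PE %d\n", me, dest[0]);
        return 0;
    }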
* Hybrid Program Features:
- Supports hybrid programming using MPI and OpenSHMEM
- Compliance with the MPI-2.2 and OpenSHMEM v1.0 standards
- Optimized network resource utilization through the
unified communication runtime
- Efficient deadlock-free progress of MPI and OpenSHMEM calls
* Unified Runtime Features:
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM,
  and hybrid programs benefit from the features listed below:
- Scalable inter-node communication with highest performance
and reduced memory usage
- Integrated RC/XRC design to get best performance on
large-scale systems with reduced/constant memory footprint
- RDMA Fast Path connections for efficient small
message communication
- Shared Receive Queue (SRQ) with flow control to significantly
reduce memory footprint of the library
- AVL tree-based resource-aware registration cache
- Automatic tuning based on network adapter and host architecture
- Optimized intra-node communication support by taking
advantage of shared-memory communication
- Efficient buffer organization for memory scalability of
  intra-node communication
- Automatic intra-node communication parameter tuning
based on platform
- Flexible CPU binding capabilities
- Portable Hardware Locality (hwloc v1.5) support for
defining CPU affinity
- Efficient CPU binding policies (bunch and scatter patterns,
socket and numanode granularities) to specify CPU binding
per job for modern multi-core platforms
- Allow user-defined flexible processor affinity
- Two modes of communication progress
- Polling
- Blocking (enables running multiple processes per processor)
- Flexible process manager support
- Support for the mpirun_rsh, hydra, and oshrun process managers
MVAPICH2-X delivers excellent performance. Examples include an OpenSHMEM
Put inter-node latency of 1.4 microseconds (4 bytes) on IB-FDR and a Put
intra-node latency of 0.18 microseconds (4 bytes) on an Intel Sandy Bridge
platform. More performance numbers can be obtained from the following
URL:
http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
New features and enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since
the OMB 3.6 release) are listed here.
* Features:
- New OpenSHMEM benchmarks
- osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
osu_oshm_atomics
* Bug fixes:
- Fix issue with IN_PLACE in osu_gather, osu_scatter and
osu_allgather benchmarks
- Destroy the CUDA context at the end in CUDA-supported benchmarks
To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, and the associated
user guides and quick start guide, or to access the SVN repository,
please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team