[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
Dhabaleswar Panda
panda at cse.ohio-state.edu
Sun Sep 9 00:10:46 CDT 2012
These releases might be of interest to some MPICH users, so I am
posting the announcement here.
Thanks,
DK
---------- Forwarded message ----------
Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: mvapich-discuss at cse.ohio-state.edu
Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since the
MVAPICH2 1.8GA release) are listed here.
* New Features and Enhancements (since 1.8GA):
- Support for InfiniBand hardware UD-multicast
- Scalable UD-multicast-based designs for collectives
(Bcast, Allreduce and Scatter)
- Sample Bcast numbers:
http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
- Enhanced Bcast and Reduce collectives with pt-to-pt communication
- LiMIC-based design for Gather collective
- Improved performance for shared-memory-aware collectives
- Improved intra-node communication performance with GPU buffers
  using a pipelined design (a usage sketch for GPU buffers follows
  this feature list)
- Improved inter-node communication performance with GPU buffers
with non-blocking CUDA copies
- Improved small message communication performance with
GPU buffers using CUDA IPC design
- Improved automatic GPU device selection and CUDA context management
- Optimal communication channel selection for different
GPU communication modes (DD, DH and HD) in different
configurations (intra-IOH and inter-IOH)
- Removed libibumad dependency for building the library
- Option for selecting a non-default gid-index in a lossless
  fabric setup in RoCE mode
- Option to disable signal handler setup
- Tuned thresholds for various architectures
- Set DAPL-2.0 as the default version for the uDAPL interface
- Updated to hwloc v1.5
- Option to use IP address as a fallback if hostname
cannot be resolved
- Improved error reporting
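Several of the GPU-buffer items above refer to MVAPICH2's CUDA-aware
communication path, where a device pointer can be passed directly to MPI
calls. The following minimal sketch (not part of the release) illustrates
that usage pattern; it assumes the library was built with CUDA support and
that GPU support is enabled at run time (for example via MV2_USE_CUDA=1).
The buffer size and tag are arbitrary choices.

    /* Illustrative sketch: pass a GPU device buffer directly to MPI.
     * Assumes an MVAPICH2 build with CUDA support and GPU support
     * enabled at run time (e.g. MV2_USE_CUDA=1). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, n = 1 << 20;              /* ~1M floats, arbitrary size */
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            /* Device pointer handed straight to MPI; staging and
             * pipelining happen inside the library. */
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats into a GPU buffer\n", n);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }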
* Bug-Fixes (since 1.8GA):
- Fix issue in intra-node knomial bcast
- Handle gethostbyname return values gracefully
- Fix corner case issue in two-level gather code path
- Fix bug in CUDA events/streams pool management
- Fix ptmalloc initialization issue when MALLOC_CHECK_ is
defined in the environment
- Thanks to Mehmet Belgin from Georgia Institute of
Technology for the report
- Fix memory corruption and handle heterogeneous architectures
in gather collective
- Fix issue in detecting the correct HCA type
- Fix issue in ring start-up to select correct HCA when
MV2_IBA_HCA is specified
- Fix SEGFAULT in MPI_Finalize when IB loop-back is used
- Fix memory corruption on nodes with 64 cores
- Thanks to M Xie for the report
- Fix hang in MPI_Finalize with Nemesis interface when
ptmalloc initialization fails
- Thanks to Carson Holt from OICR for the report
- Fix memory corruption in shared memory communication
- Thanks to Craig Tierney from NOAA for the report
and testing the patch
- Fix issue in IB ring start-up selection with mpiexec.hydra
- Fix issue in selecting CUDA run-time variables when running
  on a single node in SMP-only mode
- Fix a few memory leaks and warnings
The MVAPICH2-X 1.9a software package (released as a technology preview)
provides support for hybrid MPI+PGAS (OpenSHMEM) programming models
with a unified communication runtime for emerging exascale systems.
This software package gives users the flexibility to write applications
with the following programming models on top of a single unified
communication runtime: MPI, MPI+OpenMP, and PGAS (OpenSHMEM) programs, as
well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.
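As a concrete, purely illustrative example of the hybrid model, the sketch
below mixes MPI and OpenSHMEM calls in one program, which is the kind of
usage the unified runtime is designed to support. The allocation, the
atomic update, and the MPI reduction are arbitrary choices, and the
initialization order shown is just one common pattern; please consult the
MVAPICH2-X user guide for the recommended way to write hybrid programs.

    /* Minimal illustrative hybrid MPI + OpenSHMEM program. */
    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);      /* MPI side of the program      */
        start_pes(0);                /* OpenSHMEM 1.0 initialization */

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Symmetric heap allocation, remotely accessible by all PEs */
        long *counter = (long *) shmalloc(sizeof(long));
        *counter = 0;
        shmem_barrier_all();

        /* OpenSHMEM one-sided atomic increment on PE 0 */
        shmem_long_add(counter, 1, 0);
        shmem_barrier_all();

        /* MPI collective in the same executable */
        long mine = (rank == 0) ? *counter : 0, total = 0;
        MPI_Allreduce(&mine, &total, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("counter = %ld, total = %ld (nprocs = %d)\n",
                   *counter, total, nprocs);

        shfree(counter);
        MPI_Finalize();
        return 0;
    }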
Features for MVAPICH2-X 1.9a are as follows:
* MPI Features:
- MPI-2.2 standard compliance
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
  take advantage of all the features enabled by default
  in the OFA-IB-CH3 interface of MVAPICH2 1.9a
- High performance two-sided communication scalable to
multi-thousand nodes
- Optimized collective communication operations
- Shared-memory optimized algorithms for barrier, broadcast,
reduce and allreduce operations
- Optimized two-level designs for scatter and gather operations
- Improved implementation of allgather, alltoall operations
- High-performance and scalable support for one-sided communication
- Direct RDMA based designs for one-sided communication
- Shared-memory-backed windows for one-sided communication
- Support for truly passive locking for intra-node RMA
  in shared-memory-backed windows
- Multi-threading support
- Enhanced support for multi-threaded MPI applications
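The one-sided items above are driven through the standard MPI-2.2 RMA
interface; whether a window ends up shared-memory backed is an internal
library decision. A minimal passive-target sketch follows (window size,
target rank, and values are arbitrary choices for illustration):

    /* Illustrative passive-target RMA using the MPI-2.2 interface. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, value = 42, target_buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Every rank exposes one int through the window */
        MPI_Win_create(&target_buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 1 && nprocs > 1) {
            /* Passive-target epoch: rank 0 makes no matching MPI call
             * to complete this transfer. */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            MPI_Put(&value, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_unlock(0, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("window value after passive-target put: %d\n",
                   target_buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }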
* OpenSHMEM Features:
- OpenSHMEM v1.0 standard compliance
- Based on OpenSHMEM reference implementation v1.0c
- Optimized RDMA-based implementation of OpenSHMEM
data movement routines
- Efficient implementation of OpenSHMEM atomics using RDMA atomics
- High performance intra-node communication using
shared memory based schemes
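The data-movement and atomic routines listed above are the standard
OpenSHMEM 1.0 calls. A minimal put example is shown below; the array size
and values are arbitrary, and the program would be launched with one of
the process managers named later in this announcement (e.g. oshrun):

    /* Minimal illustrative OpenSHMEM 1.0 put into a symmetric array. */
    #include <shmem.h>
    #include <stdio.h>

    #define N 8

    int dest[N];   /* global, therefore symmetric on every PE */

    int main(void)
    {
        int src[N];

        start_pes(0);
        int me    = _my_pe();
        int npes  = _num_pes();
        int right = (me + 1) % npes;

        for (int i = 0; i < N; i++) {
            src[i]  = me;   /* payload identifies the sender */
            dest[i] = -1;
        }
        shmem_barrier_all();

        /* One-sided write of src into dest on the right-hand neighbor */
        shmem_int_put(dest, src, N, right);
        shmem_barrier_all();

        printf("PE %d received data from PE %d\n", me, dest[0]);
        return 0;
    }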
* Hybrid Program Features:
- Supports hybrid programming using MPI and OpenSHMEM
- Compliance with the MPI-2.2 and OpenSHMEM v1.0 standards
- Optimized network resource utilization through the
unified communication runtime
- Efficient deadlock-free progress of MPI and OpenSHMEM calls
* Unified Runtime Features:
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM,
  and hybrid programs benefit from the features listed below:
- Scalable inter-node communication with highest performance
and reduced memory usage
- Integrated RC/XRC design to get best performance on
large-scale systems with reduced/constant memory footprint
- RDMA Fast Path connections for efficient small
message communication
- Shared Receive Queue (SRQ) with flow control to significantly
reduce memory footprint of the library
- AVL tree-based resource-aware registration cache
- Automatic tuning based on network adapter and host architecture
- Optimized intra-node communication support by taking
advantage of shared-memory communication
- Efficient buffer organization for memory scalability of
  intra-node communication
- Automatic intra-node communication parameter tuning
based on platform
- Flexible CPU binding capabilities
- Portable Hardware Locality (hwloc v1.5) support for
defining CPU affinity
- Efficient CPU binding policies (bunch and scatter patterns,
socket and numanode granularities) to specify CPU binding
per job for modern multi-core platforms
- Allow user-defined flexible processor affinity
- Two modes of communication progress
- Polling
- Blocking (enables running multiple processes per processor)
- Flexible process manager support
- Support for the mpirun_rsh, hydra, and oshrun process managers
MVAPICH2-X delivers excellent performance. Examples include an OpenSHMEM
Put inter-node latency of 1.4 microseconds (4 bytes) on IB-FDR and a Put
intra-node latency of 0.18 microseconds (4 bytes) on an Intel Sandy Bridge
platform. More performance numbers can be obtained from the following
URL:
http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
New features and enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since
the OMB 3.6 release) are listed here.
* Features:
- New OpenSHMEM benchmarks
- osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
osu_oshm_atomics
* Bug fixes:
- Fix issue with IN_PLACE in osu_gather, osu_scatter and
osu_allgather benchmarks
- Destroy the CUDA context at the end in CUDA-supported benchmarks
To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, and the associated
user guides and quick start guide, or to access the SVN repository,
please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team