[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
Evren Yurtesen IB
eyurtese at abo.fi
Sun Sep 9 10:52:20 CDT 2012
From the download page:
http://mvapich.cse.ohio-state.edu/download/mvapich2/
MVAPICH2 1.9a is available as a single integrated package (with MPICH2
1.4.1p1) for download.
On Sun, 9 Sep 2012, Jed Brown wrote:
>
> Which version of MPICH2 is this based on? Does it support the nonblocking collectives in MPICH2-1.5?
>
> On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <panda at cse.ohio-state.edu> wrote:
> These releases might be of interest to some of the MPICH users. Thus, I am posting them here.
>
> Thanks,
>
> DK
>
>
> ---------- Forwarded message ----------
> Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
> From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> To: mvapich-discuss at cse.ohio-state.edu
> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
> MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
>
> The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
> MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
> Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
>
> Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since
> MVAPICH2 1.8GA release) are listed here.
>
> * New Features and Enhancements (since 1.8GA):
> - Support for InfiniBand hardware UD-multicast
> - Scalable UD-multicast-based designs for collectives
> (Bcast, Allreduce and Scatter)
> - Sample Bcast numbers:
> http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
> - Enhanced Bcast and Reduce collectives with pt-to-pt communication
> - LiMIC-based design for Gather collective
> - Improved performance for shared-memory-aware collectives
> - Improved intra-node communication performance with GPU buffers
> using pipelined design
> - Improved inter-node communication performance with GPU buffers
> with non-blocking CUDA copies
> - Improved small message communication performance with
> GPU buffers using CUDA IPC design
> - Improved automatic GPU device selection and CUDA context management
> - Optimal communication channel selection for different
> GPU communication modes (DD, DH and HD) in different
> configurations (intra-IOH and inter-IOH)
> - Removed libibumad dependency for building the library
> - Option for selecting non-default gid-index in a loss-less
> fabric setup in RoCE mode
> - Option to disable signal handler setup
> - Tuned thresholds for various architectures
> - Set DAPL-2.0 as the default version for the uDAPL interface
> - Updated to hwloc v1.5
> - Option to use IP address as a fallback if hostname
> cannot be resolved
> - Improved error reporting
>
> * Bug-Fixes (since 1.8GA):
> - Fix issue in intra-node knomial bcast
> - Handle gethostbyname return values gracefully
> - Fix corner case issue in two-level gather code path
> - Fix bug in CUDA events/streams pool management
> - Fix ptmalloc initialization issue when MALLOC_CHECK_ is
> defined in the environment
> - Thanks to Mehmet Belgin from Georgia Institute of
> Technology for the report
> - Fix memory corruption and handle heterogeneous architectures
> in gather collective
> - Fix issue in detecting the correct HCA type
> - Fix issue in ring start-up to select correct HCA when
> MV2_IBA_HCA is specified
> - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
> - Fix memory corruption on nodes with 64-cores
> - Thanks to M Xie for the report
> - Fix hang in MPI_Finalize with Nemesis interface when
> ptmalloc initialization fails
> - Thanks to Carson Holt from OICR for the report
> - Fix memory corruption in shared memory communication
> - Thanks to Craig Tierney from NOAA for the report
> and testing the patch
> - Fix issue in IB ring start-up selection with mpiexec.hydra
> - Fix issue in selecting CUDA run-time variables when running
> on single node in SMP only mode
>     - Fix a few memory leaks and warnings
>
> MVAPICH2-X 1.9a software package (released as a technology preview)
> provides support for hybrid MPI+PGAS (OpenSHMEM) programming models
> with unified communication runtime for emerging exascale systems.
> This software package provides flexibility for users to write
> applications using the following programming models with a unified
> communication runtime: MPI, MPI+OpenMP, PGAS (OpenSHMEM) programs as
> well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.
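To illustrate the hybrid model described above, here is a minimal sketch of a program mixing MPI and OpenSHMEM calls. It assumes the OpenSHMEM 1.0-era API (start_pes, _my_pe, shmem_int_put) referenced by this release; the file layout and variable names are illustrative only, and the program must be compiled with the MVAPICH2-X wrappers and launched with an MPI/OpenSHMEM launcher:

```c
/* Hybrid MPI + OpenSHMEM sketch: OpenSHMEM performs a one-sided
 * put into a symmetric variable, MPI performs the reduction. */
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

int dest;  /* global, hence symmetric: a valid target for shmem puts */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    start_pes(0);                     /* OpenSHMEM 1.0-style init */

    int me    = _my_pe();
    int npes  = _num_pes();
    int right = (me + 1) % npes;

    int src = me;
    shmem_int_put(&dest, &src, 1, right);  /* one-sided put to neighbor */
    shmem_barrier_all();                   /* ensure all puts completed */

    int sum = 0;
    MPI_Allreduce(&dest, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("PE %d received %d, global sum %d\n", me, dest, sum);

    MPI_Finalize();
    return 0;
}
```

The point of the unified runtime is that the shmem_int_put and the MPI_Allreduce above share one communication substrate, rather than each library opening its own connections.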
>
> Features for MVAPICH2-X 1.9a are as follows:
>
> * MPI Features:
> - MPI-2.2 standard compliance
> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
> take advantage of all the features enabled by default
> in OFA-IB-CH3 interface of MVAPICH2 1.9a
> - High performance two-sided communication scalable to
> multi-thousand nodes
> - Optimized collective communication operations
> - Shared-memory optimized algorithms for barrier, broadcast,
> reduce and allreduce operations
> - Optimized two-level designs for scatter and gather operations
> - Improved implementation of allgather, alltoall operations
> - High-performance and scalable support for one-sided communication
> - Direct RDMA based designs for one-sided communication
>         - Shared-memory-backed windows for one-sided communication
> - Support for truly passive locking for intra-node RMA
> in shared memory backed windows
> - Multi-threading support
> - Enhanced support for multi-threaded MPI applications
>
> * OpenSHMEM Features:
> - OpenSHMEM v1.0 standard compliance
> - Based on OpenSHMEM reference implementation v1.0c
> - Optimized RDMA-based implementation of OpenSHMEM
> data movement routines
> - Efficient implementation of OpenSHMEM atomics using RDMA atomics
> - High performance intra-node communication using
> shared memory based schemes
>
> * Hybrid Program Features:
> - Supports hybrid programming using MPI and OpenSHMEM
>     - Compliance with the MPI 2.2 and OpenSHMEM v1.0 standards
> - Optimized network resource utilization through the
> unified communication runtime
> - Efficient deadlock-free progress of MPI and OpenSHMEM calls
>
> * Unified Runtime Features:
> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM
> and Hybrid programs benefit from its features listed below:
> - Scalable inter-node communication with highest performance
> and reduced memory usage
> - Integrated RC/XRC design to get best performance on
> large-scale systems with reduced/constant memory footprint
> - RDMA Fast Path connections for efficient small
> message communication
> - Shared Receive Queue (SRQ) with flow control to significantly
> reduce memory footprint of the library
> - AVL tree-based resource-aware registration cache
> - Automatic tuning based on network adapter and host architecture
> - Optimized intra-node communication support by taking
> advantage of shared-memory communication
>         - Efficient buffer organization for memory scalability of
>           intra-node communication
> - Automatic intra-node communication parameter tuning
> based on platform
> - Flexible CPU binding capabilities
> - Portable Hardware Locality (hwloc v1.5) support for
> defining CPU affinity
> - Efficient CPU binding policies (bunch and scatter patterns,
> socket and numanode granularities) to specify CPU binding
> per job for modern multi-core platforms
> - Allow user-defined flexible processor affinity
> - Two modes of communication progress
> - Polling
>             - Blocking (enables running multiple processes per processor)
> - Flexible process manager support
>         - Support for mpirun_rsh, hydra and oshrun process managers
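Since several process managers are listed above, here is a sketch of how they are typically invoked. Command names follow the MVAPICH2 user-guide conventions (mpicc, oshcc, mpirun_rsh, mpiexec.hydra, oshrun); hostfile format and exact flags may vary by installation, and the program names are placeholders:

```shell
# Compile an MPI program and an OpenSHMEM program with the
# MVAPICH2-X compiler wrappers.
mpicc -o mpi_hello   mpi_hello.c
oshcc -o shmem_hello shmem_hello.c

# Launch 4 processes with mpirun_rsh (host list in ./hosts),
# with hydra, or with oshrun for OpenSHMEM programs.
mpirun_rsh   -np 4 -hostfile hosts ./mpi_hello
mpiexec.hydra -n 4 -f hosts        ./mpi_hello
oshrun       -np 4                 ./shmem_hello
```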
>
> MVAPICH2-X delivers excellent performance. Examples include an
> OpenSHMEM Put inter-node latency of 1.4 microseconds (4 bytes) on
> IB-FDR and a Put intra-node latency of 0.18 microseconds (4 bytes)
> on an Intel Sandy Bridge platform. More performance numbers are
> available at the following URL:
>
> http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
>
> New features and Enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since
> OMB 3.6 release) are listed here.
>
> * Features:
> - New OpenSHMEM benchmarks
> - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
> osu_oshm_atomics
> * Bug fixes:
> - Fix issue with IN_PLACE in osu_gather, osu_scatter and
> osu_allgather benchmarks
> - Destroy the CUDA context at the end in CUDA supported benchmarks
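A sketch of how the new OpenSHMEM benchmarks named above are typically run, assuming OMB was built against MVAPICH2-X and oshrun is on the PATH (the in-tree paths are illustrative):

```shell
# Latency benchmarks run between two PEs on different nodes
# for inter-node numbers, or the same node for intra-node.
oshrun -np 2 ./osu_oshm_put       # put latency
oshrun -np 2 ./osu_oshm_get       # get latency
oshrun -np 2 ./osu_oshm_atomics   # atomic-operation latency

# The message-rate benchmark is usually run with more PEs.
oshrun -np 4 ./osu_oshm_put_mr
```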
>
> For downloading MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, associated
> user guides, quick start guide, and accessing the SVN, please visit
> the following URL:
>
> http://mvapich.cse.ohio-state.edu
>
> All questions, feedback, bug reports, hints for performance tuning,
> patches, and enhancements are welcome. Please post them to the
> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>
> Thanks,
>
> The MVAPICH Team
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>
>