[mpich-discuss] [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a2, MVAPICH2-X 1.9a2 and OSU Micro-Benchmarks (OMB) 3.8 (fwd)
Dhabaleswar Panda
panda at cse.ohio-state.edu
Thu Nov 8 23:32:04 CST 2012
These releases might be of interest to some MPICH users, so I am
posting the announcement here.
Thanks,
DK
---------- Forwarded message ----------
Date: Thu, 8 Nov 2012 22:20:50 -0500 (EST)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: mvapich-discuss at cse.ohio-state.edu
Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a2,
MVAPICH2-X 1.9a2 and OSU Micro-Benchmarks (OMB) 3.8
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a2,
MVAPICH2-X 1.9a2 (Hybrid MPI+PGAS (UPC and OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.8.
====================================================================
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a2 (since the
MVAPICH2 1.8.1 release) are listed here.
* New Features and Enhancements (since 1.8.1). (**) indicates enhancement
since 1.9a:
- (**) Based on MPICH2-1.5
- (**) Initial support for MPI-3:
(Available for all interfaces: OFA-IB-CH3, OFA-iWARP-CH3,
OFA-RoCE-CH3, uDAPL-CH3, OFA-IB-Nemesis and PSM-CH3)
- Non-blocking collective functions available as "MPIX_" functions
(e.g., "MPIX_Ibcast"; a short usage sketch follows this feature list)
- Neighborhood collective routines available as "MPIX_" functions
(e.g., "MPIX_Neighbor_allgather")
- MPI_Comm_split_type function available as an "MPIX_" function
- Support for MPIX_Type_create_hindexed_block
- Non-blocking communicator duplication routine MPIX_Comm_idup
(will only work for single-threaded programs)
- MPIX_Comm_create_group support
- Support for matched probe functionality (e.g., MPIX_Mprobe,
MPIX_Improbe, MPIX_Mrecv, and MPIX_Imrecv),
(Not Available for PSM)
- Support for "Const" (disabled by default)
- (**) Efficient vector, hindexed datatype processing on GPU buffers
- (**) Tuned Alltoall, Scatter, and Allreduce collectives
- (**) Support for Mellanox Connect-IB HCA
- (**) Adaptive number of registration cache entries based on job size
- (**) Revamped Build system:
- Uses automake instead of simplemake
- Renamed "maint/updatefiles" to "autogen.sh"
- Allows for parallel builds ("make -j8" and similar)
- Support for InfiniBand hardware UD-multicast
- Scalable UD-multicast-based designs for collectives
(Bcast, Allreduce and Scatter)
- Sample Bcast numbers:
http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
- Enhanced Bcast and Reduce collectives with point-to-point communication
- LiMIC-based design for Gather collective
- Improved performance for shared-memory-aware collectives
- Improved intra-node communication performance with GPU buffers
using pipelined design
- Improved inter-node communication performance with GPU buffers
with non-blocking CUDA copies
- Improved small message communication performance with
GPU buffers using CUDA IPC design
- Improved automatic GPU device selection and CUDA context management
- Optimal communication channel selection for different
GPU communication modes (DD, DH, and HD, i.e., device-to-device,
device-to-host, and host-to-device) in different configurations
(intra-IOH and inter-IOH, i.e., within or across an I/O hub)
- Removed libibumad dependency for building the library
- Tuned thresholds for various architectures
- Set DAPL-2.0 as the default version for the uDAPL interface
- Updated to hwloc v1.5
- Option to use IP address as a fallback if hostname
cannot be resolved
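To make the MPI-3 preview above concrete, here is a minimal sketch of a
non-blocking broadcast using the "MPIX_" prefix exposed by this release.
It is an illustrative example written for this announcement (not code
from the MVAPICH2 distribution), and it assumes the preview call follows
the signature that was later standardized as MPI_Ibcast, completed with
the usual MPI_Wait:

    /* Sketch: non-blocking broadcast via the MPI-3 preview API.
       Build with the MVAPICH2 compiler wrapper, e.g. mpicc. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, data = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            data = 42;                      /* root fills the buffer */

        /* Start the broadcast without blocking ... */
        MPIX_Ibcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);

        /* ... independent work can be overlapped here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the collective */
        printf("rank %d received %d\n", rank, data);

        MPI_Finalize();
        return 0;
    }

The other MPIX_ routines listed above (matched probe, neighborhood
collectives, and so on) are used analogously; please consult the
MVAPICH2 user guide for the exact interfaces.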
* Bug Fixes (since 1.8.1). (**) indicates fix since 1.9a:
- (**) CPU frequency mismatch warning shown under debug
- (**) Fix issue with MPI_IN_PLACE buffers with CUDA
- (**) Fix ptmalloc initialization issue due to compiler optimization
- Thanks to Kyle Sheumaker from ACT for the report
- (**) Adjustable MAX_NUM_PORTS at build time to support
more than two ports
- (**) Fix issue with MPI_Allreduce with MPI_IN_PLACE send buffer
- (**) Fix memleak in MPI_Cancel with PSM interface
- Thanks to Andrew Friedley from LLNL for the report
====================================================================
MVAPICH2-X 1.9a2 software package (released as a technology preview)
provides support for hybrid MPI+PGAS (UPC and OpenSHMEM) programming
models with unified communication runtime for emerging exascale
systems. This software package provides flexibility for users to
write applications using the following programming models with a
unified communication runtime: MPI, MPI+OpenMP, pure UPC, and pure
OpenSHMEM programs as well as hybrid MPI(+OpenMP) + PGAS (UPC and
OpenSHMEM) programs.
Features for MVAPICH2-X 1.9a2 are as follows:
* MPI Features. (**) indicates feature since 1.9a:
- (**) MPI-2.2 standard compliance and initial support for MPI-3
- (**) Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface). MPI programs
can take advantage of all the features enabled by default
in the OFA-IB-CH3 interface of MVAPICH2 1.9a2
- High performance two-sided communication scalable to
multi-thousand nodes
- Optimized collective communication operations
- Shared-memory optimized algorithms for barrier, broadcast,
reduce and allreduce operations
- Optimized two-level designs for scatter and gather operations
- Improved implementation of allgather, alltoall operations
- High-performance and scalable support for one-sided communication
(a passive-target usage sketch follows this feature list)
- Direct RDMA based designs for one-sided communication
- Shared-memory-backed windows for one-sided communication
- Support for truly passive locking for intra-node RMA
in shared-memory-backed windows
- Multi-threading support
- Enhanced support for multi-threaded MPI applications
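As a pointer for the one-sided items above, the sketch below shows the
standard MPI-2.2 passive-target pattern (MPI_Win_create followed by
MPI_Win_lock / MPI_Put / MPI_Win_unlock) that the RDMA-based and
shared-memory-backed window designs accelerate. It is a generic
illustration written for this announcement, not code from the
MVAPICH2-X package, and it needs at least two processes:

    /* Sketch: passive-target one-sided communication (MPI-2.2 API).
       Rank 0 writes into rank 1's window; rank 1 posts no receive. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes one int through the window. */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 0) {
            int value = 7;
            /* Passive target: only the origin (rank 0) makes calls. */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            MPI_Win_unlock(1, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);    /* order the update and the print */
        if (rank == 1)
            printf("rank 1 window now holds %d\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }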
* (**) Unified Parallel C (UPC) Features
- UPC Language Specification v1.2 standard compliance
- Based on Berkeley UPC v2.14.2
- Optimized RDMA-based implementation of UPC data movement routines
- Improved UPC memput design for small/medium size messages
(see the upc_memput sketch after this list)
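To illustrate the memput item above, here is a minimal UPC 1.2 sketch
that uses upc_memput to push a private buffer into the block of a
shared array owned by the next thread. It is a generic example written
for this announcement, not code shipped with MVAPICH2-X; with this
release such a program would typically be built with a
Berkeley-UPC-style upcc driver and launched with upcrun (see the
runtime features below):

    /* Sketch: bulk put into a neighbor's block with upc_memput.
       Each thread owns one contiguous N-byte block of "dst". */
    #include <upc.h>
    #include <stdio.h>
    #include <string.h>

    #define N 1024

    shared [N] char dst[N * THREADS];   /* N contiguous bytes per thread */
    char src[N];                        /* private source buffer */

    int main(void)
    {
        int peer = (MYTHREAD + 1) % THREADS;

        memset(src, MYTHREAD, N);

        /* One-sided bulk copy: private src -> block owned by "peer". */
        upc_memput(&dst[(size_t)peer * N], src, N);

        upc_barrier;                    /* make all transfers visible */
        if (MYTHREAD == 0)
            printf("thread 0: first byte of my block is %d\n", dst[0]);
        return 0;
    }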
* OpenSHMEM Features:
- OpenSHMEM v1.0 standard compliance
- Based on OpenSHMEM reference implementation v1.0c
- Optimized RDMA-based implementation of OpenSHMEM
data movement routines
- Efficient implementation of OpenSHMEM atomics using RDMA atomics
- High performance intra-node communication using
shared memory based schemes
- (**) Optimized OpenSHMEM put routines for small/medium message sizes
(see the OpenSHMEM sketch after this list)
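As an illustration of the put and atomics items above, the following is
a small OpenSHMEM 1.0-style sketch (start_pes, shmalloc, shmem_putmem,
shmem_int_finc). It is a generic example written for this announcement,
not code from the MVAPICH2-X package; with this release it would
normally be launched with oshrun (see the runtime features below):

    /* Sketch: OpenSHMEM 1.0-style bulk put and atomic increment.
       Every PE writes a byte buffer into its right neighbor's symmetric
       heap and bumps a counter on PE 0. Error checks omitted. */
    #include <shmem.h>
    #include <stdio.h>
    #include <string.h>

    #define N 1024

    int counter = 0;                      /* symmetric (global) variable */

    int main(void)
    {
        start_pes(0);
        int me   = _my_pe();
        int npes = _num_pes();
        int peer = (me + 1) % npes;

        char *buf = (char *)shmalloc(N);  /* symmetric heap allocation */
        char  src[N];
        memset(src, me, N);

        shmem_putmem(buf, src, N, peer);  /* one-sided bulk put to peer */
        shmem_int_finc(&counter, 0);      /* atomic increment on PE 0 */

        shmem_barrier_all();              /* complete puts and atomics */
        if (me == 0)
            printf("PE 0: counter = %d, first byte = %d\n", counter, buf[0]);

        shfree(buf);
        return 0;
    }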
* Hybrid Program Features:
- (**) Supports hybrid programming using MPI(+OpenMP),
MPI(+OpenMP)+UPC and MPI(+OpenMP)+OpenSHMEM
- (**) Compliance with the MPI-2.2, UPC v1.2, and OpenSHMEM v1.0
standards, and initial support for MPI-3 features
- Optimized network resource utilization through the
unified communication runtime
- (**) Efficient deadlock-free progress of MPI and UPC/OpenSHMEM calls
* Unified Runtime Features:
- (**) Based on MVAPICH2 1.9a2 (OFA-IB-CH3 interface). All the
runtime features enabled by default in the OFA-IB-CH3 interface of
MVAPICH2 1.9a2 are available in MVAPICH2-X 1.9a2. MPI, UPC,
OpenSHMEM, and hybrid programs benefit from the features
listed below:
- Scalable inter-node communication with high performance
and reduced memory usage
- Integrated RC/XRC design for the best performance on
large-scale systems with a reduced/constant memory footprint
- RDMA Fast Path connections for efficient small
message communication
- Shared Receive Queue (SRQ) with flow control to significantly
reduce memory footprint of the library
- AVL tree-based resource-aware registration cache
- Automatic tuning based on network adapter and host architecture
- Optimized intra-node communication support by taking
advantage of shared-memory communication
- Efficient buffer organization for memory scalability of
intra-node communication
- Automatic intra-node communication parameter tuning
based on platform
- Flexible CPU binding capabilities
- Portable Hardware Locality (hwloc v1.5) support for
defining CPU affinity
- Efficient CPU binding policies (bunch and scatter patterns,
socket and numanode granularities) to specify CPU binding
per job for modern multi-core platforms
- Allow user-defined flexible processor affinity
- Two modes of communication progress
- Polling
- Blocking (enables running multiple processes per processor)
- Flexible process manager support
- Support for the mpirun_rsh, hydra, and oshrun process managers
- (**) Support for upcrun process manager
* Bug Fixes (since 1.9a):
- Fixed incorrect compiler selection in oshfort
- Fixed linker errors with Intel oshfort compiler
MVAPICH2-X delivers excellent performance. Sample UPC and OpenSHMEM
performance numbers can be obtained from the following URL:
http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
====================================================================
New Features and Enhancements of OSU Micro-Benchmarks (OMB) 3.8 (since
the OMB 3.7 release) are listed here.
* Features:
- New UPC benchmarks
- osu_upc_memput
- osu_upc_memget
====================================================================
To download MVAPICH2 1.9a2, MVAPICH2-X 1.9a2, OMB 3.8, and the
associated user guides and quick start guide, or to access the SVN
repository, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
We are also happy to report that the number of organizations using
MVAPICH2 and MVAPICH2-X (and registered at the MVAPICH site) has
crossed 2,000 worldwide (in 70 countries). The MVAPICH team extends
its thanks to all these organizations.
Thanks,
The MVAPICH Team
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss