[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7

Jed Brown jedbrown at mcs.anl.gov
Sun Sep 9 10:48:07 CDT 2012


Which version of MPICH2 is this based on? Does it support the nonblocking
collectives in MPICH2-1.5?
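
(For reference, I mean the nonblocking collectives that MPICH2-1.5
exposes under the MPIX_ prefix ahead of MPI-3 ratification. A minimal
sketch of the usage pattern, assuming that naming; buf, count, and
do_local_work are placeholders:

    MPI_Request req;
    /* start the broadcast without blocking */
    MPIX_Ibcast(buf, count, MPI_INT, 0, MPI_COMM_WORLD, &req);
    do_local_work();   /* overlap independent computation */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
)
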
On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <panda at cse.ohio-state.edu>
wrote:

> These releases may be of interest to some MPICH users, so I am
> posting the announcement here.
>
> Thanks,
>
> DK
>
>
> ---------- Forwarded message ----------
> Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
> From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> To: mvapich-discuss at cse.ohio-state.edu
> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
>     MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
>
> The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
> MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
> Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
>
> Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since the
> MVAPICH2 1.8GA release) are listed below.
>
> * New Features and Enhancements (since 1.8GA):
>     - Support for InfiniBand hardware UD-multicast
>     - Scalable UD-multicast-based designs for collectives
>       (Bcast, Allreduce and Scatter)
>        - Sample Bcast numbers:
> http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
>     - Enhanced Bcast and Reduce collectives with pt-to-pt communication
>     - LiMIC-based design for Gather collective
>     - Improved performance for shared-memory-aware collectives
>     - Improved intra-node communication performance with GPU buffers
>       using pipelined design (see the sketch after this list)
>     - Improved inter-node communication performance with GPU buffers
>       with non-blocking CUDA copies
>     - Improved small message communication performance with
>       GPU buffers using CUDA IPC design
>     - Improved automatic GPU device selection and CUDA context management
>     - Optimal communication channel selection for different
>       GPU communication modes (DD, DH and HD) in different
>       configurations (intra-IOH and inter-IOH)
>     - Removed libibumad dependency for building the library
>     - Option for selecting non-default gid-index in a lossless
>       fabric setup in RoCE mode
>     - Option to disable signal handler setup
>     - Tuned thresholds for various architectures
>     - Set DAPL-2.0 as the default version for the uDAPL interface
>     - Updated to hwloc v1.5
>     - Option to use IP address as a fallback if hostname
>       cannot be resolved
>     - Improved error reporting
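>
> Since several items above concern GPU buffers, here is a minimal
> sketch (not from the release notes) of how a CUDA device buffer is
> passed directly to MPI calls with a CUDA-enabled MVAPICH2 build;
> MV2_USE_CUDA=1 must be set at run time, and the buffer size is a
> placeholder:
>
>     #include <mpi.h>
>     #include <cuda_runtime.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank;
>         float *d_buf;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
>         /* device pointers go straight into MPI calls; the
>            library pipelines the host-device copies internally */
>         if (rank == 0)
>             MPI_Send(d_buf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
>         else if (rank == 1)
>             MPI_Recv(d_buf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
>                      MPI_STATUS_IGNORE);
>         cudaFree(d_buf);
>         MPI_Finalize();
>         return 0;
>     }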
>
> * Bug-Fixes (since 1.8GA):
>     - Fix issue in intra-node knomial bcast
>     - Handle gethostbyname return values gracefully
>     - Fix corner case issue in two-level gather code path
>     - Fix bug in CUDA events/streams pool management
>     - Fix ptmalloc initialization issue when MALLOC_CHECK_ is
>       defined in the environment
>         - Thanks to Mehmet Belgin from Georgia Institute of
>           Technology for the report
>     - Fix memory corruption and handle heterogeneous architectures
>       in gather collective
>     - Fix issue in detecting the correct HCA type
>     - Fix issue in ring start-up to select correct HCA when
>       MV2_IBA_HCA is specified
>     - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
>     - Fix memory corruption on nodes with 64 cores
>         - Thanks to M Xie for the report
>     - Fix hang in MPI_Finalize with Nemesis interface when
>       ptmalloc initialization fails
>         - Thanks to Carson Holt from OICR for the report
>     - Fix memory corruption in shared memory communication
>         - Thanks to Craig Tierney from NOAA for the report
>           and testing the patch
>     - Fix issue in IB ring start-up selection with mpiexec.hydra
>     - Fix issue in selecting CUDA run-time variables when running
>       on single node in SMP only mode
>     - Fix a few memory leaks and warnings
>
> The MVAPICH2-X 1.9a software package (released as a technology
> preview) provides support for hybrid MPI+PGAS (OpenSHMEM)
> programming models with a unified communication runtime for emerging
> exascale systems. It gives users the flexibility to write MPI,
> MPI+OpenMP, and PGAS (OpenSHMEM) programs, as well as hybrid
> MPI(+OpenMP) + PGAS (OpenSHMEM) programs, all over a single unified
> communication runtime.
>
> Features for MVAPICH2-X 1.9a are as follows:
>
> * MPI Features:
>     - MPI-2.2 standard compliance
>     - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
>       take advantage of all the features enabled by default in the
>       OFA-IB-CH3 interface of MVAPICH2 1.9a
>     - High performance two-sided communication scalable to
>       multi-thousand nodes
>     - Optimized collective communication operations
>     - Shared-memory optimized algorithms for barrier, broadcast,
>       reduce and allreduce operations
>     - Optimized two-level designs for scatter and gather operations
>     - Improved implementation of allgather, alltoall operations
>     - High-performance and scalable support for one-sided communication
>     - Direct RDMA-based designs for one-sided communication
>     - Shared-memory-backed windows for one-sided communication
>     - Support for truly passive locking for intra-node RMA in
>       shared-memory-backed windows (see the sketch after this list)
>     - Multi-threading support
>     - Enhanced support for multi-threaded MPI applications
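>
> As an illustration of the passive-target RMA mentioned above, a
> minimal MPI-2.2 sketch (window contents, target rank, and the value
> put are placeholders):
>
>     int rank, buf = 0, value = 42;
>     MPI_Win win;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     /* expose one int on every rank; window creation is collective */
>     MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
>                    MPI_COMM_WORLD, &win);
>     if (rank == 0) {
>         /* passive target: rank 1 makes no matching MPI calls */
>         MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
>         MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
>         MPI_Win_unlock(1, win);
>     }
>     MPI_Win_free(&win);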
>
> * OpenSHMEM Features:
>     - OpenSHMEM v1.0 standard compliance
>     - Based on OpenSHMEM reference implementation v1.0c
>     - Optimized RDMA-based implementation of OpenSHMEM
>       data movement routines
>     - Efficient implementation of OpenSHMEM atomics using RDMA atomics
>     - High performance intra-node communication using
>       shared memory based schemes
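>
> For readers new to OpenSHMEM, a minimal sketch using the v1.0 data
> movement and atomic routines listed above (illustrative only; run
> with at least two PEs):
>
>     #include <shmem.h>
>
>     long counter = 0;     /* global, hence symmetric across PEs */
>
>     int main(void)
>     {
>         start_pes(0);
>         if (_my_pe() != 0)
>             shmem_long_inc(&counter, 0);  /* atomic increment on PE 0 */
>         shmem_barrier_all();
>         return 0;
>     }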
>
> * Hybrid Program Features:
>     - Supports hybrid programming using MPI and OpenSHMEM
>     - Compliance with the MPI-2.2 and OpenSHMEM v1.0 standards
>     - Optimized network resource utilization through the
>       unified communication runtime
>     - Efficient deadlock-free progress of MPI and OpenSHMEM calls
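>
> A minimal sketch of such a hybrid program (illustrative only; see
> the MVAPICH2-X user guide for the exact initialization
> requirements):
>
>     #include <mpi.h>
>     #include <shmem.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank;
>         MPI_Init(&argc, &argv);
>         start_pes(0);                 /* OpenSHMEM initialization */
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         /* OpenSHMEM one-sided operations and MPI collectives can
>            be mixed; both progress over the unified runtime */
>         MPI_Barrier(MPI_COMM_WORLD);
>         MPI_Finalize();
>         return 0;
>     }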
>
> * Unified Runtime Features:
>     - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM
>       and Hybrid programs benefit from its features listed below:
>        - Scalable inter-node communication with highest performance
>          and reduced memory usage
>        - Integrated RC/XRC design to get best performance on
>          large-scale systems with reduced/constant memory footprint
>        - RDMA Fast Path connections for efficient small
>          message communication
>        - Shared Receive Queue (SRQ) with flow control to significantly
>          reduce memory footprint of the library
>        - AVL tree-based resource-aware registration cache
>        - Automatic tuning based on network adapter and host architecture
>        - Optimized intra-node communication support by taking
>          advantage of shared-memory communication
>        - Efficient buffer organization for memory scalability of
>          intra-node communication
>        - Automatic intra-node communication parameter tuning
>          based on platform
>        - Flexible CPU binding capabilities
>        - Portable Hardware Locality (hwloc v1.5) support for
>          defining CPU affinity
>        - Efficient CPU binding policies (bunch and scatter patterns,
>          socket and numanode granularities) to specify CPU binding
>          per job for modern multi-core platforms (see the example
>          after this list)
>        - Allow user-defined flexible processor affinity
>        - Two modes of communication progress
>           - Polling
>           - Blocking (enables running multiple processes per processor)
>     - Flexible process manager support
>        - Support for mpirun rsh, hydra and oshrun process managers
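>
> For example, the binding policy and granularity mentioned above can
> be selected per job through run-time parameters (parameter names as
> in the MVAPICH2 user guide; assumed unchanged in 1.9a):
>
>     mpirun_rsh -np 16 -hostfile hosts \
>         MV2_CPU_BINDING_POLICY=scatter \
>         MV2_CPU_BINDING_LEVEL=socket ./app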
>
> MVAPICH2-X delivers excellent performance. Examples include an
> OpenSHMEM Put inter-node latency of 1.4 microseconds (4 bytes) on
> IB-FDR and a Put intra-node latency of 0.18 microseconds (4 bytes)
> on an Intel Sandy Bridge platform. More performance numbers can be
> obtained from the following URL:
>
>   http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
>
> New features and enhancements of OSU Micro-Benchmarks (OMB) 3.7
> (since the OMB 3.6 release) are listed below.
>
> * Features:
>     - New OpenSHMEM benchmarks
>        - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
>          osu_oshm_atomics (see the usage example after the bug fixes)
> * Bug fixes:
>     - Fix issue with IN_PLACE in osu_gather, osu_scatter and
>       osu_allgather benchmarks
>     - Destroy the CUDA context at the end in CUDA-enabled benchmarks
>
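> The new OpenSHMEM benchmarks are launched like the existing ones;
> for example, to measure put latency between two PEs (assuming the
> oshrun launcher shipped with MVAPICH2-X):
>
>     oshrun -np 2 osu_oshm_put
>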
> For downloading MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, associated
> user guides, quick start guide, and accessing the SVN, please visit
> the following URL:
>
>   http://mvapich.cse.ohio-state.edu
>
> All questions, feedback, bug reports, hints for performance tuning,
> patches, and enhancements are welcome. Please post them to the
> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>
> Thanks,
>
> The MVAPICH Team
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>