<p>Which version of MPICH2 is this based on? Does it support the nonblocking collectives in MPICH2-1.5?</p>

<div class="gmail_quote">On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" &lt;<a href="mailto:panda@cse.ohio-state.edu">panda@cse.ohio-state.edu</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This release announcement might be of interest to some MPICH users, so I am posting it here.<br>

<br>

Thanks,<br>

<br>

DK<br>

<br>

<br>

---------- Forwarded message ----------<br>

Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)<br>

From: Dhabaleswar Panda &lt;<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>&gt;<br>

To: <a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-state.edu</a><br>

Cc: Dhabaleswar Panda &lt;<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>&gt;<br>

Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,<br>

    MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7<br>

<br>

The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,<br>

MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified<br>

Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.<br>

<br>

Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since<br>

MVAPICH2 1.8GA release) are listed here.<br>

<br>

* New Features and Enhancements (since 1.8GA):<br>

    - Support for InfiniBand hardware UD-multicast<br>

    - Scalable UD-multicast-based designs for collectives<br>

      (Bcast, Allreduce and Scatter)<br>

       - Sample Bcast numbers:<br>

<a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml" target="_blank">http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml</a><br>

    - Enhanced Bcast and Reduce collectives with pt-to-pt communication<br>

    - LiMIC-based design for Gather collective<br>

    - Improved performance for shared-memory-aware collectives<br>

    - Improved intra-node communication performance with GPU buffers<br>

      using pipelined design<br>

    - Improved inter-node communication performance with GPU buffers<br>

      with non-blocking CUDA copies<br>

    - Improved small message communication performance with<br>

      GPU buffers using CUDA IPC design<br>

    - Improved automatic GPU device selection and CUDA context management<br>

    - Optimal communication channel selection for different<br>

      GPU communication modes (DD, DH and HD) in different<br>

      configurations (intra-IOH and inter-IOH)<br>

    - Removed libibumad dependency for building the library<br>

    - Option for selecting non-default gid-index in a loss-less<br>

      fabric setup in RoCE mode<br>

    - Option to disable signal handler setup<br>

    - Tuned thresholds for various architectures<br>

    - Set DAPL-2.0 as the default version for the uDAPL interface<br>

    - Updated to hwloc v1.5<br>

    - Option to use IP address as a fallback if hostname<br>

      cannot be resolved<br>

    - Improved error reporting<br>

<br>

* Bug-Fixes (since 1.8GA):<br>

    - Fix issue in intra-node knomial bcast<br>

    - Handle gethostbyname return values gracefully<br>

    - Fix corner case issue in two-level gather code path<br>

    - Fix bug in CUDA events/streams pool management<br>

    - Fix ptmalloc initialization issue when MALLOC_CHECK_ is<br>

      defined in the environment<br>

        - Thanks to Mehmet Belgin from Georgia Institute of<br>

          Technology for the report<br>

    - Fix memory corruption and handle heterogeneous architectures<br>

      in gather collective<br>

    - Fix issue in detecting the correct HCA type<br>

    - Fix issue in ring start-up to select correct HCA when<br>

      MV2_IBA_HCA is specified<br>

    - Fix SEGFAULT in MPI_Finalize when IB loop-back is used<br>

    - Fix memory corruption on nodes with 64 cores<br>

        - Thanks to M Xie for the report<br>

    - Fix hang in MPI_Finalize with Nemesis interface when<br>

      ptmalloc initialization fails<br>

        - Thanks to Carson Holt from OICR for the report<br>

    - Fix memory corruption in shared memory communication<br>

        - Thanks to Craig Tierney from NOAA for the report<br>

          and testing the patch<br>

    - Fix issue in IB ring start-up selection with mpiexec.hydra<br>

    - Fix issue in selecting CUDA run-time variables when running<br>

      on single node in SMP only mode<br>

    - Fix a few memory leaks and warnings<br>

<br>

MVAPICH2-X 1.9a software package (released as a technology preview)<br>

provides support for hybrid MPI+PGAS (OpenSHMEM) programming models<br>

with unified communication runtime for emerging exascale systems.<br>

This software package provides flexibility for users to write<br>

applications using the following programming models with a unified<br>

communication runtime: MPI, MPI+OpenMP, PGAS (OpenSHMEM) programs as<br>

well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.<br>

<br>

Features for MVAPICH2-X 1.9a are as follows:<br>

<br>

* MPI Features:<br>

    - MPI-2.2 standard compliance<br>

    - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can<br>

      take advantage of all the features enabled by default<br>

      in OFA-IB-CH3 interface of MVAPICH2 1.9a<br>

    - High performance two-sided communication scalable to<br>

      multi-thousand nodes<br>

    - Optimized collective communication operations<br>

    - Shared-memory optimized algorithms for barrier, broadcast,<br>

      reduce and allreduce operations<br>

    - Optimized two-level designs for scatter and gather operations<br>

    - Improved implementation of allgather, alltoall operations<br>

    - High-performance and scalable support for one-sided communication<br>

    - Direct RDMA based designs for one-sided communication<br>

    - Shared memory backed Windows for One-Sided communication<br>

    - Support for truly passive locking for intra-node RMA<br>

      in shared memory backed windows<br>

    - Multi-threading support<br>

    - Enhanced support for multi-threaded MPI applications<br>

<br>

* OpenSHMEM Features:<br>

    - OpenSHMEM v1.0 standard compliance<br>

    - Based on OpenSHMEM reference implementation v1.0c<br>

    - Optimized RDMA-based implementation of OpenSHMEM<br>

      data movement routines<br>

    - Efficient implementation of OpenSHMEM atomics using RDMA atomics<br>

    - High performance intra-node communication using<br>

      shared memory based schemes<br>

<br>

* Hybrid Program Features:<br>

    - Supports hybrid programming using MPI and OpenSHMEM<br>

    - Compliance with MPI 2.2 and OpenSHMEM v1.0 standards<br>

    - Optimized network resource utilization through the<br>

      unified communication runtime<br>

    - Efficient deadlock-free progress of MPI and OpenSHMEM calls<br>

<br>

* Unified Runtime Features:<br>

    - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM<br>

      and Hybrid programs benefit from its features listed below:<br>

       - Scalable inter-node communication with highest performance<br>

         and reduced memory usage<br>

       - Integrated RC/XRC design to get best performance on<br>

         large-scale systems with reduced/constant memory footprint<br>

       - RDMA Fast Path connections for efficient small<br>

         message communication<br>

       - Shared Receive Queue (SRQ) with flow control to significantly<br>

         reduce memory footprint of the library<br>

       - AVL tree-based resource-aware registration cache<br>

       - Automatic tuning based on network adapter and host architecture<br>

       - Optimized intra-node communication support by taking<br>

         advantage of shared-memory communication<br>

       - Efficient Buffer Organization for Memory Scalability of<br>

         Intra-node Communication<br>

       - Automatic intra-node communication parameter tuning<br>

         based on platform<br>

       - Flexible CPU binding capabilities<br>

       - Portable Hardware Locality (hwloc v1.5) support for<br>

         defining CPU affinity<br>

       - Efficient CPU binding policies (bunch and scatter patterns,<br>

         socket and numanode granularities) to specify CPU binding<br>

         per job for modern multi-core platforms<br>

       - Allow user-defined flexible processor affinity<br>

       - Two modes of communication progress<br>

          - Polling<br>

          - Blocking (enables running multiple processes/processor)<br>

    - Flexible process manager support<br>

       - Support for mpirun_rsh, hydra and oshrun process managers<br>

<br>

MVAPICH2-X delivers excellent performance. Examples include: OpenSHMEM<br>

Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR and Put<br>

intra-node latency of 0.18 microsec (4 bytes) on Intel SandyBridge<br>

platform. More performance numbers can be obtained from the following<br>

URL:<br>

<br>

  <a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2x/" target="_blank">http://mvapich.cse.ohio-state.edu/performance/mvapich2x/</a><br>

<br>

New Features and Enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since<br>

OMB 3.6 release) are listed here.<br>

<br>

* Features:<br>

    - New OpenSHMEM benchmarks<br>

       - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and<br>

         osu_oshm_atomics<br>

* Bug fixes:<br>

    - Fix issue with IN_PLACE in osu_gather, osu_scatter and<br>

      osu_allgather benchmarks<br>

    - Destroy the CUDA context at the end in CUDA supported benchmarks<br>

<br>

To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, and the associated<br>

user guides and quick start guide, or to access the SVN repository, please visit<br>

the following URL:<br>

<br>

  <a href="http://mvapich.cse.ohio-state.edu" target="_blank">http://mvapich.cse.ohio-state.edu</a><br>

<br>

All questions, feedback, bug reports, hints for performance tuning,<br>

patches and enhancements are welcome. Please post them to the<br>

mvapich-discuss mailing list (<a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-state.edu</a>).<br>

<br>

Thanks,<br>

<br>

The MVAPICH Team<br>

_______________________________________________<br>

mpich-discuss mailing list     <a href="mailto:mpich-discuss@mcs.anl.gov" target="_blank">mpich-discuss@mcs.anl.gov</a><br>

To manage subscription options or unsubscribe:<br>

<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>

</blockquote></div>