<p>Thanks, missed that on my phone.</p>
<p>I'm looking forward to a performant NBC implementation on IB. Open MPI trunk has an implementation based on libNBC, but in my testing it slows down point-to-point communication far more than MPICH2's NBC implementation does.</p>
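<p>For reference, the overlap pattern I care about looks roughly like the untested sketch below (MPI-3 spelling; if I remember right, MPICH2 1.5 still exposes these under the MPIX_ prefix, e.g. MPIX_Iallreduce). The question is whether the point-to-point phase still runs at full speed while the collective progresses.</p>
<pre>
/* Untested sketch: start a nonblocking allreduce, drive point-to-point
 * traffic while it progresses, then complete the collective. */
#include <mpi.h>

void overlap(MPI_Comm comm, double *local, double *global, int n,
             double *sendbuf, double *recvbuf, int m, int partner)
{
    MPI_Request nbc, p2p[2];

    /* Kick off the collective... */
    MPI_Iallreduce(local, global, n, MPI_DOUBLE, MPI_SUM, comm, &nbc);

    /* ...and do point-to-point work while it makes progress. */
    MPI_Isend(sendbuf, m, MPI_DOUBLE, partner, 0, comm, &p2p[0]);
    MPI_Irecv(recvbuf, m, MPI_DOUBLE, partner, 0, comm, &p2p[1]);
    MPI_Waitall(2, p2p, MPI_STATUSES_IGNORE);

    MPI_Wait(&nbc, MPI_STATUS_IGNORE);
}
</pre>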
<div class="gmail_quote">On Sep 9, 2012 11:52 AM, "Evren Yurtesen IB" <<a href="mailto:eyurtese@abo.fi">eyurtese@abo.fi</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
From the download page:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu/download/mvapich2/" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/download/mvapich2/</a><br>
MVAPICH2 1.9a is available as a single integrated package (with MPICH2 1.4.1p1) for download.<br>
<br>
On Sun, 9 Sep 2012, Jed Brown wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Which version of MPICH2 is this based on? Does it support the nonblocking collectives in MPICH2-1.5?<br>
<br>
On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>> wrote:<br>
These releases might be of interest to some MPICH users, so I am posting them here.<br>
<br>
Thanks,<br>
<br>
DK<br>
<br>
<br>
---------- Forwarded message ----------<br>
Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)<br>
From: Dhabaleswar Panda <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>><br>
To: <a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-<u></u>state.edu</a><br>
Cc: Dhabaleswar Panda <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>><br>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,<br>
MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7<br>
<br>
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,<br>
MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified<br>
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.<br>
<br>
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since<br>
MVAPICH2 1.8GA release) are listed here.<br>
<br>
* New Features and Enhancements (since 1.8GA):<br>
- Support for InfiniBand hardware UD-multicast<br>
- Scalable UD-multicast-based designs for collectives<br>
(Bcast, Allreduce and Scatter)<br>
- Sample Bcast numbers:<br>
<a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/performance/mvapich2/coll_<u></u>multicast.shtml</a><br>
- Enhanced Bcast and Reduce collectives with pt-to-pt communication<br>
- LiMIC-based design for Gather collective<br>
- Improved performance for shared-memory-aware collectives<br>
- Improved intra-node communication performance with GPU buffers<br>
using pipelined design<br>
- Improved inter-node communication performance with GPU buffers<br>
with non-blocking CUDA copies<br>
- Improved small message communication performance with<br>
GPU buffers using CUDA IPC design<br>
- Improved automatic GPU device selection and CUDA context management<br>
- Optimal communication channel selection for different<br>
GPU communication modes (DD, DH and HD) in different<br>
configurations (intra-IOH and inter-IOH)<br>
- Removed libibumad dependency for building the library<br>
- Option for selecting non-default gid-index in a lossless<br>
fabric setup in RoCE mode<br>
- Option to disable signal handler setup<br>
- Tuned thresholds for various architectures<br>
- Set DAPL-2.0 as the default version for the uDAPL interface<br>
- Updated to hwloc v1.5<br>
- Option to use IP address as a fallback if hostname<br>
cannot be resolved<br>
- Improved error reporting<br>
<br>
* Bug-Fixes (since 1.8GA):<br>
- Fix issue in intra-node knomial bcast<br>
- Handle gethostbyname return values gracefully<br>
- Fix corner case issue in two-level gather code path<br>
- Fix bug in CUDA events/streams pool management<br>
- Fix ptmalloc initialization issue when MALLOC_CHECK_ is<br>
defined in the environment<br>
- Thanks to Mehmet Belgin from Georgia Institute of<br>
Technology for the report<br>
- Fix memory corruption and handle heterogeneous architectures<br>
in gather collective<br>
- Fix issue in detecting the correct HCA type<br>
- Fix issue in ring start-up to select correct HCA when<br>
MV2_IBA_HCA is specified<br>
- Fix SEGFAULT in MPI_Finalize when IB loop-back is used<br>
- Fix memory corruption on nodes with 64 cores<br>
- Thanks to M Xie for the report<br>
- Fix hang in MPI_Finalize with Nemesis interface when<br>
ptmalloc initialization fails<br>
- Thanks to Carson Holt from OICR for the report<br>
- Fix memory corruption in shared memory communication<br>
- Thanks to Craig Tierney from NOAA for the report<br>
and testing the patch<br>
- Fix issue in IB ring start-up selection with mpiexec.hydra<br>
- Fix issue in selecting CUDA run-time variables when running<br>
on single node in SMP only mode<br>
- Fix a few memory leaks and warnings<br>
<br>
The MVAPICH2-X 1.9a software package (released as a technology preview)<br>
provides support for hybrid MPI+PGAS (OpenSHMEM) programming models<br>
with a unified communication runtime for emerging exascale systems.<br>
It gives users the flexibility to write MPI, MPI+OpenMP, and PGAS<br>
(OpenSHMEM) programs, as well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM)<br>
programs, all on top of a single unified communication runtime.<br>
<br>
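As a rough illustration of the hybrid model (a minimal, untested sketch rather than an official MVAPICH2-X example; see the user guide for exact build and initialization requirements), one program can mix OpenSHMEM one-sided operations and MPI collectives on the unified runtime:<br>
<pre>
/* Hybrid MPI + OpenSHMEM sketch: OpenSHMEM one-sided atomics and an
 * MPI collective in the same program, over one unified runtime. */
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

static long counter = 0;               /* symmetric (statically allocated) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    start_pes(0);                      /* OpenSHMEM 1.0-style initialization */

    /* Every PE atomically increments a counter on PE 0. */
    shmem_long_inc(&counter, 0);
    shmem_barrier_all();

    /* MPI collective across the same set of processes. */
    int one = 1, sum = 0;
    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (_my_pe() == 0)
        printf("PEs: %d  counter: %ld  MPI sum: %d\n", _num_pes(), counter, sum);

    MPI_Finalize();
    return 0;
}
</pre>
<br>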
Features for MVAPICH2-X 1.9a are as follows:<br>
<br>
* MPI Features:<br>
- MPI-2.2 standard compliance<br>
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can<br>
take advantage of all the features enabled by default<br>
in OFA-IB-CH3 interface of MVAPICH2 1.9a<br>
- High performance two-sided communication scalable to<br>
multi-thousand nodes<br>
- Optimized collective communication operations<br>
- Shared-memory optimized algorithms for barrier, broadcast,<br>
reduce and allreduce operations<br>
- Optimized two-level designs for scatter and gather operations<br>
- Improved implementation of allgather, alltoall operations<br>
- High-performance and scalable support for one-sided communication<br>
- Direct RDMA based designs for one-sided communication<br>
- Shared-memory-backed windows for one-sided communication<br>
- Support for truly passive locking for intra-node RMA<br>
in shared-memory-backed windows (see the sketch after this list)<br>
- Multi-threading support<br>
- Enhanced support for multi-threaded MPI applications<br>
<br>
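As a rough illustration of passive-target locking (a generic, untested MPI-2 sketch, not an MVAPICH2-X code sample), the origin locks a window on the target, puts data, and unlocks, without the target making any MPI calls during the epoch:<br>
<pre>
/* Untested sketch: rank 1 updates a window on rank 0 using a passive-target
 * epoch; rank 0 makes no MPI calls to complete the transfer. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Win win;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    buf = calloc(16, sizeof(double));
    MPI_Win_create(buf, 16 * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (nprocs > 1 && rank == 1) {
        double value = 42.0;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);   /* lock window on rank 0 */
        MPI_Put(&value, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(0, win);                        /* put completes here */
    }

    MPI_Barrier(MPI_COMM_WORLD);   /* make the update visible before any reads */
    MPI_Win_free(&win);
    free(buf);
    MPI_Finalize();
    return 0;
}
</pre>
<br>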
* OpenSHMEM Features:<br>
- OpenSHMEM v1.0 standard compliance<br>
- Based on OpenSHMEM reference implementation v1.0c<br>
- Optimized RDMA-based implementation of OpenSHMEM<br>
data movement routines<br>
- Efficient implementation of OpenSHMEM atomics using RDMA atomics<br>
- High performance intra-node communication using<br>
shared memory based schemes<br>
<br>
* Hybrid Program Features:<br>
- Supports hybrid programming using MPI and OpenSHMEM<br>
- Compliance with MPI 2.2 and OpenSHMEM v1.0 standards<br>
- Optimized network resource utilization through the<br>
unified communication runtime<br>
- Efficient deadlock-free progress of MPI and OpenSHMEM calls<br>
<br>
* Unified Runtime Features:<br>
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM<br>
and Hybrid programs benefit from its features listed below:<br>
- Scalable inter-node communication with highest performance<br>
and reduced memory usage<br>
- Integrated RC/XRC design to get best performance on<br>
large-scale systems with reduced/constant memory footprint<br>
- RDMA Fast Path connections for efficient small<br>
message communication<br>
- Shared Receive Queue (SRQ) with flow control to significantly<br>
reduce memory footprint of the library<br>
- AVL tree-based resource-aware registration cache<br>
- Automatic tuning based on network adapter and host architecture<br>
- Optimized intra-node communication support by taking<br>
advantage of shared-memory communication<br>
- Efficient buffer organization for memory scalability of<br>
intra-node communication<br>
- Automatic intra-node communication parameter tuning<br>
based on platform<br>
- Flexible CPU binding capabilities<br>
- Portable Hardware Locality (hwloc v1.5) support for<br>
defining CPU affinity<br>
- Efficient CPU binding policies (bunch and scatter patterns,<br>
socket and numanode granularities) to specify CPU binding<br>
per job for modern multi-core platforms<br>
- Allow user-defined flexible processor affinity<br>
- Two modes of communication progress<br>
- Polling<br>
- Blocking (enables running multiple processes per processor)<br>
- Flexible process manager support<br>
- Support for mpirun rsh, hydra and oshrun process managers<br>
<br>
MVAPICH2-X delivers excellent performance. Examples include an OpenSHMEM<br>
Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR and a Put<br>
intra-node latency of 0.18 microsec (4 bytes) on an Intel SandyBridge<br>
platform. More performance numbers can be obtained from the following<br>
URL:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2x/" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/performance/mvapich2x/</a><br>
<br>
New features and Enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since<br>
OMB 3.6 release) are listed here.<br>
<br>
* Features:<br>
- New OpenSHMEM benchmarks<br>
- osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and<br>
osu_oshm_atomics<br>
* Bug fixes:<br>
- Fix issue with IN_PLACE in osu_gather, osu_scatter and<br>
osu_allgather benchmarks<br>
- Destroy the CUDA context at the end in CUDA supported benchmarks<br>
<br>
To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, and the associated<br>
user guides and quick start guide, or to access the SVN repository,<br>
please visit the following URL:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu</a><br>
<br>
All questions, feedback, bug reports, hints for performance tuning,<br>
patches, and enhancements are welcome. Please post them to the<br>
mvapich-discuss mailing list (<a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-state.edu</a>).<br>
<br>
Thanks,<br>
<br>
The MVAPICH Team<br>
</blockquote>
<br>_______________________________________________<br>
mpich-discuss mailing list <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br></blockquote></div>