[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7

Sun Sep 9 11:07:20 CDT 2012

Thanks, missed that on my phone.

I'm looking forward to a performant nonblocking collective (NBC)
implementation on InfiniBand. Open MPI trunk has an implementation based on
libNBC, but in my testing it slows down point-to-point communication far
more than MPICH2's NBC implementation does.
On Sep 9, 2012 11:52 AM, "Evren Yurtesen IB" <eyurtese at abo.fi> wrote:

> From the download page:
>
> http://mvapich.cse.ohio-state.edu/download/mvapich2/
> MVAPICH2 1.9a is available as a single integrated package (with MPICH2
> 1.4.1p1) for download.
>
> On Sun, 9 Sep 2012, Jed Brown wrote:
>
>
>> Which version of MPICH2 is this based on? Does it support the nonblocking
>> collectives in MPICH2-1.5?
>>
>> On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <panda at cse.ohio-state.edu> wrote:
>>       These releases might be of interest to some MPICH users, so I am
>>       posting them here.
>>
>>       Thanks,
>>
>>       DK
>>
>>
>>       ---------- Forwarded message ----------
>>       Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
>>       From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>>       To: mvapich-discuss at cse.ohio-state.edu
>>       Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>>       Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
>>           MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
>>
>>       The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
>>       MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
>>       Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
>>
>>       Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since
>>       MVAPICH2 1.8GA release) are listed here.
>>
>>       * New Features and Enhancements (since 1.8GA):
>>           - Support for InfiniBand hardware UD-multicast
>>           - Scalable UD-multicast-based designs for collectives
>>             (Bcast, Allreduce and Scatter)
>>              - Sample Bcast numbers:
>>                http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
>>           - Enhanced Bcast and Reduce collectives with pt-to-pt communication
>>           - LiMIC-based design for Gather collective
>>           - Improved performance for shared-memory-aware collectives
>>           - Improved intra-node communication performance with GPU buffers
>>             using pipelined design
>>           - Improved inter-node communication performance with GPU buffers
>>             with non-blocking CUDA copies
>>           - Improved small message communication performance with
>>             GPU buffers using CUDA IPC design
>>           - Improved automatic GPU device selection and CUDA context management
>>           - Optimal communication channel selection for different
>>             GPU communication modes (DD, DH and HD) in different
>>             configurations (intra-IOH and inter-IOH)
>>           - Removed libibumad dependency for building the library
>>           - Option for selecting non-default gid-index in a loss-less
>>             fabric setup in RoCE mode
>>           - Option to disable signal handler setup
>>           - Tuned thresholds for various architectures
>>           - Set DAPL-2.0 as the default version for the uDAPL interface
>>           - Updated to hwloc v1.5
>>           - Option to use IP address as a fallback if hostname
>>             cannot be resolved
>>           - Improved error reporting
>>
>>       * Bug-Fixes (since 1.8GA):
>>           - Fix issue in intra-node knomial bcast
>>           - Handle gethostbyname return values gracefully
>>           - Fix corner case issue in two-level gather code path
>>           - Fix bug in CUDA events/streams pool management
>>           - Fix ptmalloc initialization issue when MALLOC_CHECK_ is
>>             defined in the environment
>>               - Thanks to Mehmet Belgin from Georgia Institute of
>>                 Technology for the report
>>           - Fix memory corruption and handle heterogeneous architectures
>>             in gather collective
>>           - Fix issue in detecting the correct HCA type
>>           - Fix issue in ring start-up to select correct HCA when
>>             MV2_IBA_HCA is specified
>>           - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
>>           - Fix memory corruption on nodes with 64 cores
>>               - Thanks to M Xie for the report
>>           - Fix hang in MPI_Finalize with Nemesis interface when
>>             ptmalloc initialization fails
>>               - Thanks to Carson Holt from OICR for the report
>>           - Fix memory corruption in shared memory communication
>>               - Thanks to Craig Tierney from NOAA for the report
>>                 and testing the patch
>>           - Fix issue in IB ring start-up selection with mpiexec.hydra
>>           - Fix issue in selecting CUDA run-time variables when running
>>             on single node in SMP only mode
>>           - Fix a few memory leaks and warnings
>>
>>       MVAPICH2-X 1.9a software package (released as a technology preview)
>>       provides support for hybrid MPI+PGAS (OpenSHMEM) programming models
>>       with unified communication runtime for emerging exascale systems.
>>       This software package provides flexibility for users to write
>>       applications using the following programming models with a unified
>>       communication runtime: MPI, MPI+OpenMP, PGAS (OpenSHMEM) programs as
>>       well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.
>>
>>       Features for MVAPICH2-X 1.9a are as follows:
>>
>>       * MPI Features:
>>           - MPI-2.2 standard compliance
>>           - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
>>             take advantage of all the features enabled by default
>>             in the OFA-IB-CH3 interface of MVAPICH2 1.9a
>>           - High performance two-sided communication scalable to
>>             multi-thousand nodes
>>           - Optimized collective communication operations
>>           - Shared-memory optimized algorithms for barrier, broadcast,
>>             reduce and allreduce operations
>>           - Optimized two-level designs for scatter and gather operations
>>           - Improved implementation of allgather, alltoall operations
>>           - High-performance and scalable support for one-sided communication
>>           - Direct RDMA-based designs for one-sided communication
>>           - Shared-memory-backed windows for one-sided communication
>>           - Support for truly passive locking for intra-node RMA
>>             in shared-memory-backed windows
>>           - Multi-threading support
>>           - Enhanced support for multi-threaded MPI applications
>>
>>       * OpenSHMEM Features:
>>           - OpenSHMEM v1.0 standard compliance
>>           - Based on OpenSHMEM reference implementation v1.0c
>>           - Optimized RDMA-based implementation of OpenSHMEM
>>             data movement routines
>>           - Efficient implementation of OpenSHMEM atomics using RDMA atomics
>>           - High performance intra-node communication using
>>             shared memory based schemes
>>
>>       * Hybrid Program Features:
>>           - Supports hybrid programming using MPI and OpenSHMEM
>>           - Compliance with MPI 2.2 and OpenSHMEM v1.0 standards
>>           - Optimized network resource utilization through the
>>             unified communication runtime
>>           - Efficient deadlock-free progress of MPI and OpenSHMEM calls
>>
>>       * Unified Runtime Features:
>>           - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM
>>             and Hybrid programs benefit from its features listed below:
>>              - Scalable inter-node communication with highest performance
>>                and reduced memory usage
>>              - Integrated RC/XRC design to get best performance on
>>                large-scale systems with reduced/constant memory footprint
>>              - RDMA Fast Path connections for efficient small
>>                message communication
>>              - Shared Receive Queue (SRQ) with flow control to significantly
>>                reduce memory footprint of the library
>>              - AVL tree-based resource-aware registration cache
>>              - Automatic tuning based on network adapter and host architecture
>>              - Optimized intra-node communication support by taking
>>                advantage of shared-memory communication
>>              - Efficient Buffer Organization for Memory Scalability of
>>                Intra-node Communication
>>              - Automatic intra-node communication parameter tuning
>>                based on platform
>>              - Flexible CPU binding capabilities
>>              - Portable Hardware Locality (hwloc v1.5) support for
>>                defining CPU affinity
>>              - Efficient CPU binding policies (bunch and scatter patterns,
>>                socket and numanode granularities) to specify CPU binding
>>                per job for modern multi-core platforms
>>              - Allow user-defined flexible processor affinity
>>              - Two modes of communication progress
>>                 - Polling
>>                 - Blocking (enables running multiple processes per processor)
>>           - Flexible process manager support
>>              - Support for mpirun_rsh, hydra and oshrun process managers
>>
>>       MVAPICH2-X delivers excellent performance. Examples include:
>>       OpenSHMEM Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR
>>       and Put intra-node latency of 0.18 microsec (4 bytes) on an Intel
>>       SandyBridge platform. More performance numbers are available at the
>>       following URL:
>>
>>         http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
>>
>>       New features and enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since
>>       OMB 3.6 release) are listed here.
>>
>>       * Features:
>>           - New OpenSHMEM benchmarks
>>              - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
>>                osu_oshm_atomics
>>       * Bug fixes:
>>           - Fix issue with IN_PLACE in osu_gather, osu_scatter and
>>             osu_allgather benchmarks
>>           - Destroy the CUDA context at the end in CUDA-supported benchmarks
>>
>>       For downloading MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, associated
>>       user guides, quick start guide, and accessing the SVN, please visit
>>       the following URL:
>>
>>         http://mvapich.cse.ohio-state.edu
>>
>>       All questions, feedback, bug reports, hints for performance tuning,
>>       patches and enhancements are welcome. Please post them to the
>>       mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>>
>>       Thanks,
>>
>>       The MVAPICH Team
>>       _______________________________________________
>>       mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>       To manage subscription options or unsubscribe:
>>       https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>>
>>