[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
Dhabaleswar Panda
panda at cse.ohio-state.edu
Sun Sep 9 11:07:02 CDT 2012
MVAPICH2 releases during the next few months will be based on MPICH2-1.5.
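
For reference, the nonblocking collectives asked about below are the
MPI-3 MPI_I* routines (MPI_Ibcast, MPI_Iallreduce, etc.) provided by
MPICH2-1.5. A minimal, illustrative sketch, assuming an MPI-3-capable
build and the usual mpicc wrapper:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, data = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        data = 42;                    /* root fills the buffer */

    /* Start the broadcast without blocking ... */
    MPI_Ibcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);

    /* ... overlap independent computation here ... */

    /* ... and complete it later. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d got %d\n", rank, data);
    MPI_Finalize();
    return 0;
}
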
Thanks,
DK
On Sun, 9 Sep 2012, Evren Yurtesen IB wrote:
> From the download page:
>
> http://mvapich.cse.ohio-state.edu/download/mvapich2/
> MVAPICH2 1.9a is available as a single integrated package (with MPICH2
> 1.4.1p1) for download.
>
> On Sun, 9 Sep 2012, Jed Brown wrote:
>
>>
>> Which version of MPICH2 is this based on? Does it support the nonblocking
>> collectives in MPICH2-1.5?
>>
>> On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <panda at cse.ohio-state.edu>
>> wrote:
>> These releases might be of interest to some of the MPICH users. Thus,
>> I am posting them here.
>>
>> Thanks,
>>
>> DK
>>
>>
>> ---------- Forwarded message ----------
>> Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
>> From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>> To: mvapich-discuss at cse.ohio-state.edu
>> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>> Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
>> MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
>>
>> The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
>> MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
>> Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
>>
>> Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since the
>> MVAPICH2 1.8GA release) are listed here.
>>
>> * New Features and Enhancements (since 1.8GA):
>> - Support for InfiniBand hardware UD-multicast
>> - Scalable UD-multicast-based designs for collectives
>> (Bcast, Allreduce and Scatter)
>> - Sample Bcast numbers:
>> http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
>> - Enhanced Bcast and Reduce collectives with pt-to-pt
>> communication
>> - LiMIC-based design for Gather collective
>> - Improved performance for shared-memory-aware collectives
>> - Improved intra-node communication performance with GPU buffers
>> using pipelined design
>> - Improved inter-node communication performance with GPU buffers
>> with non-blocking CUDA copies
>> - Improved small message communication performance with
>> GPU buffers using CUDA IPC design
>> - Improved automatic GPU device selection and CUDA context
>> management
>> - Optimal communication channel selection for different
>> GPU communication modes (DD, DH and HD, i.e. device-device,
>> device-host and host-device) in different configurations
>> (intra-IOH and inter-IOH)
>> - Removed libibumad dependency for building the library
>> - Option for selecting a non-default gid-index in a lossless
>> fabric setup in RoCE mode
>> - Option to disable signal handler setup
>> - Tuned thresholds for various architectures
>> - Set DAPL-2.0 as the default version for the uDAPL interface
>> - Updated to hwloc v1.5
>> - Option to use IP address as a fallback if hostname
>> cannot be resolved
>> - Improved error reporting
>>
>> * Bug-Fixes (since 1.8GA):
>> - Fix issue in intra-node knomial bcast
>> - Handle gethostbyname return values gracefully
>> - Fix corner case issue in two-level gather code path
>> - Fix bug in CUDA events/streams pool management
>> - Fix ptmalloc initialization issue when MALLOC_CHECK_ is
>> defined in the environment
>> - Thanks to Mehmet Belgin from Georgia Institute of
>> Technology for the report
>> - Fix memory corruption and handle heterogeneous architectures
>> in gather collective
>> - Fix issue in detecting the correct HCA type
>> - Fix issue in ring start-up to select correct HCA when
>> MV2_IBA_HCA is specified
>> - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
>> - Fix memory corruption on nodes with 64 cores
>> - Thanks to M Xie for the report
>> - Fix hang in MPI_Finalize with Nemesis interface when
>> ptmalloc initialization fails
>> - Thanks to Carson Holt from OICR for the report
>> - Fix memory corruption in shared memory communication
>> - Thanks to Craig Tierney from NOAA for the report
>> and testing the patch
>> - Fix issue in IB ring start-up selection with mpiexec.hydra
>> - Fix issue in selecting CUDA run-time variables when running
>> on single node in SMP only mode
>> - Fix a few memory leaks and warnings
>>
>> The MVAPICH2-X 1.9a software package (released as a technology
>> preview) provides support for hybrid MPI+PGAS (OpenSHMEM) programming
>> models with a unified communication runtime for emerging exascale
>> systems. It gives users the flexibility to write applications as MPI,
>> MPI+OpenMP, or PGAS (OpenSHMEM) programs, as well as hybrid
>> MPI(+OpenMP) + PGAS (OpenSHMEM) programs, all on top of a single
>> unified communication runtime.
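>>
>> As an illustration of the hybrid model, here is a minimal sketch that
>> mixes OpenSHMEM v1.0 calls and MPI calls in one program. It is a
>> sketch only: the initialization ordering shown (MPI_Init before
>> start_pes) and the choice of compiler/launcher wrappers are
>> assumptions; please consult the MVAPICH2-X user guide for the
>> supported usage.
>>
>> #include <mpi.h>
>> #include <shmem.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);      /* assumed ordering; see user guide */
>>     start_pes(0);                /* OpenSHMEM v1.0 initialization */
>>
>>     int me   = _my_pe();
>>     int npes = _num_pes();
>>
>>     /* OpenSHMEM one-sided put into the next PE's symmetric buffer */
>>     int *buf = (int *) shmalloc(sizeof(int));
>>     *buf = -1;
>>     shmem_int_put(buf, &me, 1, (me + 1) % npes);
>>     shmem_barrier_all();
>>
>>     /* MPI collective over the same set of processes */
>>     int sum = 0;
>>     MPI_Allreduce(&me, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>>
>>     printf("PE %d received %d, sum of ranks = %d\n", me, *buf, sum);
>>
>>     shfree(buf);
>>     MPI_Finalize();
>>     return 0;
>> }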
>>
>> Features for MVAPICH2-X 1.9a are as follows:
>>
>> * MPI Features:
>> - MPI-2.2 standard compliance
>> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
>> take advantage of all the features enabled by default
>> in OFA-IB-CH3 interface of MVAPICH2 1.9a
>> - High performance two-sided communication scalable to
>> multi-thousand nodes
>> - Optimized collective communication operations
>> - Shared-memory optimized algorithms for barrier, broadcast,
>> reduce and allreduce operations
>> - Optimized two-level designs for scatter and gather operations
>> - Improved implementation of allgather, alltoall operations
>> - High-performance and scalable support for one-sided
>> communication
>> - Direct RDMA-based designs for one-sided communication
>> - Shared-memory-backed windows for one-sided communication
>> - Support for truly passive locking for intra-node RMA
>> in shared-memory-backed windows (see the sketch after
>> this list)
>> - Multi-threading support
>> - Enhanced support for multi-threaded MPI applications
>>
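>> The passive-target one-sided usage referred to above uses the
>> standard MPI-2.2 lock/unlock interface; nothing in the sketch is
>> MVAPICH2-X-specific. A minimal, illustrative example:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, nprocs;
>>     int local = 0;               /* memory exposed through the window */
>>     int value = 42;
>>     MPI_Win win;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>
>>     MPI_Win_create(&local, sizeof(int), sizeof(int),
>>                    MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>>
>>     /* Passive-target epoch: rank 0 writes into rank 1's window
>>      * without any matching call on rank 1 (truly passive locking). */
>>     if (rank == 0 && nprocs > 1) {
>>         MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
>>         MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
>>         MPI_Win_unlock(1, win);
>>     }
>>
>>     MPI_Barrier(MPI_COMM_WORLD);     /* order the access epochs */
>>
>>     if (rank == 1) {
>>         /* Lock the local window to synchronize its public and private
>>          * copies before reading directly (MPI-2.2 memory model). */
>>         MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
>>         MPI_Win_unlock(1, win);
>>         printf("rank 1 window now holds %d\n", local);
>>     }
>>
>>     MPI_Win_free(&win);
>>     MPI_Finalize();
>>     return 0;
>> }
>>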
>> * OpenSHMEM Features:
>> - OpenSHMEM v1.0 standard compliance
>> - Based on OpenSHMEM reference implementation v1.0c
>> - Optimized RDMA-based implementation of OpenSHMEM
>> data movement routines
>> - Efficient implementation of OpenSHMEM atomics using RDMA
>> atomics (see the sketch after this list)
>> - High performance intra-node communication using
>> shared memory based schemes
>>
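>> For illustration, the OpenSHMEM atomics mentioned above include
>> fetch-and-add on symmetric data, e.g. shmem_int_fadd from the
>> OpenSHMEM v1.0 API. A minimal sketch (compile wrapper and launcher
>> are left to the user guide):
>>
>> #include <shmem.h>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>     static int counter = 0;      /* static variables are symmetric */
>>
>>     start_pes(0);
>>
>>     /* Every PE atomically increments the counter on PE 0 and gets
>>      * back the value it observed (fetch-and-add). */
>>     int old = shmem_int_fadd(&counter, 1, 0);
>>     shmem_barrier_all();
>>
>>     if (_my_pe() == 0)
>>         printf("counter on PE 0 = %d (this PE saw %d)\n", counter, old);
>>
>>     return 0;
>> }
>>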
>> * Hybrid Program Features:
>> - Supports hybrid programming using MPI and OpenSHMEM
>> - Compliance with the MPI-2.2 and OpenSHMEM v1.0 standards
>> - Optimized network resource utilization through the
>> unified communication runtime
>> - Efficient deadlock-free progress of MPI and OpenSHMEM calls
>>
>> * Unified Runtime Features:
>> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM
>> and Hybrid programs benefit from its features listed below:
>> - Scalable inter-node communication with highest performance
>> and reduced memory usage
>> - Integrated RC/XRC design to get best performance on
>> large-scale systems with reduced/constant memory footprint
>> - RDMA Fast Path connections for efficient small
>> message communication
>> - Shared Receive Queue (SRQ) with flow control to
>> significantly reduce memory footprint of the library
>> - AVL tree-based resource-aware registration cache
>> - Automatic tuning based on network adapter and host
>> architecture
>> - Optimized intra-node communication support by taking
>> advantage of shared-memory communication
>> - Efficient buffer organization for memory scalability of
>> intra-node communication
>> - Automatic intra-node communication parameter tuning
>> based on platform
>> - Flexible CPU binding capabilities
>> - Portable Hardware Locality (hwloc v1.5) support for
>> defining CPU affinity
>> - Efficient CPU binding policies (bunch and scatter patterns,
>> socket and numanode granularities) to specify CPU binding
>> per job for modern multi-core platforms
>> - Allow user-defined flexible processor affinity
>> - Two modes of communication progress
>> - Polling
>> - Blocking (enables running multiple processes per processor)
>> - Flexible process manager support
>> - Support for mpirun_rsh, hydra and oshrun process managers
>>
>> MVAPICH2-X delivers excellent performance. Examples include an
>> OpenSHMEM Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR
>> and a Put intra-node latency of 0.18 microsec (4 bytes) on an Intel
>> SandyBridge platform. More performance numbers can be obtained from
>> the following URL:
>>
>> http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
>>
>> New Features and Enhancements of OSU Micro-Benchmarks (OMB) 3.7
>> (since the OMB 3.6 release) are listed here.
>>
>> * Features:
>> - New OpenSHMEM benchmarks
>> - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
>> osu_oshm_atomics
>> * Bug fixes:
>> - Fix issue with IN_PLACE in osu_gather, osu_scatter and
>> osu_allgather benchmarks (see the sketch after this list)
>> - Destroy the CUDA context at the end in CUDA supported
>> benchmarks
>>
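>> For readers unfamiliar with the IN_PLACE option mentioned above, it
>> refers to the MPI_IN_PLACE argument of the MPI collectives, by which
>> the root reuses its receive buffer instead of supplying a separate
>> send buffer. A minimal, illustrative sketch with MPI_Gather:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, nprocs;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>
>>     if (rank == 0) {
>>         /* Root: its own contribution is already in recvbuf[0], so it
>>          * passes MPI_IN_PLACE instead of a separate send buffer. */
>>         int *recvbuf = (int *) malloc(nprocs * sizeof(int));
>>         recvbuf[0] = rank;
>>         MPI_Gather(MPI_IN_PLACE, 1, MPI_INT,
>>                    recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>         for (int i = 0; i < nprocs; i++)
>>             printf("recvbuf[%d] = %d\n", i, recvbuf[i]);
>>         free(recvbuf);
>>     } else {
>>         /* Non-root ranks send their rank as usual; recvbuf is ignored. */
>>         MPI_Gather(&rank, 1, MPI_INT,
>>                    NULL, 0, MPI_INT, 0, MPI_COMM_WORLD);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>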
>> To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, the associated
>> user guides, and the quick start guide, or to access the SVN
>> repository, please visit the following URL:
>>
>> http://mvapich.cse.ohio-state.edu
>>
>> All questions, feedback, bug reports, hints for performance tuning,
>> patches and enhancements are welcome. Please post them to the
>> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>>
>> Thanks,
>>
>> The MVAPICH Team
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>>
>