[mpich-discuss] Announcing the Release of MVAPICH2 1.9a, MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
Jed Brown
jedbrown at mcs.anl.gov
Sun Sep 9 10:48:07 CDT 2012
Which version of MPICH2 is this based on? Does it support the nonblocking
collectives in MPICH2-1.5?
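
For reference, this is the kind of call I mean (a minimal sketch with the
MPI-3 names; depending on the MPICH2 snapshot these may still be spelled
with the MPIX_ prefix):

    /* Sketch: nonblocking broadcast overlapped with other work.
     * MPI-3 names shown; some MPICH2 builds expose MPIX_Ibcast instead. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, data = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            data = 42;

        /* Start the broadcast, do independent work, then complete it. */
        MPI_Ibcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
        /* ... independent computation ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        printf("rank %d: data = %d\n", rank, data);
        MPI_Finalize();
        return 0;
    }
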
On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <panda at cse.ohio-state.edu>
wrote:
> These releases might be of interest to some MPICH users, so I am posting
> the announcement here.
>
> Thanks,
>
> DK
>
>
> ---------- Forwarded message ----------
> Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)
> From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> To: mvapich-discuss at cse.ohio-state.edu
> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>
> Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,
> MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7
>
> The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,
> MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified
> Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.
>
> Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since the
> MVAPICH2 1.8GA release) are listed here.
>
> * New Features and Enhancements (since 1.8GA):
> - Support for InfiniBand hardware UD-multicast
> - Scalable UD-multicast-based designs for collectives
> (Bcast, Allreduce and Scatter)
> - Sample Bcast numbers:
> http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml
> - Enhanced Bcast and Reduce collectives with pt-to-pt communication
> - LiMIC-based design for Gather collective
> - Improved performance for shared-memory-aware collectives
> - Improved intra-node communication performance with GPU buffers
> using a pipelined design (a usage sketch follows this feature list)
> - Improved inter-node communication performance with GPU buffers
> with non-blocking CUDA copies
> - Improved small message communication performance with
> GPU buffers using CUDA IPC design
> - Improved automatic GPU device selection and CUDA context management
> - Optimal communication channel selection for different
> GPU communication modes (DD, DH and HD) in different
> configurations (intra-IOH and inter-IOH)
> - Removed libibumad dependency for building the library
> - Option for selecting a non-default gid-index in a lossless
> fabric setup in RoCE mode
> - Option to disable signal handler setup
> - Tuned thresholds for various architectures
> - Set DAPL-2.0 as the default version for the uDAPL interface
> - Updated to hwloc v1.5
> - Option to use IP address as a fallback if hostname
> cannot be resolved
> - Improved error reporting
>
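> As an illustration of the GPU-buffer features listed above (a minimal
> sketch, not taken from the MVAPICH2 user guide; it assumes a CUDA-enabled
> build with MV2_USE_CUDA=1 set at run time), device pointers can be passed
> directly to MPI calls:
>
>     /* Sketch: exchanging GPU-resident data directly through MPI.
>      * Assumes a CUDA-enabled MVAPICH2 build and MV2_USE_CUDA=1. */
>     #include <mpi.h>
>     #include <cuda_runtime.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank;
>         double *d_buf;
>         const int n = 1 << 20;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         cudaMalloc((void **)&d_buf, n * sizeof(double));
>
>         if (rank == 0)        /* device pointer given directly to MPI */
>             MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
>         else if (rank == 1)
>             MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
>                      MPI_STATUS_IGNORE);
>
>         cudaFree(d_buf);
>         MPI_Finalize();
>         return 0;
>     }
>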
> * Bug-Fixes (since 1.8GA):
> - Fix issue in intra-node knomial bcast
> - Handle gethostbyname return values gracefully
> - Fix corner case issue in two-level gather code path
> - Fix bug in CUDA events/streams pool management
> - Fix ptmalloc initialization issue when MALLOC_CHECK_ is
> defined in the environment
> - Thanks to Mehmet Belgin from Georgia Institute of
> Technology for the report
> - Fix memory corruption and handle heterogeneous architectures
> in gather collective
> - Fix issue in detecting the correct HCA type
> - Fix issue in ring start-up to select correct HCA when
> MV2_IBA_HCA is specified
> - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
> - Fix memory corruption on nodes with 64 cores
> - Thanks to M Xie for the report
> - Fix hang in MPI_Finalize with Nemesis interface when
> ptmalloc initialization fails
> - Thanks to Carson Holt from OICR for the report
> - Fix memory corruption in shared memory communication
> - Thanks to Craig Tierney from NOAA for the report
> and testing the patch
> - Fix issue in IB ring start-up selection with mpiexec.hydra
> - Fix issue in selecting CUDA run-time variables when running
> on a single node in SMP-only mode
> - Fix a few memory leaks and warnings
>
> MVAPICH2-X 1.9a software package (released as a technology preview)
> provides support for hybrid MPI+PGAS (OpenSHMEM) programming models
> with unified communication runtime for emerging exascale systems.
> This software package gives users the flexibility to write applications
> on top of a unified communication runtime using any of the following
> programming models: MPI, MPI+OpenMP, and PGAS (OpenSHMEM), as well as
> hybrid MPI(+OpenMP) + PGAS (OpenSHMEM) programs.
>
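> As a rough illustration of this hybrid model (a minimal sketch only; it
> assumes the unified runtime allows MPI and OpenSHMEM calls to be mixed in
> one program and uses OpenSHMEM 1.0 names such as start_pes and shmalloc):
>
>     /* Sketch: one process image using both MPI and OpenSHMEM. */
>     #include <mpi.h>
>     #include <shmem.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank, npes, sum;
>         int *sym;
>
>         MPI_Init(&argc, &argv);
>         start_pes(0);                        /* OpenSHMEM 1.0 start-up */
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         npes = _num_pes();
>
>         sym = (int *) shmalloc(sizeof(int)); /* symmetric heap */
>         shmem_barrier_all();
>
>         /* One-sided put of my rank to the next PE, then an MPI reduction. */
>         shmem_int_put(sym, &rank, 1, (rank + 1) % npes);
>         shmem_barrier_all();
>         MPI_Allreduce(sym, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>
>         if (rank == 0)
>             printf("sum of received ranks: %d\n", sum);
>         shfree(sym);
>         MPI_Finalize();
>         return 0;
>     }
>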
> Features for MVAPICH2-X 1.9a are as follows:
>
> * MPI Features:
> - MPI-2.2 standard compliance
> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can
> take advantage of all the features enabled by default
> in the OFA-IB-CH3 interface of MVAPICH2 1.9a
> - High performance two-sided communication scalable to
> multi-thousand nodes
> - Optimized collective communication operations
> - Shared-memory optimized algorithms for barrier, broadcast,
> reduce and allreduce operations
> - Optimized two-level designs for scatter and gather operations
> - Improved implementation of allgather, alltoall operations
> - High-performance and scalable support for one-sided communication
> - Direct RDMA based designs for one-sided communication
> - Shared memory backed Windows for One-Sided communication
> - Support for truly passive locking for intra-node RMA
> in shared memory backed windows (see the passive-target
> sketch following this feature list)
> - Multi-threading support
> - Enhanced support for multi-threaded MPI applications
>
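> To illustrate the one-sided items above (a generic MPI-2.2 passive-target
> sketch, not MVAPICH2-specific code):
>
>     /* Sketch: passive-target RMA. Rank 0 exposes a window; rank 1
>      * locks it, puts a value, and unlocks, without rank 0 making
>      * any matching call. */
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank, buf = -1;
>         MPI_Win win;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Win_create(&buf, sizeof(int), sizeof(int),
>                        MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>
>         if (rank == 1) {
>             int val = 7;
>             MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
>             MPI_Put(&val, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
>             MPI_Win_unlock(0, win);
>         }
>
>         MPI_Barrier(MPI_COMM_WORLD);
>         if (rank == 0)
>             printf("rank 0: buf = %d\n", buf);
>
>         MPI_Win_free(&win);
>         MPI_Finalize();
>         return 0;
>     }
>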
> * OpenSHMEM Features:
> - OpenSHMEM v1.0 standard compliance
> - Based on OpenSHMEM reference implementation v1.0c
> - Optimized RDMA-based implementation of OpenSHMEM
> data movement routines
> - Efficient implementation of OpenSHMEM atomics using RDMA atomics
> (see the fetch-and-add sketch following this list)
> - High performance intra-node communication using
> shared memory based schemes
>
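> A minimal sketch of the atomics named above (OpenSHMEM 1.0 interfaces;
> illustrative only):
>
>     /* Sketch: every PE atomically increments a counter on PE 0
>      * using the OpenSHMEM fetch-and-add routine. */
>     #include <shmem.h>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         static int counter = 0;               /* symmetric variable */
>         int me, old;
>
>         start_pes(0);
>         me = _my_pe();
>
>         old = shmem_int_fadd(&counter, 1, 0); /* atomic on PE 0 */
>         shmem_barrier_all();
>
>         if (me == 0)
>             printf("counter on PE 0: %d (last fetched %d)\n", counter, old);
>         return 0;
>     }
>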
> * Hybrid Program Features:
> - Supports hybrid programming using MPI and OpenSHMEM
> - Compliance with the MPI 2.2 and OpenSHMEM v1.0 standards
> - Optimized network resource utilization through the
> unified communication runtime
> - Efficient deadlock-free progress of MPI and OpenSHMEM calls
>
> * Unified Runtime Features:
> - Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM
> and hybrid programs benefit from the features listed below:
> - Scalable inter-node communication with highest performance
> and reduced memory usage
> - Integrated RC/XRC design to get best performance on
> large-scale systems with reduced/constant memory footprint
> - RDMA Fast Path connections for efficient small
> message communication
> - Shared Receive Queue (SRQ) with flow control to significantly
> reduce memory footprint of the library
> - AVL tree-based resource-aware registration cache
> - Automatic tuning based on network adapter and host architecture
> - Optimized intra-node communication support by taking
> advantage of shared-memory communication
> - Efficient buffer organization for memory scalability of
> intra-node communication
> - Automatic intra-node communication parameter tuning
> based on platform
> - Flexible CPU binding capabilities
> - Portable Hardware Locality (hwloc v1.5) support for
> defining CPU affinity
> - Efficient CPU binding policies (bunch and scatter patterns,
> socket and numanode granularities) to specify CPU binding
> per job for modern multi-core platforms
> - Allow user-defined flexible processor affinity
> - Two modes of communication progress
> - Polling
> - Blocking (enables running multiple processes per processor)
> - Flexible process manager support
> - Support for mpirun_rsh, hydra and oshrun process managers
>
> MVAPICH2-X delivers excellent performance. Examples include an OpenSHMEM
> Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR and a Put
> intra-node latency of 0.18 microsec (4 bytes) on an Intel Sandy Bridge
> platform. More performance numbers can be obtained from the following
> URL:
>
> http://mvapich.cse.ohio-state.edu/performance/mvapich2x/
>
> New features and enhancements in OSU Micro-Benchmarks (OMB) 3.7 (since
> the OMB 3.6 release) are listed here.
>
> * Features:
> - New OpenSHMEM benchmarks
> - osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and
> osu_oshm_atomics
> * Bug fixes:
> - Fix issue with IN_PLACE in osu_gather, osu_scatter and
> osu_allgather benchmarks
> - Destroy the CUDA context at the end in CUDA supported benchmarks
>
> To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, the associated
> user guides, and the quick start guide, or to access the SVN repository,
> please visit the following URL:
>
> http://mvapich.cse.ohio-state.edu
>
> All questions, feedback, bug reports, hints for performance tuning,
> patches, and enhancements are welcome. Please post them to the
> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>
> Thanks,
>
> The MVAPICH Team