<p>Thanks, missed that on my phone.</p>
<p>I'm looking forward to a performant NBC implementation on IB. Open MPI trunk has an implementation based on libNBC, but in my testing it slows down point-to-point communication far more than MPICH2's NBC implementation does.</p>
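<p>For reference, the overlap pattern I care about looks roughly like the untested sketch below (MPI-3 spelling; if I remember right, MPICH2 1.5 still exposes these under the MPIX_ prefix, e.g. MPIX_Iallreduce). The question is whether the point-to-point phase still runs at full speed while the collective progresses.</p>
<pre>
/* Untested sketch: start a nonblocking allreduce, drive point-to-point
 * traffic while it progresses, then complete the collective. */
#include <mpi.h>

void overlap(MPI_Comm comm, double *local, double *global, int n,
             double *sendbuf, double *recvbuf, int m, int partner)
{
    MPI_Request nbc, p2p[2];

    /* Kick off the collective... */
    MPI_Iallreduce(local, global, n, MPI_DOUBLE, MPI_SUM, comm, &nbc);

    /* ...and do point-to-point work while it makes progress. */
    MPI_Isend(sendbuf, m, MPI_DOUBLE, partner, 0, comm, &p2p[0]);
    MPI_Irecv(recvbuf, m, MPI_DOUBLE, partner, 0, comm, &p2p[1]);
    MPI_Waitall(2, p2p, MPI_STATUSES_IGNORE);

    MPI_Wait(&nbc, MPI_STATUS_IGNORE);
}
</pre>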
<div class="gmail_quote">On Sep 9, 2012 11:52 AM, "Evren Yurtesen IB" <<a href="mailto:eyurtese@abo.fi">eyurtese@abo.fi</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
From the download page:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu/download/mvapich2/" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/download/mvapich2/</a><br>
MVAPICH2 1.9a is available as a single integrated package (with MPICH2 1.4.1p1) for download.<br>
<br>
On Sun, 9 Sep 2012, Jed Brown wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Which version of MPICH2 is this based on? Does it support the nonblocking collectives in MPICH2-1.5?<br>
<br>
On Sep 9, 2012 10:22 AM, "Dhabaleswar Panda" <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>> wrote:<br>
These releases might be of interest to some MPICH users, so I am posting them here.<br>
<br>
Thanks,<br>
<br>
DK<br>
<br>
<br>
---------- Forwarded message ----------<br>
Date: Sat, 8 Sep 2012 22:58:20 -0400 (EDT)<br>
From: Dhabaleswar Panda <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>><br>
To: <a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-<u></u>state.edu</a><br>
Cc: Dhabaleswar Panda <<a href="mailto:panda@cse.ohio-state.edu" target="_blank">panda@cse.ohio-state.edu</a>><br>
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.9a,<br>
MVAPICH2-X 1.9a and OSU Micro-Benchmarks (OMB) 3.7<br>
<br>
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9a,<br>
MVAPICH2-X 1.9a (Hybrid MPI+PGAS (OpenSHMEM) with Unified<br>
Communication Runtime) and OSU Micro-Benchmarks (OMB) 3.7.<br>
<br>
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9a (since<br>
MVAPICH2 1.8GA release) are listed here.<br>
<br>
* New Features and Enhancements (since 1.8GA):<br>
- Support for InfiniBand hardware UD-multicast<br>
- Scalable UD-multicast-based designs for collectives<br>
(Bcast, Allreduce and Scatter)<br>
- Sample Bcast numbers:<br>
<a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2/coll_multicast.shtml" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/performance/mvapich2/coll_<u></u>multicast.shtml</a><br>
- Enhanced Bcast and Reduce collectives with pt-to-pt communication<br>
- LiMIC-based design for Gather collective<br>
- Improved performance for shared-memory-aware collectives<br>
- Improved intra-node communication performance with GPU buffers<br>
using pipelined design<br>
- Improved inter-node communication performance with GPU buffers<br>
with non-blocking CUDA copies<br>
- Improved small message communication performance with<br>
GPU buffers using CUDA IPC design<br>
- Improved automatic GPU device selection and CUDA context management<br>
- Optimal communication channel selection for different<br>
GPU communication modes (DD, DH and HD) in different<br>
configurations (intra-IOH and inter-IOH)<br>
- Removed libibumad dependency for building the library<br>
- Option for selecting non-default gid-index in a lossless<br>
fabric setup in RoCE mode<br>
- Option to disable signal handler setup<br>
- Tuned thresholds for various architectures<br>
- Set DAPL-2.0 as the default version for the uDAPL interface<br>
- Updated to hwloc v1.5<br>
- Option to use IP address as a fallback if hostname<br>
cannot be resolved<br>
- Improved error reporting<br>
<br>
* Bug-Fixes (since 1.8GA):<br>
- Fix issue in intra-node knomial bcast<br>
- Handle gethostbyname return values gracefully<br>
- Fix corner case issue in two-level gather code path<br>
- Fix bug in CUDA events/streams pool management<br>
- Fix ptmalloc initialization issue when MALLOC_CHECK_ is<br>
defined in the environment<br>
- Thanks to Mehmet Belgin from Georgia Institute of<br>
Technology for the report<br>
- Fix memory corruption and handle heterogeneous architectures<br>
in gather collective<br>
- Fix issue in detecting the correct HCA type<br>
- Fix issue in ring start-up to select correct HCA when<br>
MV2_IBA_HCA is specified<br>
- Fix SEGFAULT in MPI_Finalize when IB loop-back is used<br>
- Fix memory corruption on nodes with 64 cores<br>
- Thanks to M Xie for the report<br>
- Fix hang in MPI_Finalize with Nemesis interface when<br>
ptmalloc initialization fails<br>
- Thanks to Carson Holt from OICR for the report<br>
- Fix memory corruption in shared memory communication<br>
- Thanks to Craig Tierney from NOAA for the report<br>
and testing the patch<br>
- Fix issue in IB ring start-up selection with mpiexec.hydra<br>
- Fix issue in selecting CUDA run-time variables when running<br>
on single node in SMP only mode<br>
- Fix a few memory leaks and warnings<br>
<br>
The MVAPICH2-X 1.9a software package (released as a technology preview)<br>
provides support for hybrid MPI+PGAS (OpenSHMEM) programming models<br>
with a unified communication runtime for emerging exascale systems.<br>
It gives users the flexibility to write MPI, MPI+OpenMP, and PGAS<br>
(OpenSHMEM) programs, as well as hybrid MPI(+OpenMP) + PGAS (OpenSHMEM)<br>
programs, all on top of a single unified communication runtime.<br>
<br>
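As a rough illustration of the hybrid model (a minimal, untested sketch rather than an official MVAPICH2-X example; see the user guide for exact build and initialization requirements), one program can mix OpenSHMEM one-sided operations and MPI collectives on the unified runtime:<br>
<pre>
/* Hybrid MPI + OpenSHMEM sketch: OpenSHMEM one-sided atomics and an
 * MPI collective in the same program, over one unified runtime. */
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

static long counter = 0;               /* symmetric (statically allocated) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    start_pes(0);                      /* OpenSHMEM 1.0-style initialization */

    /* Every PE atomically increments a counter on PE 0. */
    shmem_long_inc(&counter, 0);
    shmem_barrier_all();

    /* MPI collective across the same set of processes. */
    int one = 1, sum = 0;
    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (_my_pe() == 0)
        printf("PEs: %d  counter: %ld  MPI sum: %d\n", _num_pes(), counter, sum);

    MPI_Finalize();
    return 0;
}
</pre>
<br>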
Features for MVAPICH2-X 1.9a are as follows:<br>
<br>
* MPI Features:<br>
- MPI-2.2 standard compliance<br>
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI programs can<br>
take advantage of all the features enabled by default<br>
in OFA-IB-CH3 interface of MVAPICH2 1.9a<br>
- High performance two-sided communication scalable to<br>
multi-thousand nodes<br>
- Optimized collective communication operations<br>
- Shared-memory optimized algorithms for barrier, broadcast,<br>
reduce and allreduce operations<br>
- Optimized two-level designs for scatter and gather operations<br>
- Improved implementation of allgather, alltoall operations<br>
- High-performance and scalable support for one-sided communication<br>
- Direct RDMA based designs for one-sided communication<br>
- Shared-memory-backed windows for one-sided communication<br>
- Support for truly passive locking for intra-node RMA<br>
in shared-memory-backed windows (see the sketch after this list)<br>
- Multi-threading support<br>
- Enhanced support for multi-threaded MPI applications<br>
<br>
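As a rough illustration of passive-target locking (a generic, untested MPI-2 sketch, not an MVAPICH2-X code sample), the origin locks a window on the target, puts data, and unlocks, without the target making any MPI calls during the epoch:<br>
<pre>
/* Untested sketch: rank 1 updates a window on rank 0 using a passive-target
 * epoch; rank 0 makes no MPI calls to complete the transfer. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Win win;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    buf = calloc(16, sizeof(double));
    MPI_Win_create(buf, 16 * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (nprocs > 1 && rank == 1) {
        double value = 42.0;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);   /* lock window on rank 0 */
        MPI_Put(&value, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(0, win);                        /* put completes here */
    }

    MPI_Barrier(MPI_COMM_WORLD);   /* make the update visible before any reads */
    MPI_Win_free(&win);
    free(buf);
    MPI_Finalize();
    return 0;
}
</pre>
<br>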
* OpenSHMEM Features:<br>
- OpenSHMEM v1.0 standard compliance<br>
- Based on OpenSHMEM reference implementation v1.0c<br>
- Optimized RDMA-based implementation of OpenSHMEM<br>
data movement routines<br>
- Efficient implementation of OpenSHMEM atomics using RDMA atomics<br>
- High performance intra-node communication using<br>
shared memory based schemes<br>
<br>
* Hybrid Program Features:<br>
- Supports hybrid programming using MPI and OpenSHMEM<br>
- Compliance with MPI 2.2 and OpenSHMEM v1.0 standards<br>
- Optimized network resource utilization through the<br>
unified communication runtime<br>
- Efficient deadlock-free progress of MPI and OpenSHMEM calls<br>
<br>
* Unified Runtime Features:<br>
- Based on MVAPICH2 1.9a (OFA-IB-CH3 interface). MPI, OpenSHMEM<br>
and Hybrid programs benefit from its features listed below:<br>
- Scalable inter-node communication with highest performance<br>
and reduced memory usage<br>
- Integrated RC/XRC design to get best performance on<br>
large-scale systems with reduced/constant memory footprint<br>
- RDMA Fast Path connections for efficient small<br>
message communication<br>
- Shared Receive Queue (SRQ) with flow control to significantly<br>
reduce memory footprint of the library<br>
- AVL tree-based resource-aware registration cache<br>
- Automatic tuning based on network adapter and host architecture<br>
- Optimized intra-node communication support by taking<br>
advantage of shared-memory communication<br>
- Efficient buffer organization for memory scalability of<br>
intra-node communication<br>
- Automatic intra-node communication parameter tuning<br>
based on platform<br>
- Flexible CPU binding capabilities<br>
- Portable Hardware Locality (hwloc v1.5) support for<br>
defining CPU affinity<br>
- Efficient CPU binding policies (bunch and scatter patterns,<br>
socket and numanode granularities) to specify CPU binding<br>
per job for modern multi-core platforms<br>
- Allow user-defined flexible processor affinity<br>
- Two modes of communication progress<br>
- Polling<br>
- Blocking (enables running multiple processes per processor)<br>
- Flexible process manager support<br>
- Support for mpirun rsh, hydra and oshrun process managers<br>
<br>
MVAPICH2-X delivers excellent performance. Examples include an OpenSHMEM<br>
Put inter-node latency of 1.4 microsec (4 bytes) on IB-FDR and a Put<br>
intra-node latency of 0.18 microsec (4 bytes) on an Intel SandyBridge<br>
platform. More performance numbers can be obtained from the following<br>
URL:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu/performance/mvapich2x/" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu/performance/mvapich2x/</a><br>
<br>
New features and Enhancements of OSU Micro-Benchmarks (OMB) 3.7 (since<br>
OMB 3.6 release) are listed here.<br>
<br>
* Features:<br>
- New OpenSHMEM benchmarks<br>
- osu_oshm_put, osu_oshm_get, osu_oshm_put_mr and<br>
osu_oshm_atomics<br>
* Bug fixes:<br>
- Fix issue with IN_PLACE in osu_gather, osu_scatter and<br>
osu_allgather benchmarks<br>
- Destroy the CUDA context at the end in CUDA supported benchmarks<br>
<br>
To download MVAPICH2 1.9a, MVAPICH2-X 1.9a, OMB 3.7, and the associated<br>
user guides and quick start guide, or to access the SVN repository,<br>
please visit the following URL:<br>
<br>
<a href="http://mvapich.cse.ohio-state.edu" target="_blank">http://mvapich.cse.ohio-state.<u></u>edu</a><br>
<br>
All questions, feedback, bug reports, hints for performance tuning,<br>
patches, and enhancements are welcome. Please post them to the<br>
mvapich-discuss mailing list (<a href="mailto:mvapich-discuss@cse.ohio-state.edu" target="_blank">mvapich-discuss@cse.ohio-state.edu</a>).<br>
<br>
Thanks,<br>
<br>
The MVAPICH Team<br>
</blockquote>
<br>_______________________________________________<br>
mpich-discuss mailing list <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br></blockquote></div>