<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>RE: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2> Hi,<BR>
Any input on the other points that I mentioned in my previous email?<BR>
<BR>
Regards,<BR>
Jayesh<BR>
<BR>
-----Original Message-----<BR>
From: Calin Iaru [<A HREF="mailto:calin@dolphinics.com">mailto:calin@dolphinics.com</A>]<BR>
Sent: Friday, May 30, 2008 8:17 AM<BR>
To: Jayesh Krishna<BR>
Cc: mpich-discuss@mcs.anl.gov<BR>
Subject: Re: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce<BR>
<BR>
Hi Jayesh,<BR>
<BR>
Besides Allreduce, Reduce and Reduce_Scatter also fail.<BR>
<BR>
Best regards,<BR>
Calin<BR>
<BR>
Jayesh Krishna wrote:<BR>
><BR>
> Hi,<BR>
> I tried running the IMB 3.1 suite for allreduce on a single machine<BR>
> with up to 8 processes and did not get any errors.<BR>
> <BR>
> 1) Make sure that both node-1 &amp; node-2 have the same data model (data<BR>
> type representation). Please note that MPICH2 currently does not<BR>
> support heterogeneous systems with respect to the data models used by<BR>
> the machines (e.g., you cannot run MPI processes across x86 and x64<BR>
> machines). If you need to run your program across a heterogeneous<BR>
> system, please use MPICH1 instead.<BR>
><BR>
> 2) Try running the benchmark on a single node/host (mpiexec -n 2<BR>
> imb-mpi1.exe allreduce) and let us know the results.<BR>
> 3) Are you able to run other tests in the IMB 3.1 suite?<BR>
><BR>
> Regards,<BR>
> Jayesh<BR>
><BR>
> -----Original Message-----<BR>
> From: owner-mpich-discuss@mcs.anl.gov<BR>
> [<A HREF="mailto:owner-mpich-discuss@mcs.anl.gov">mailto:owner-mpich-discuss@mcs.anl.gov</A>] On Behalf Of Calin Iaru<BR>
> Sent: Monday, May 26, 2008 5:50 AM<BR>
> To: mpich-discuss@mcs.anl.gov<BR>
> Subject: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce<BR>
><BR>
> The problem is that the latest mpich2 in combination with IMB 3.1<BR>
> generates a data corruption error when running on 2 nodes. IMB was<BR>
> compiled with the CHECK flag and TOL set to 0 inside IMB_declare.h. I<BR>
> am not sure if this is a transport error or a verification error; it<BR>
> could be that the problem lies in the application code.<BR>
><BR>
> E:\Program Files\MPICH2\bin>mpiexec.exe -hosts 2 node-1 node-2 \\node-1\e$\imb-mpi1.exe allreduce<BR>
> #---------------------------------------------------<BR>
> # Intel (R) MPI Benchmark Suite V3.1, MPI-1 part<BR>
> #---------------------------------------------------<BR>
> # Date : Fri May 23 14:44:12 2008<BR>
> # Machine : x86 Family 15 Model 4 Stepping 1, GenuineIntel<BR>
> # System : Windows 2003<BR>
> # Release : 5.2.3790<BR>
> # Version : Service Pack 1<BR>
> # MPI Version : 2.0<BR>
> # MPI Thread Environment: MPI_THREAD_SINGLE<BR>
><BR>
><BR>
><BR>
> # Calling sequence was:<BR>
><BR>
> # \\node-1\e$\imb-mpi1.exe allreduce<BR>
><BR>
> # Minimum message length in bytes: 0<BR>
> # Maximum message length in bytes: 4194304<BR>
> #<BR>
> # MPI_Datatype : MPI_BYTE<BR>
> # MPI_Datatype for reductions : MPI_FLOAT<BR>
> # MPI_Op : MPI_SUM<BR>
> #<BR>
> #<BR>
><BR>
> # List of Benchmarks to run:<BR>
><BR>
> # Allreduce<BR>
><BR>
> #-----------------------------------------------------------------------------<BR>
> # Benchmarking Allreduce<BR>
> # #processes = 2<BR>
> #-----------------------------------------------------------------------------<BR>
> #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects<BR>
> 0 1000 0.51 0.52 0.51 0.00<BR>
> 4 1000 80.30 80.35 80.33 0.00<BR>
> 1: Error Allreduce, size = 8, sample #0 Process 1: Got invalid buffer:<BR>
> Buffer entry: 2.300000<BR>
> 0: Error Allreduce, size = 8, sample #0 Process 0: Got invalid buffer:<BR>
> Buffer entry: 2.300000<BR>
><BR>
><BR>
<BR>
</FONT>
</P>
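<P><FONT SIZE=2>
On the open question in the original report (transport error vs. verification error): the sketch below shows one way a zero tolerance alone could flag a defect, even when every byte arrives intact. It is only an illustration under stated assumptions and not a diagnosis of this failure. The defect formula, the function names and the per-rank values are all hypothetical and are not taken from the IMB sources; the only things borrowed from the thread are the MPI_FLOAT / MPI_SUM reduction and a tolerance of 0.<BR>
</FONT></P>
<PRE>
```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_sum(contributions):
    """Add the contributions the way an MPI_FLOAT / MPI_SUM reduction would:
    every operand and every partial sum is rounded to single precision."""
    total = to_float32(0.0)
    for value in contributions:
        total = to_float32(total + to_float32(value))
    return total


def relative_defect(got, expected):
    """Hypothetical defect measure: relative difference to the reference."""
    if expected == 0.0:
        return abs(got)
    return abs(got - expected) / abs(expected)


def passes(got, expected, tol):
    """A result passes when its defect does not exceed the tolerance."""
    return not (relative_defect(got, expected) > tol)


# Made-up per-rank contributions for a 2-process run (not IMB's real data).
contributions = [1.1, 1.2]
reduced = float32_sum(contributions)  # what a single-precision reduction yields
reference = sum(contributions)        # reference computed in double precision

if __name__ == "__main__":
    print("defect:", relative_defect(reduced, reference))
    print("passes with tol 0:   ", passes(reduced, reference, 0.0))
    print("passes with tol 1e-6:", passes(reduced, reference, 1e-6))
```
</PRE>
<P><FONT SIZE=2>
If the benchmark's check compares a single-precision result against a reference computed at higher precision, a tolerance of 0 would reject results like this one while a small nonzero tolerance would accept them. Whether IMB 3.1 actually does this would have to be confirmed in its checking code; a failure that also shows up on a single host would point away from the transport.<BR>
</FONT></P>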
</BODY>
</HTML>