[mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce

Jayesh Krishna jayesh at mcs.anl.gov
Fri May 30 09:41:36 CDT 2008


 Hi,
  Any input on the other points that I mentioned in my previous email?

Regards,
Jayesh

-----Original Message-----
From: Calin Iaru [mailto:calin at dolphinics.com] 
Sent: Friday, May 30, 2008 8:17 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce

Hi Jayesh,

    Besides Allreduce, Reduce and Reduce_Scatter also fail.

Best regards,
    Calin

Jayesh Krishna wrote:
>
>  Hi,
>   I tried running the IMB 3.1 suite for allreduce on a single machine 
> with up to 8 procs and did not get any errors.
>  
> 1) Make sure that both node-1 & node-2 have the same data model (data 
> type representation). Please note that MPICH2 currently does not 
> support heterogeneous systems (with respect to the data models used by 
> the machines; e.g., you cannot run MPI procs across x86 and x64 
> machines). If you need to run your program across a heterogeneous 
> system, please use MPICH1 instead.
>
> 2) Try running the benchmark on a single node/host (mpiexec -n 2 
> imb-mpi1.exe allreduce) and let us know the results.
> 3) Are you able to run other tests in the IMB 3.1 suite?
>
> Regards,
> Jayesh
>
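One quick way to compare the data models of node-1 and node-2 (point 1 above) is to print the native C type sizes and the byte order on each host and compare the output. The sketch below is only an illustration and is not part of MPICH2 or IMB; the function name `data_model_fingerprint` is made up here. It reports the data model of the Python build it runs under, so it assumes the same kind of build (32-bit or 64-bit) as the MPI application on each host.

```python
import struct
import sys


def data_model_fingerprint():
    """Return native C type sizes (in bytes) and the byte order of this host."""
    # Native-mode struct format codes:
    # i=int, l=long, q=long long, f=float, d=double, P=void*
    codes = {
        "int": "i",
        "long": "l",
        "long long": "q",
        "float": "f",
        "double": "d",
        "pointer": "P",
    }
    fingerprint = {name: struct.calcsize(code) for name, code in codes.items()}
    fingerprint["byteorder"] = sys.byteorder
    return fingerprint


if __name__ == "__main__":
    # Run this on every host and compare the printed lines.
    for key, value in data_model_fingerprint().items():
        print(f"{key:10s} {value}")
```

If the `long` or `pointer` sizes or the byte order differ between the two hosts, the hosts do not share a data model in the sense described above.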
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Calin Iaru
> Sent: Monday, May 26, 2008 5:50 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce
>
> The problem is that the latest mpich2 in combination with IMB 3.1 
> generates a data corruption error when running on 2 nodes. IMB was 
> compiled with the CHECK flag and TOL set to 0 inside IMB_declare.h. I 
> am not sure if this is a transport error or a verification error; it 
> could be that the problem lies in the application code.
>
> E:\Program Files\MPICH2\bin>mpiexec.exe -hosts 2 node-1 node-2 \\node-1\e$\imb-mpi1.exe allreduce
> #---------------------------------------------------
> #    Intel (R) MPI Benchmark Suite V3.1, MPI-1 part
> #---------------------------------------------------
> # Date                  : Fri May 23 14:44:12 2008
> # Machine               : x86 Family 15 Model 4 Stepping 1, GenuineIntel
> # System                : Windows 2003
> # Release               : 5.2.3790
> # Version               : Service Pack 1
> # MPI Version           : 2.0
> # MPI Thread Environment: MPI_THREAD_SINGLE
>
>
>
> # Calling sequence was:
>
> # \\node-1\e$\imb-mpi1.exe allreduce
>
> # Minimum message length in bytes:   0
> # Maximum message length in bytes:   4194304
> #
> # MPI_Datatype                   :   MPI_BYTE
> # MPI_Datatype for reductions    :   MPI_FLOAT
> # MPI_Op                         :   MPI_SUM
> #
> #
>
> # List of Benchmarks to run:
>
> # Allreduce
>
> #-----------------------------------------------------------------------------
> # Benchmarking Allreduce
> # #processes = 2
>
> #-----------------------------------------------------------------------------
>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
>             0         1000         0.51         0.52         0.51         0.00
>             4         1000        80.30        80.35        80.33         0.00
> 1: Error Allreduce, size = 8, sample #0 Process 1: Got invalid buffer:
> Buffer entry: 2.300000
> 0: Error Allreduce, size = 8, sample #0 Process 0: Got invalid buffer:
> Buffer entry: 2.300000
>
>
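For context on what TOL set to 0 implies: a result check with zero tolerance accepts only bit-exact values. The sketch below is not IMB's actual checking code and does not diagnose the failure reported above; `count_defects`, `to_float32`, and the sample values are hypothetical. It emulates a single-precision sum (as with MPI_FLOAT and MPI_SUM) by rounding through `struct`, and shows that a result that is correct up to rounding can still be counted as a defect when the reference value is computed in a different precision.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def count_defects(received, expected, tol):
    """Count entries whose relative error exceeds tol (hypothetical check)."""
    defects = 0
    for got, want in zip(received, expected):
        scale = abs(want) if want != 0.0 else 1.0
        if abs(got - want) / scale > tol:
            defects += 1
    return defects


# Hypothetical per-rank contributions for a 2-process float sum.
rank0 = [to_float32(0.1), to_float32(1.5)]
rank1 = [to_float32(0.2), to_float32(0.8)]

# Single-precision reduction: every result is rounded to float32.
received = [to_float32(a + b) for a, b in zip(rank0, rank1)]

# Reference values computed in double precision from the unrounded inputs.
expected = [0.1 + 0.2, 1.5 + 0.8]

print(count_defects(received, expected, tol=0.0))   # prints 2
print(count_defects(received, expected, tol=1e-5))  # prints 0
```

With a zero tolerance both entries are flagged even though each differs from the reference only by single-precision rounding; a small nonzero tolerance accepts them.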
