[mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce

Jayesh Krishna jayesh at mcs.anl.gov
Tue May 27 12:23:24 CDT 2008


 Hi,
  I ran the IMB 3.1 Allreduce benchmark on a single machine with up to 8
processes and did not get any errors.
  
1) Make sure that node-1 and node-2 use the same data model (data type
representation). Please note that MPICH2 currently does not support
heterogeneous systems with respect to the data models used by the machines;
for example, you cannot run MPI processes across x86 and x64 machines. If
you need to run your program across a heterogeneous system, please use
MPICH1 instead.

2) Try running the benchmark on a single node/host (mpiexec -n 2
imb-mpi1.exe allreduce) and let us know the results.
3) Are you able to run other tests in the IMB 3.1 suite?

Regards,
Jayesh

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Calin Iaru
Sent: Monday, May 26, 2008 5:50 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] IMB 3.1 with TOL 0 crashes on Allreduce

The problem is that the latest mpich2 in combination with IMB 3.1
generates a data corruption error when running on 2 nodes. IMB was
compiled with the CHECK flag and TOL set to 0 inside IMB_declare.h. I am
not sure if this is a transport error or a verification error; it could be
that the problem lies in the application code.

E:\Program Files\MPICH2\bin>mpiexec.exe -hosts 2 node-1 node-2 \\node-1\e$\imb-mpi1.exe allreduce
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.1, MPI-1 part
#---------------------------------------------------
# Date                  : Fri May 23 14:44:12 2008
# Machine               : x86 Family 15 Model 4 Stepping 1, GenuineIntel
# System                : Windows 2003
# Release               : 5.2.3790
# Version               : Service Pack 1
# MPI Version           : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE



# Calling sequence was:

# \\node-1\e$\imb-mpi1.exe allreduce

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Allreduce

#-----------------------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
            0         1000         0.51         0.52         0.51         0.00
            4         1000        80.30        80.35        80.33         0.00
1: Error Allreduce, size = 8, sample #0
Process 1: Got invalid buffer:
Buffer entry: 2.300000
0: Error Allreduce, size = 8, sample #0
Process 0: Got invalid buffer:
Buffer entry: 2.300000

