[mpich-discuss] MPI_Allreduce fail. Please help.

Yonghui lyh03259.aps at gmail.com
Tue Sep 4 14:12:22 CDT 2012


Dear MPICH2 users and developers,

I have recently been learning MPI and reading the source code of an
open-source program that uses MPI. One of the calls that behaves strangely
is MPI_Allreduce.

I am working on a Windows 7 Pro 64-bit machine with MinGW (gcc 4.6.2,
32-bit) and the 32-bit MPICH2 1.4.1-p1 build downloaded from the MPICH2
site. The code compiles without any problem, but it crashes at run time
(possibly an invalid memory access). I suspect a problem with MPI_Allreduce
in the Windows build, because the program runs fine if I remove that call.
It also works if I make the matrix smaller. I tried the same code on an
Ubuntu machine with the same MPICH2 version, and there is no problem on
Linux.
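Since the crash disappears with a smaller matrix, one way to check whether
the length of a single MPI_Allreduce call is what triggers it is to reduce
the array in pieces. The subroutine below is a minimal sketch of that idea,
not a confirmed fix: the name allreduce_bor_chunked and the chunk length are
hypothetical choices, and it assumes (untested) that the failure depends on
the size of one call rather than on the total amount of data.

```fortran
! Hypothetical helper (not part of MPICH2): in-place bitwise-OR reduction of
! an integer buffer, split into calls of at most "chunk" elements each.
subroutine allreduce_bor_chunked(buf, n, chunk, comm, ierr)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: n, chunk, comm
  integer, intent(inout) :: buf(n)
  integer, intent(out) :: ierr
  integer :: first, cnt

  ! A non-positive chunk length would never advance the loop below.
  if (chunk <= 0) then
    ierr = MPI_ERR_COUNT
    return
  end if

  ierr = MPI_SUCCESS
  first = 1
  do while (first <= n)
    ! Last piece may be shorter than "chunk".
    cnt = min(chunk, n - first + 1)
    ! Every rank computes the same (first, cnt) pairs, so the collective
    ! calls match up across the communicator.
    call MPI_Allreduce(MPI_IN_PLACE, buf(first), cnt, MPI_INTEGER, &
                       MPI_BOR, comm, ierr)
    if (ierr /= MPI_SUCCESS) return
    first = first + cnt
  end do
end subroutine allreduce_bor_chunked
```

It could replace the MPI_Allreduce line in the test program, e.g.
call allreduce_bor_chunked(mat1, size(mat1), 65536, MPI_COMM_WORLD, ierr),
where 65536 is an arbitrary chunk length. If this version runs, varying the
chunk length would help locate the size at which the single-call version
starts to fail.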

To make the question concrete, I added an MPI_Allreduce call to a
hello-world program. The code is written in Fortran 90. I haven't tested a
C version, but I expect it to behave the same way, since the bindings differ
only in minor details (for example, in C the error code is the function's
return value instead of a trailing ierr argument).

Here is the command I used to compile:

gfortran hello1.f90 -g -o hello.exe -IC:\MPICH2_x86\include -LC:\MPICH2_x86\lib -lfmpich2g

Here is the source code:

!------------------code begin----------------------
program main
  implicit none
  include 'mpif.h'

  character * (MPI_MAX_PROCESSOR_NAME) processor_name
  integer myid, numprocs, namelen, rc, ierr
  integer, allocatable :: mat1(:, :, :)

  call MPI_INIT( ierr )
  call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
  call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
  call MPI_GET_PROCESSOR_NAME(processor_name, namelen, ierr)

  ! 3-D integer array, index range -36:36 in each dimension
  allocate(mat1(-36:36, -36:36, -36:36))
  mat1(:,:,:) = 0

  ! Fortran code should use the Fortran datatype MPI_INTEGER, not the C type MPI_INT
  call MPI_Bcast( mat1(-36, -36, -36), 389017, MPI_INTEGER, 0, MPI_COMM_WORLD, &
                  ierr )

  ! In-place bitwise-OR reduction of the whole array across all ranks
  call MPI_Allreduce(MPI_IN_PLACE, mat1(-36, -36, -36), 389017, MPI_INTEGER, &
                     MPI_BOR, MPI_COMM_WORLD, ierr)

  print *,"MPI_Allreduce done!!!"
  print *,"Hello World! Process ", myid, " of ", numprocs, " on ", &
          processor_name(1:namelen)

  call MPI_FINALIZE(rc)
end program main
!------------------code end----------------------
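For reference, the count 389017 passed to both calls is the number of
elements in mat1: each dimension runs from -36 to 36, i.e. 73 indices.
Assuming 4-byte default integers (typical for gfortran, but not stated
above), each call moves roughly 1.5 MB:

```latex
N = \bigl(36 - (-36) + 1\bigr)^3 = 73^3 = 389017, \qquad
4N = 1\,556\,068~\text{bytes} \approx 1.48~\text{MiB}
```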

When I used gdb (which comes with MinGW) to check it (gdb hello.exe, then
backtrace), I got output that seems meaningless, at least to me:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16316.0x4fd0]
0x01c03100 in mpich2nemesis!PMPI_Wtime ()
   from C:\Windows\system32\mpich2nemesis.dll
(gdb) backtrace
#0  0x01c03100 in mpich2nemesis!PMPI_Wtime ()
   from C:\Windows\system32\mpich2nemesis.dll
#1  0x0017be00 in ?? ()
#2  0x00000000 in ?? ()

Does this mean there is something wrong with the Windows version of the MPI
library?

What can I do to make it work?

Thanks.
