[MPICH] communication between windows and linux

Jayesh Krishna jayesh at mcs.anl.gov
Fri Dec 21 10:54:39 CST 2007


 Hi,
  The only process manager available for MPICH2 on Windows is SMPD. You
should configure MPICH2 on the Linux machine to use SMPD (./configure ...
--with-pm=smpd ...).
  Also note that we currently don't have support for heterogeneous machines
in MPICH2 (you cannot run your program across 32-bit and 64-bit machines).
Heterogeneous support is slated for a future release (sometime next year).
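A minimal sketch of that configure step on the Linux side, assuming you are
building from an unpacked MPICH2-1.0.6p1 source tree (the prefix path is
illustrative, not prescribed):

```shell
# Build MPICH2 on Linux with the SMPD process manager instead of the
# default MPD, so it can talk to SMPD on the Windows side.
# The install prefix below is just an example.
cd mpich2-1.0.6p1
./configure --with-pm=smpd --prefix=/usr/local/mpich2-smpd
make
make install
```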

(PS: The default process manager on the Linux side is MPD.)

Regards,
Jayesh

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jaroslaw Bulat
Sent: Friday, December 21, 2007 10:11 AM
To: mpich2
Subject: [MPICH] communication between windows and linux

Hi!

I cannot establish communication between Linux and Windows. On both sides I
have MPICH2-1.0.6p1 with SMPD. To test communication, I wrote a simple
ping-pong program (two programs, C sources in the attachment). When both run
on either Linux or Windows, everything works correctly. When one program is
executed on Linux and the other on Windows, the ring crashes during the first
message passing. Below are some debug outputs:

Both programs (ssimple_sender and ssimple_receiver) are running on a Windows
box (149.156.197.238); however, they are started from a Linux box (the mpiexec
command is executed in a Linux environment on a different computer with a
different IP). The debug outputs from both programs are interleaved and appear
on the Linux box. In this case everything works correctly.

mpiexec -n 1 -host 149.156.197.238 -path "C:\Documents and Settings
\kwant\Desktop\workspace\ssimple_sender\Debug"
ssimple_sender/Debug/ssimple_sender : -n 1 -host 149.156.197.238 -path
"C:\Documents and Settings\kwant\Desktop\workspace\ssimple_receiver
\Debug" ssimple_receiver
    REC: init_1
SEND: init_1
    REC: init_2
    REC: init_3 numprocs (2): 2
    REC: init_4 myid (1): 1
    REC: loop 1
SEND: init_2
SEND: init_3 numprocs (2): 2
SEND: init_4 myid (0): 0
SEND: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 0
SEND: loop 4_Sent
SEND: loop 5_Wait
SEND: loop 6_Rec, data: 0
------ SEND -------- SEND -------- SEND ---------
SEND: loop 1
    REC: loop 4_Send, data: 0
    REC: loop 4_Sent
    REC: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 1
SEND: loop 4_Sent
SEND: loop 5_Wait
SEND: loop 6_Rec, data: 1
------ SEND -------- SEND -------- SEND --------- etc.....
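The loop producing the SEND trace above presumably looks something like the
sketch below. This is a hedged reconstruction, not the attached source: the
variable names, peer rank, tag, and iteration count are all assumptions made
to match the "loop 2_Irecv / 3_Send / 5_Wait / 6_Rec" debug lines.

```c
/* Hypothetical reconstruction of the sender's main loop (the real
 * c-sources are in the attachment, which is not shown here). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int numprocs, myid, data = 0, recv_buf;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* "init_3 numprocs" */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* "init_4 myid" */

    for (int i = 0; i < 10; ++i) {
        /* "loop 2_Irecv": post the nonblocking receive first */
        MPI_Irecv(&recv_buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* "loop 3_Send": send the current value to rank 1 */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        /* "loop 5_Wait": block until the echoed value arrives */
        MPI_Wait(&req, &status);
        printf("SEND: loop 6_Rec, data: %d\n", recv_buf);
        data++;
    }
    MPI_Finalize();
    return 0;
}
```

Note that in the failing mixed Linux/Windows run the crash happens inside
this MPI_Wait on the receiver's side, during connection establishment rather
than in the user code itself.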


ssimple_sender is started locally (Linux - zm203.zmet.agh.edu.pl) and
ssimple_receiver remotely (Windows - 149.156.197.238). The ring is started
from Linux, and the programs die before the first MPI_Send(...) finishes.
Below are the debug output and exit code:

mpiexec -n 1 -localonly ssimple_sender/Debug/ssimple_sender : -n 1 -host
149.156.197.238 -path "C:\Documents and Settings\kwant\Desktop\workspace
\ssimple_receiver\Debug" -channel sock ssimple_receiver
SEND: init_1
    REC: init_1
SEND: init_2
SEND: init_3 numprocs (2): 2
SEND: init_4 myid (0): 0
SEND: loop 1
    REC: init_2
    REC: init_3 numprocs (2): 2
    REC: init_4 myid (1): 1
    REC: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 0

job aborted:
rank: node: exit code[: error message]
0: zm203.zmet.agh.edu.pl: -2
1: 149.156.197.238: 1: Fatal error in MPI_Wait: Other MPI error, error
stack:
MPI_Wait(156)................................:
MPI_Wait(request=003E3DB0, status003E2A08) failed
MPIDI_CH3i_Progress_wait(215)................: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...: 
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process
group structure with id <>


Both computers are x86_64 boxes; Linux is native 64-bit, Windows is the
32-bit version. I'm using MinGW on Windows (g++ -v: gcc version 3.4.5 (mingw
special)), and gcc version 4.1.2 (Gentoo 4.1.2) on Linux.

Could anyone help me with this broken communication?


Jarek!




