[MPICH] communication between windows and linux

Jarosław Bułat kwant at agh.edu.pl
Fri Dec 21 10:11:03 CST 2007


Hi!

I cannot establish communication between Linux and Windows. On both
sides I have MPICH2-1.0.6p1 with smpd. To test communication, I wrote a
simple ping-pong-like program (two programs, source files attached).
When both programs run on Linux, or both on Windows, everything works
correctly. When one program runs on Linux and the other on Windows, the
ring crashes during the first message exchange. Below is some debug
output:

Both programs (ssimple_sender and ssimple_receiver) run on the Windows
box (149.156.197.238), but they are started from the Linux box (the
mpiexec command is executed in the Linux environment: a different
computer with a different IP). The debug output of both programs is
interleaved and appears on the Linux box. In this case everything works
correctly.

mpiexec -n 1 -host 149.156.197.238 -path "C:\Documents and Settings
\kwant\Desktop\workspace\ssimple_sender\Debug"
ssimple_sender/Debug/ssimple_sender : -n 1 -host 149.156.197.238 -path
"C:\Documents and Settings\kwant\Desktop\workspace\ssimple_receiver
\Debug" ssimple_receiver
    REC: init_1
SEND: init_1
    REC: init_2
    REC: init_3 numprocs (2): 2
    REC: init_4 myid (1): 1
    REC: loop 1
SEND: init_2
SEND: init_3 numprocs (2): 2
SEND: init_4 myid (0): 0
SEND: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 0
SEND: loop 4_Sent
SEND: loop 5_Wait
SEND: loop 6_Rec, data: 0
------ SEND -------- SEND -------- SEND ---------
SEND: loop 1
    REC: loop 4_Send, data: 0
    REC: loop 4_Sent
    REC: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 1
SEND: loop 4_Sent
SEND: loop 5_Wait
SEND: loop 6_Rec, data: 1
------ SEND -------- SEND -------- SEND ---------
etc.....


ssimple_sender is started locally (Linux, zm203.zmet.agh.edu.pl) and
ssimple_receiver remotely (Windows, 149.156.197.238). The ring is
started from Linux, and both processes die before the first
MPI_Send(...) completes. Below are the debug output and exit codes:

mpiexec -n 1 -localonly ssimple_sender/Debug/ssimple_sender : -n 1 -host
149.156.197.238 -path "C:\Documents and Settings\kwant\Desktop\workspace
\ssimple_receiver\Debug" -channel sock ssimple_receiver
SEND: init_1
    REC: init_1
SEND: init_2
SEND: init_3 numprocs (2): 2
SEND: init_4 myid (0): 0
SEND: loop 1
    REC: init_2
    REC: init_3 numprocs (2): 2
    REC: init_4 myid (1): 1
    REC: loop 1
    REC: loop 2_Irecv
    REC: loop 3_Wait
SEND: loop 2_Irecv
SEND: loop 3_Send, data: 0

job aborted:
rank: node: exit code[: error message]
0: zm203.zmet.agh.edu.pl: -2
1: 149.156.197.238: 1: Fatal error in MPI_Wait: Other MPI error, error
stack:
MPI_Wait(156)................................:
MPI_Wait(request=003E3DB0, status003E2A08) failed
MPIDI_CH3i_Progress_wait(215)................: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...: 
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the
process group structure with id <>


Both computers are x86_64 boxes; Linux runs natively in 64-bit mode,
while Windows is the 32-bit version. I'm using MinGW on Windows
(g++ -v: gcc version 3.4.5 (mingw special)) and gcc version 4.1.2
(Gentoo 4.1.2) on Linux.
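
To make the difference between the 32-bit Windows build and the 64-bit
Linux build concrete, a small check along these lines could be compiled
with each compiler. This is only a sketch: the helper names
describe_abi and is_little_endian are hypothetical and are not taken
from the attached sources, and the check only reports basic type sizes
and byte order. It does not by itself explain the MPI_Wait failure
above.

```cpp
#include <climits>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): summarises the
// pointer width and the sizes of a few basic types for this build.
std::string describe_abi() {
    std::ostringstream out;
    out << "pointer=" << sizeof(void*) * CHAR_BIT << "bit"
        << " int=" << sizeof(int)
        << " long=" << sizeof(long)
        << " size_t=" << sizeof(std::size_t);
    return out.str();
}

// Hypothetical helper: returns true when the least significant byte
// of an integer is stored first, as on x86 and x86_64.
bool is_little_endian() {
    const unsigned int probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```

Printing describe_abi() from main on each box gives one line per build
that can be compared side by side. The 64-bit Linux build would be
expected to report pointer=64bit and long=8, and the 32-bit MinGW build
pointer=32bit and long=4, so a buffer of long values would not have the
same byte size on both sides.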

Could anyone help me with this broken communication?


Jarek!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ssimple_receiver.cpp
Type: text/x-c++src
Size: 1334 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20071221/79530e0a/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ssimple_sender.cpp
Type: text/x-c++src
Size: 1516 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20071221/79530e0a/attachment-0001.cpp>

