[MPICH] communication between windows and linux
Jarosław Bułat
kwant at agh.edu.pl
Fri Dec 21 11:50:06 CST 2007
Hi!
On both sides I have started smpd (on Linux, compiled with
--with-pm=smpd). I will try a homogeneous configuration. Should I choose a
64-bit or a 32-bit configuration (for both Linux and Windows)? First of all,
is MPICH2 working and usable on 64-bit Windows (XP or Vista), or should I
use 32-bit Linux?
When do you expect heterogeneous support in MPICH2?
Thanks for the answer!
Jarek!
On Fri, 2007-12-21 at 10:54 -0600, Jayesh Krishna wrote:
> Hi,
> The only process manager available on MPICH2 for windows is SMPD. You
> should configure MPICH2 on the linux machine to use SMPD (./configure ...
> --with-pm=smpd ...).
> Also note that we currently don't have support for heterogeneous machines
> in MPICH2 (You cannot run your program across 32-bit and 64-bit machines).
> Heterogeneous support is slated for a future release (sometime next year).
>
> (PS: The default process manager on the linux side is MPD.)
>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jaroslaw Bulat
> Sent: Friday, December 21, 2007 10:11 AM
> To: mpich2
> Subject: [MPICH] communication between windows and linux
>
> Hi!
>
> I cannot establish communication between Linux and Windows. On both sides I
> have MPICH2-1.0.6p1 with smpd. To test communication, I wrote a simple
> ping-pong-like program (two programs, C sources attached).
> When both run on the Linux side, or both on the Windows side, everything
> works. When one program runs on Linux and the other on Windows, the ring
> crashes during the first message exchange. Below is some debug output:
>
> Both programs (ssimple_sender and ssimple_receiver) run on a Windows
> box (149.156.197.238); however, they are started from a Linux box (the
> mpiexec command is executed in a Linux environment: different computer,
> different IP). The debug output of both programs is interleaved and appears
> on the Linux box. In this case everything works.
>
> mpiexec -n 1 -host 149.156.197.238 -path "C:\Documents and Settings
> \kwant\Desktop\workspace\ssimple_sender\Debug"
> ssimple_sender/Debug/ssimple_sender : -n 1 -host 149.156.197.238 -path
> "C:\Documents and Settings\kwant\Desktop\workspace\ssimple_receiver
> \Debug" ssimple_receiver
> REC: init_1
> SEND: init_1
> REC: init_2
> REC: init_3 numprocs (2): 2
> REC: init_4 myid (1): 1
> REC: loop 1
> SEND: init_2
> SEND: init_3 numprocs (2): 2
> SEND: init_4 myid (0): 0
> SEND: loop 1
> REC: loop 2_Irecv
> REC: loop 3_Wait
> SEND: loop 2_Irecv
> SEND: loop 3_Send, data: 0
> SEND: loop 4_Sent
> SEND: loop 5_Wait
> SEND: loop 6_Rec, data: 0
> ------ SEND -------- SEND -------- SEND ---------
> SEND: loop 1
> REC: loop 4_Send, data: 0
> REC: loop 4_Sent
> REC: loop 1
> REC: loop 2_Irecv
> REC: loop 3_Wait
> SEND: loop 2_Irecv
> SEND: loop 3_Send, data: 1
> SEND: loop 4_Sent
> SEND: loop 5_Wait
> SEND: loop 6_Rec, data: 1
> ------ SEND -------- SEND -------- SEND --------- etc.....
>
>
> ssimple_sender is started locally (Linux, zm203.zmet.agh.edu.pl) and
> ssimple_receiver remotely (Windows, 149.156.197.238). The ring is started
> from Linux, and both processes die before the first MPI_Send(...) finishes.
> Below are the debug output and exit codes:
>
> mpiexec -n 1 -localonly ssimple_sender/Debug/ssimple_sender : -n 1 -host
> 149.156.197.238 -path "C:\Documents and Settings\kwant\Desktop\workspace
> \ssimple_receiver\Debug" -channel sock ssimple_receiver
> SEND: init_1
> REC: init_1
> SEND: init_2
> SEND: init_3 numprocs (2): 2
> SEND: init_4 myid (0): 0
> SEND: loop 1
> REC: init_2
> REC: init_3 numprocs (2): 2
> REC: init_4 myid (1): 1
> REC: loop 1
> REC: loop 2_Irecv
> REC: loop 3_Wait
> SEND: loop 2_Irecv
> SEND: loop 3_Send, data: 0
>
> job aborted:
> rank: node: exit code[: error message]
> 0: zm203.zmet.agh.edu.pl: -2
> 1: 149.156.197.238: 1: Fatal error in MPI_Wait: Other MPI error, error
> stack:
> MPI_Wait(156)................................:
> MPI_Wait(request=003E3DB0, status003E2A08) failed
> MPIDI_CH3i_Progress_wait(215)................: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(640)...:
> MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process
> group structure with id <>
>
>
> Both computers are x86_64 boxes; Linux is native 64-bit, while Windows is
> the 32-bit version. I'm using MinGW on Windows (g++ -v: gcc version 3.4.5
> (mingw special)), and gcc version 4.1.2 (Gentoo 4.1.2) on Linux.
>
> Could anyone help me with this broken communication?
>
>
> Jarek!
>