[mpich-discuss] MPICH2 with MOSIX

Alex Margolin alex.margolin at mail.huji.ac.il
Sat Nov 26 11:21:24 CST 2011


Hi,

I'm trying to make MPICH2 run with a recent version of MOSIX. For now 
I'm running it locally (simple.cpp sources attached).

Running natively (without MOSIX), I get the following results:

alex at singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 1 ./simple
Started as #0 out of 1
alex at singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 2 ./simple
Started as #1 out of 2
Started as #0 out of 2
#0 Got 0 from 0
alex at singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #0 out of 3
Started as #2 out of 3
Started as #1 out of 3
#0 Got 0 from 0
#1 Got 0 from 0
#1 Got 1 from 1

However, running three or more MPI processes under MOSIX fails.
The first two commands (-n 1 and -n 2) work fine, but with three 
processes the run ends with an error:

alex at singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 1 ./simple
Started as #0 out of 1
alex at singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 2 ./simple
Started as #Started as #0 out of 2
1 out of 2
#0 Got 0 from 0
alex at singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #Started as #1 out of 3
0 out of 3
#0 Got 0 from 0
Started as #2 out of 3
#1 Got 0 from Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid argument
0Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid argument

#1 Got 1 from 1
alex at singularity:~/huji/benchmarks/simple$

I looked at the relevant code (MPID_nem_tcp_connpoll() in socksm.c, 
line 1801) and printed the arguments passed to the poll() syscall. One 
of the file descriptors is -1, so the EINVAL is well deserved. Still, 
if the run has otherwise finished, couldn't it just shut down cleanly?
I read somewhere on the web that earlier versions of MOSIX worked with 
earlier versions of MPICH2. I tried version 1.2, but to no avail (same 
error).
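
A note on the -1 descriptor: POSIX specifies that poll() ignores any 
entry whose fd is negative and sets its revents to 0, and on stock Linux 
EINVAL from poll() normally means that nfds exceeds RLIMIT_NOFILE. So a 
-1 entry on its own would not be expected to fail natively. Whether 
poll() behaves the same way when the process runs under mosrun is an 
assumption that needs checking. The sketch below is one way to check it. 
The names PollProbe and probe_negative_fd are made up for this 
illustration and are not part of MPICH2 or MOSIX.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>

// Outcome of a single poll() call on a set containing a negative fd.
// (Hypothetical helper type, introduced only for this illustration.)
struct PollProbe {
    int   ret;          // return value of poll(), or -1 if setup failed
    int   err;          // errno if a call failed, 0 otherwise
    short neg_revents;  // revents reported for the fd == -1 entry
};

// Polls one valid, idle descriptor (the read end of a pipe) together
// with an entry whose fd is -1, using a zero timeout so it never blocks.
PollProbe probe_negative_fd() {
    PollProbe out{-1, 0, 0};

    int p[2];
    if (pipe(p) != 0) {
        out.err = errno;
        return out;
    }

    struct pollfd fds[2];
    fds[0].fd = p[0];  fds[0].events = POLLIN;  fds[0].revents = 0;
    fds[1].fd = -1;    fds[1].events = POLLIN;  fds[1].revents = 0;

    errno = 0;
    out.ret = poll(fds, 2, 0);
    out.err = (out.ret < 0) ? errno : 0;
    out.neg_revents = fds[1].revents;

    close(p[0]);
    close(p[1]);
    return out;
}
```

Calling probe_negative_fd() from a small main() and printing the three 
fields, once natively and once under mosrun, would show whether the two 
environments treat the negative descriptor differently. A native Linux 
run is expected to report ret == 0 and err == 0. If the mosrun run 
reports EINVAL instead, that points at the poll() handling under MOSIX 
rather than at MPICH2.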

Thanks for any help you can give me,
Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple.cpp
Type: text/x-c++src
Size: 608 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111126/fa6140bd/attachment.cpp>

