[mpich-discuss] MPICH2 with MOSIX
Alex Margolin
alex.margolin at mail.huji.ac.il
Sat Nov 26 11:21:24 CST 2011
Hi,
I'm trying to get MPICH2 running under a recent version of MOSIX. For now
I'm running it locally (the simple.cpp source is attached).
I got the following results when running natively:
alex@singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 1 ./simple
Started as #0 out of 1
alex@singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 2 ./simple
Started as #1 out of 2
Started as #0 out of 2
#0 Got 0 from 0
alex@singularity:~/huji/benchmarks/simple$ ~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #0 out of 3
Started as #2 out of 3
Started as #1 out of 3
#0 Got 0 from 0
#1 Got 0 from 0
#1 Got 1 from 1
However, I run into problems when I run three or more MPI processes
over MOSIX.
The first two commands (-n 1 and -n 2) work fine, but running three
processes ends with an error:
alex@singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 1 ./simple
Started as #0 out of 1
alex@singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 2 ./simple
Started as #Started as #0 out of 2
1 out of 2
#0 Got 0 from 0
alex@singularity:~/huji/benchmarks/simple$ mosrun ~/huji/mpich/bin/mpiexec -n 3 ./simple
Started as #Started as #1 out of 3
0 out of 3
#0 Got 0 from 0
Started as #2 out of 3
#1 Got 0 from Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid argument
0Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).................: MPI_Finalize failed
MPI_Finalize(209).................:
MPID_Finalize(117)................:
MPIDI_CH3U_VC_WaitForClose(383)...: an error occurred while the device was waiting for all open connections to close
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1801).......: poll of socket fds failed - Invalid argument
#1 Got 1 from 1
alex@singularity:~/huji/benchmarks/simple$
I looked at the relevant code (MPID_nem_tcp_connpoll() at socksm.c:1801),
printed out the arguments passed to the poll() syscall, and found that
one of the file descriptors is -1, so the EINVAL is well deserved.
Still, if the run has already finished, couldn't it just shut down cleanly?
I read somewhere on the web that earlier versions of MOSIX worked with
earlier versions of MPICH2. I tried version 1.2, but to no avail (same
error).
Thanks for any help you can give me,
Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple.cpp
Type: text/x-c++src
Size: 608 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111126/fa6140bd/attachment.cpp>